MDBiomarkers (v 1.01)

Data curation

Overview

Preprocessing

  • Each dataset that is analyzed or re-analyzed on our machines goes through a standardized pipeline involving:
    • Log2-transforming raw counts
    • Renaming columns (variables/attributes) for consistency
    • Noting the amount of missing data and/or tallying measurements below the lower limit of detection (<LLOD)
    • Applying the same techniques (e.g., limma) whenever raw data are available
    • Abstracting attributes of the paper in which the biomarker was published
  • Any dataset obtained from the Supplementary Materials of a published paper is used as-is. If you notice differences between the published and uploaded versions of a dataset, this typically means the raw data were re-analyzed with our standardized pipeline. Please contact us (utkarshdang@cunet.carleton.ca) with any questions.
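The first three preprocessing steps can be illustrated with a minimal sketch. It is written in Python for readability and is not the project's R code. The pseudocount of 1 (added so that zero counts do not produce log2(0)), the column mapping `COLUMN_MAP`, the helper names, and the LLOD value used in the example are all hypothetical.

```python
import math
from typing import Dict, List, Optional

# Hypothetical mapping from source-specific column names to harmonized names.
COLUMN_MAP: Dict[str, str] = {"gene_symbol": "gene", "Sample_ID": "sample_id"}


def log2_transform(counts: List[Optional[float]], pseudocount: float = 1.0) -> List[Optional[float]]:
    """Log2-transform raw counts; missing values (None) stay None.

    The pseudocount (an assumption) avoids taking log2(0)."""
    return [None if c is None else math.log2(c + pseudocount) for c in counts]


def harmonize_names(record: Dict[str, object], mapping: Dict[str, str]) -> Dict[str, object]:
    """Rename keys found in the mapping; keep all other keys unchanged."""
    return {mapping.get(key, key): value for key, value in record.items()}


def summarize_quality(values: List[Optional[float]], llod: float) -> Dict[str, int]:
    """Count missing values and raw measurements below a (hypothetical) LLOD."""
    n_missing = sum(v is None for v in values)
    n_below = sum(v is not None and v < llod for v in values)
    return {"n": len(values), "missing": n_missing, "below_llod": n_below}


if __name__ == "__main__":
    raw = [0, 3, None, 15, 0.2]
    print(log2_transform(raw))
    print(harmonize_names({"gene_symbol": "IL6", "value": 3.2}, COLUMN_MAP))
    print(summarize_quality(raw, llod=0.5))
```

For example, `summarize_quality([0, 3, None, 15, 0.2], llod=0.5)` reports one missing value and two values below the LLOD. The LLOD check is applied to the raw measurements rather than to the log2-transformed values, since detection limits are usually stated on the original scale.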

Tools

  • R (R Core Team 2022)
  • Shiny (Chang et al. 2023)
  • bslib (Sievert, Cheng, and Aden-Buie 2025)
  • ggplot2 (Wickham 2016)
  • WGCNA (Langfelder and Horvath 2008)
  • limma (Ritchie et al. 2015)
  • corrplot (Wei and Simko 2021)
  • bioicons:
    • muscle-1 icon by Servier https://smart.servier.com/ is licensed under CC-BY 3.0 Unported https://creativecommons.org/licenses/by/3.0/
    • blood_sample icon by Marcel Tisch https://twitter.com/MarcelTisch is licensed under CC0 https://creativecommons.org/publicdomain/zero/1.0/

References

Chang, Winston, Joe Cheng, JJ Allaire, Carson Sievert, Barret Schloerke, Yihui Xie, Jeff Allen, Jonathan McPherson, Alan Dipert, and Barbara Borges. 2023. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.
Langfelder, Peter, and Steve Horvath. 2008. "WGCNA: An R Package for Weighted Correlation Network Analysis." BMC Bioinformatics 9 (1): 559. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-559.
R Core Team. 2022. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Ritchie, Matthew E, Belinda Phipson, Di Wu, Yifang Hu, Charity W Law, Wei Shi, and Gordon K Smyth. 2015. “limma Powers Differential Expression Analyses for RNA-Sequencing and Microarray Studies.” Nucleic Acids Research 43 (7): e47. https://doi.org/10.1093/nar/gkv007.
Sievert, Carson, Joe Cheng, and Garrick Aden-Buie. 2025. bslib: Custom 'Bootstrap' 'Sass' Themes for 'Shiny' and 'Rmarkdown'. https://rstudio.github.io/bslib/.
Wei, Taiyun, and Viliam Simko. 2021. R Package 'corrplot': Visualization of a Correlation Matrix. https://github.com/taiyun/corrplot.
Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.