Data curation
Overview
Preprocessing
- Each dataset that is analyzed or re-analyzed on our machines goes through a standardized pipeline, which involves:
- Log2-transforming raw counts
- Changing column/variable/attribute names for consistency
- Noting the amount of missing data and/or calculating the number of measurements below the lower limit of detection (<LLOD)
- Using the same type of techniques when raw data are available, e.g., limma
- Abstracting attributes of the paper in which the biomarker was published
- Any dataset obtained as Supplementary Materials from a published article is used as-is. If you notice any differences between the published and uploaded data, this typically implies that the raw data were re-analyzed with our standardized pipeline. Please reach out to us (utkarshdang@cunet.carleton.ca) if you have any questions.
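The first three pipeline steps above (consistent names, missing-data and <LLOD bookkeeping, log2 transformation) can be illustrated with a minimal base R sketch. This is not the pipeline's actual code: the toy data, the `llod` values, the naming convention, and the helper `standardize_names()` are all hypothetical and chosen only for illustration.

```r
# Hypothetical toy data: rows are samples, columns are analytes (raw values).
raw <- data.frame(
  "Sample ID" = c("s1", "s2", "s3", "s4"),
  "Analyte A" = c(8, 16, NA, 64),
  "Analyte B" = c(1, 2, 4, 0.5),
  check.names = FALSE
)

# Assumed lower limits of detection per analyte (illustrative values only).
llod <- c(analyte_a = 10, analyte_b = 1)

# Step 1: make column names consistent (assumed convention: lower case,
# runs of non-alphanumeric characters replaced by a single underscore).
standardize_names <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)
  gsub("^_+|_+$", "", x)
}
names(raw) <- standardize_names(names(raw))
analytes <- setdiff(names(raw), "sample_id")

# Step 2: count missing values and measurements below the LLOD per analyte.
qc <- data.frame(
  analyte = analytes,
  n_missing = vapply(analytes, function(a) sum(is.na(raw[[a]])), integer(1)),
  n_below_llod = vapply(
    analytes,
    function(a) sum(raw[[a]] < llod[[a]], na.rm = TRUE),
    integer(1)
  ),
  row.names = NULL
)

# Step 3: log2-transform the raw values; missing values stay NA.
# Zero counts would become -Inf here, so a real pipeline may need an offset.
log2_data <- raw
log2_data[analytes] <- lapply(raw[analytes], log2)

print(qc)
print(log2_data)
```

Computing the quality-control summary before the transformation keeps the <LLOD comparison on the original measurement scale, which is one reasonable ordering among several.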
Tools
- R (R Core Team 2022)
- Shiny (Chang et al. 2023)
- bslib (Sievert, Cheng, and Aden-Buie 2025)
- ggplot2 (Wickham 2016)
- WGCNA (Langfelder and Horvath 2008)
- limma (Ritchie et al. 2015)
- corrplot (Wei and Simko 2021)
- bioicons:
- muscle-1 icon by Servier https://smart.servier.com/ is licensed under CC-BY 3.0 Unported https://creativecommons.org/licenses/by/3.0/
- blood_sample icon by Marcel Tisch https://twitter.com/MarcelTisch is licensed under CC0 https://creativecommons.org/publicdomain/zero/1.0/
References
Chang, Winston, Joe Cheng, JJ Allaire, Carson Sievert, Barret Schloerke, Yihui Xie, Jeff Allen, Jonathan McPherson, Alan Dipert, and Barbara Borges. 2023. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.
Langfelder, Peter, and Steve Horvath. 2008. “WGCNA: An R Package for Weighted Correlation Network Analysis.” BMC Bioinformatics 9 (1): 559. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-559.
R Core Team. 2022. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Ritchie, Matthew E, Belinda Phipson, Di Wu, Yifang Hu, Charity W Law, Wei Shi, and Gordon K Smyth. 2015. “limma Powers Differential Expression Analyses for RNA-Sequencing and Microarray Studies.” Nucleic Acids Research 43 (7): e47. https://doi.org/10.1093/nar/gkv007.
Sievert, Carson, Joe Cheng, and Garrick Aden-Buie. 2025. Bslib: Custom ’Bootstrap’ ’Sass’ Themes for ’Shiny’ and ’Rmarkdown’. https://rstudio.github.io/bslib/.
Wei, Taiyun, and Viliam Simko. 2021. R Package ’Corrplot’: Visualization of a Correlation Matrix. https://github.com/taiyun/corrplot.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.