Frequently asked questions

Frequently asked questions on using the interactive MD biomarker database tool

Keywords

Citation, meta-analysis, protein, mRNA, heterodimers, associations

How do I cite this resource?

Important

A lot of effort has gone into building this resource. Please cite it when you use it; continued citations help motivate its ongoing development and maintenance:

Tu, W, Tobin, RA, Abdelrazeq, L, Guite, K, Szigyarto, CAK, Tsonaka, R, Degan, C, van der Burgt, YEM, Díaz-Manera, J, Guglieri, M, Spitali, P, Hathout, Y, Dang, UJ. A Queryable Molecular Database with mRNA and Protein Markers from Multiple Serum and Tissue Datasets for Duchenne Muscular Dystrophy. Submitted.

Each study used is also cited on the Data Curation page, and citation details can be found in tooltips or headings in the biomarker app, so users of this resource can cite the specific studies as appropriate for their context.

My question is not answered below. Whom do I contact?

Contact Utkarsh Dang at utkarshdang@cunet.carleton.ca.

How do I use this MD biomarker application?

See the Tutorial webpage for a detailed walk-through. The data available for a given biomarker are aggregated into different subpanels. In muscular dystrophy research, published studies often lack sufficient power for a given analysis of every biomarker quantified, especially after multiple testing correction (i.e., adjusting p-values upward to control Type 1 errors or false discoveries). With data aggregated from different sources (cohorts, experiments, investigators, technologies), users can examine a) the consistency of fold changes across studies, and b) the direction of the effect size, along with raw and adjusted p-values from the original work or from a reanalysis of the original data.

Can I download a report for a biomarker?

On any biomarker-specific view, click “Download Report” near the top right of the Shiny app and then “Biomarker-specific” to download an HTML report of the information available through the app. In addition, most subpanels can be exported as a figure file or a spreadsheet. In the Chrome browser, you can also right-click in any biomarker-specific view, select Print, and save as PDF in landscape mode. However, we recommend using the Download Report functionality.

Where are the biomarker datasets obtained from?

Many of the raw datasets were shared by researchers whose labs generated the data. Some raw datasets were obtained from GEO Datasets. Other datasets consist of aggregate findings from the supplemental files of published articles. See the Data Curation webpage for a list of publications and datasets.

How can I submit my own biomarker data?

If there is a dataset that you would like to contribute to this resource, please reach out to Utkarsh Dang at utkarshdang@cunet.carleton.ca.

Why do the figures look stretched?

The app is responsive, as is the hosting website. Try resizing your browser window as needed and then refreshing the page.

Can we summarize the estimates in a specific subpanel via a mean, median, etc.?

If you want an overall proxy estimate, we suggest computing it only within the same target and technology, because the same UniProt ID can be connected to different targets across platforms. We also suggest focusing on a specific age range and target of interest, exporting the table from the subpanel, and selecting the relevant rows before summarizing. A note of caution regarding technology: it doesn’t make sense to combine estimates from TMT mass spectrometry with Somalogic Somascan estimates into an overall proxy estimate. Also be aware that, depending on the platform, multiple targets might be available for the same UniProt ID, or multiple quantifications might be available for the same target; e.g., for MDC, aka CCL22 (UniProt: O00626), there are two aptamers on the Somalogic 7K platform. The same applies to Affymetrix data, where multiple probes can measure the same mRNA target.
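As a rough illustration of the filter-then-summarize workflow, here is a minimal Python sketch that takes rows from an exported subpanel table, keeps only one target, one technology, and one age group, and reports the median log2 fold change. The column names (Target, Technology, AgeGroup, log2FC), the aptamer labels, and all values are hypothetical placeholders, not the app’s actual export format or data; check the headers of the table you download.

```python
import csv
import io
import statistics

# Hypothetical column names; match them to the headers of the table you export.
TARGET, TECH, AGE, LOG2FC = "Target", "Technology", "AgeGroup", "log2FC"


def proxy_log2fc(rows, target, technology, age_group):
    """Median log2 fold change across rows sharing one target, technology and age group."""
    values = [
        float(row[LOG2FC])
        for row in rows
        if row[TARGET] == target and row[TECH] == technology and row[AGE] == age_group
    ]
    if not values:
        raise ValueError("no rows match this target/technology/age group")
    return statistics.median(values)


# Made-up rows standing in for an exported subpanel table (not real data).
exported_csv = io.StringIO(
    "Target,Technology,AgeGroup,log2FC\n"
    "MDC (aptamer 1),Somascan,4-7y,0.8\n"
    "MDC (aptamer 1),Somascan,4-7y,1.0\n"
    "MDC (aptamer 1),Somascan,4-7y,1.3\n"
    "MDC (aptamer 2),Somascan,4-7y,0.2\n"
    "CCL22,TMT,4-7y,0.5\n"
)
rows = list(csv.DictReader(exported_csv))

# Only the three "aptamer 1" Somascan rows are summarized; TMT and aptamer 2 rows are excluded.
print(proxy_log2fc(rows, "MDC (aptamer 1)", "Somascan", "4-7y"))
```

This prints 1.0. Summarizing on the log2 scale and using the median (which is less sensitive to a single outlying study) are choices made for this sketch; a mean, or a formal meta-analysis, may be more appropriate depending on your question.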

Are aggregate data findings in a subpanel comparable?

See the answer above. Moreover, the table on the Data Curation webpage helps clarify what is being summarized and compared (e.g., is it a within-participant fold change [pre-post], or does it account for placebo response?). That table also shows which data points summarize analyses restricted to steroid-naive patients and which summarize datasets that did not adjust for treatment (steroids).

Why are certain Somascan aptamers labeled with a *?

Some somamers recognize two proteins linked to two UniProt IDs; e.g., both CAPN1 and CAPNS1 are detected by somamer 2668-70. Such aptamers are marked with a * so that users are aware of this when comparing results from different studies.

Why are different targets/fragments of biomarkers sometimes grouped together?

Where available, additional information on a biomarker is shown in the app via mouseover/tooltip, which may help differentiate between targets or fragments. Grouping can happen for several reasons:

Naming conventions can differ across published datasets (e.g., the target name for osteocalcin may be shortened to OSTEOC or OC). UniProt and EntrezGene IDs are standardized, which allows datasets to be merged automatically.
Some technologies may quantify different targets, isoforms, etc., that are linked to the same UniProt ID. Furthermore, some datasets provide only a UniProt ID and/or EntrezGene ID and do not differentiate between different targets; e.g., Somascan datasets may use multiple aptamers for the same protein target, and Affymetrix gene chips may use multiple probes for the same mRNA target.

How are heterodimers represented?

In Somascan datasets, for example, some aptamers detect functional dimeric complexes; e.g., IL12B-IL23A and TSHB-CGA are heterodimer complexes. These are associated with multiple UniProt IDs and can be searched accordingly. For instance, in a Somascan dataset, the entry for this complex lists the Target as IL-23, the EntrezGeneSymbol as IL12B-IL23A, and the UniProt IDs as P29460, Q9NPF7. Summary findings for this target can be obtained from the application by searching for IL23A or Q9NPF7, or by searching for IL12B or P29460.
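To illustrate why a heterodimer can be found through any of its components, here is a minimal Python sketch of a lookup in which one record carries several gene symbols and several UniProt IDs, and a query matches if it equals any of them. The record structure and the search function are hypothetical and are not the app’s actual implementation; the IDs are the ones given in this answer.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TargetRecord:
    target: str
    gene_symbols: Tuple[str, ...]  # one entry per subunit of a complex
    uniprot_ids: Tuple[str, ...]


def make_record(target, gene_symbols, uniprot_field):
    """Build a record; the UniProt field is a comma-separated list such as 'P29460, Q9NPF7'."""
    ids = tuple(part.strip() for part in uniprot_field.split(",") if part.strip())
    return TargetRecord(target, tuple(gene_symbols), ids)


def search(records: List[TargetRecord], query: str) -> List[str]:
    """Return targets whose subunit symbols or UniProt IDs exactly match the query."""
    q = query.strip().upper()
    return [
        r.target
        for r in records
        if q in {s.upper() for s in r.gene_symbols} or q in {u.upper() for u in r.uniprot_ids}
    ]


# Hypothetical records: the IL-23 heterodimer and a single-protein target.
records = [
    make_record("IL-23", ["IL12B", "IL23A"], "P29460, Q9NPF7"),
    make_record("MDC", ["CCL22"], "O00626"),
]
for query in ["IL23A", "Q9NPF7", "IL12B", "P29460", "CCL22", "IL12"]:
    print(query, "->", search(records, query))
```

The subunit symbols are listed explicitly rather than obtained by splitting "IL12B-IL23A" on the hyphen, because some gene symbols themselves contain hyphens (e.g., HLA-A). Matching is exact in this sketch, so a partial query such as IL12 returns nothing.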

Why is it that one study shows that a biomarker is treatment responsive, but another study doesn’t?

There could be many reasons, e.g., a different age range, a different treatment status (corticosteroid-naive or corticosteroid-treated) or regimen (daily vs. intermittent treatment, etc.), a different technology, less statistical power, etc. We suggest not focusing on the p-value alone but also looking at the fold changes. If the fold changes across multiple studies with the same technology and same cohort characteristics are generally consistent, a non-significant p-value in a specific study likely reflects a lack of power due to a small sample size.
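As a rough screening aid for the kind of check described above, here is a minimal Python sketch that, for one target measured with the same technology in several studies, reports the share of studies whose log2 fold change agrees in direction with the majority, and lists the agreeing studies that are not significant at a chosen adjusted p-value threshold. The study names, values, and the 0.05 threshold are hypothetical, and this is not a formal meta-analysis: a study flagged here may be underpowered, but genuine heterogeneity between cohorts remains possible.

```python
from typing import List, NamedTuple, Tuple


class StudyResult(NamedTuple):
    study: str
    log2fc: float  # log2 fold change reported for the contrast
    adj_p: float   # adjusted p-value for the same contrast


def direction_check(results: List[StudyResult], alpha: float = 0.05) -> Tuple[float, List[str]]:
    """Share of studies agreeing with the majority direction, and which of those are non-significant."""
    ups = sum(r.log2fc > 0 for r in results)
    downs = sum(r.log2fc < 0 for r in results)
    majority = 1 if ups >= downs else -1  # ties are treated as "up" in this sketch
    agreeing = [r for r in results if r.log2fc * majority > 0]
    nonsig_agreeing = [r.study for r in agreeing if r.adj_p >= alpha]
    return len(agreeing) / len(results), nonsig_agreeing


# Hypothetical studies measuring one target with the same technology.
results = [
    StudyResult("Study A", 0.9, 0.001),
    StudyResult("Study B", 0.7, 0.02),
    StudyResult("Study C", 0.6, 0.30),
    StudyResult("Study D", -0.1, 0.80),
]
share, underpowered_candidates = direction_check(results)
print(share, underpowered_candidates)
```

Here three of four studies point upward (share 0.75), and Study C agrees in direction and magnitude but is not significant, which is the pattern this answer attributes to limited power.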

Why do some numbers look slightly different from those reported in the original published papers?

When raw data were available, they were reanalyzed following a common template (log2 transformation, normalization, filtering, and statistical analysis). Fold changes should not differ much from the published values, but p-values and adjusted p-values can differ.
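For readers less familiar with the log2 scale, here is a minimal Python sketch of one common definition of a log2 fold change: the mean of the log2-transformed levels in DMD samples minus the mean in control samples. The intensities are made up, and the normalization, filtering, and modeling steps of the common template are not shown; this only illustrates what the reported quantity means, not the exact pipeline used for each dataset.

```python
import math
import statistics


def log2_fold_change(dmd, control):
    """Mean log2 level in DMD minus mean log2 level in controls."""
    mean_log_dmd = statistics.fmean(math.log2(x) for x in dmd)
    mean_log_ctrl = statistics.fmean(math.log2(x) for x in control)
    return mean_log_dmd - mean_log_ctrl


# Made-up raw intensities (not real data).
dmd = [400.0, 1600.0]
control = [100.0, 100.0]

lfc = log2_fold_change(dmd, control)
print(round(lfc, 6), round(2 ** lfc, 6))
```

This prints 3.0 and 8.0: a log2 fold change of 3 corresponds to an 8-fold ratio of geometric means (800 vs. 100). Group means like these tend to be fairly stable across reasonable reanalyses, whereas p-values also depend on variance estimates and, after adjustment, on how many features are tested, which is one reason they can shift more.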

Why is a biomarker that was published in a particular paper not available through this resource?

The primary focus of this database is on less invasive or minimally invasive markers, primarily obtained through serum samples. Biopsy-based markers were therefore retained only if they could be linked to findings from serum samples. For example, MYH3, which is substantially increased in DMD muscle compared with healthy control muscle, is not present in these data unless it was also measured in serum samples in existing research efforts. This may change in the future if we include data from a paper that measured MYH3. Furthermore, markers measured via large-scale screens (e.g., Somalogic assays, TMT) are overrepresented in this database. We are also currently focused on compiling evidence for relatively common DMD states/scenarios; in the future, other treatments, animal models, and muscular dystrophies may be supported.

I cannot find the biomarker I am looking for. What could be going wrong?

Start by consulting the Tutorial webpage to learn more about how to search for a biomarker. Also, see the answer above.

What are the requirements for a dataset to be added?

Peer-reviewed findings are the easiest to incorporate into the application and to cite. Preprints are also acceptable, since they can be cited. Please reach out to Utkarsh Dang at utkarshdang@cunet.carleton.ca.

How often is the interactive application updated with new data?

The date of the most recent update is shown in the Overview table view of the app.

Can I download data from the website?

Not yet; this feature is planned for a future update.

How is association with DMD vs healthy controls determined?

Samples (either serum or tissue) are collected from patients with DMD and healthy controls. An association with DMD is determined if biomarker levels differ significantly between DMD samples and unaffected controls. This comparison can be nuanced, however. For example, if a dataset did not differentiate between steroid-treated and steroid-naive boys, an observed difference between DMD and controls might reflect the disease, but it could also simply reflect a treatment-responsive marker. We have compiled multiple datasets that included only steroid-naive boys; these important attributes are listed on the Data Curation webpage.

How is association with glucocorticoid treatment determined?

There are multiple experimental designs that allow this association to be determined. Three of them are: a) samples (either serum or tissue) are collected from patients with DMD before and after treatment, allowing a pre-post comparison; b) patients with DMD in a trial are assigned to a placebo group or to one or more treatment groups, and the change in biomarker levels in the treated group(s) is compared with that in the placebo group; and c) in a natural history study, the pre-post change in treated DMD patients is compared with that in untreated DMD controls. Note that studies differ in the time frame of treatment response evaluation (acute/short term vs. chronic/long term), the dosing regimen (daily, weekend, or intermittent), the age range, etc. Such important attributes are either available via a mouseover/tooltip in the app or listed on the Data Curation page with citations to the original manuscripts, which provide complete details.
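The basic arithmetic behind designs a) and b) can be sketched in a few lines of Python: a within-participant pre-post change, and the difference between the mean change in a treated group and the mean change in a placebo group. Design c) follows the same arithmetic with untreated natural-history controls in place of the placebo group. The values are made up, and this is only the simplest version of the contrast; the original studies may instead use regression or mixed models (e.g., adjusting for baseline level or age), so consult the cited manuscripts for the exact analyses.

```python
import statistics


def mean_change(pre, post):
    """Mean within-participant change (post minus pre), e.g. on the log2 scale."""
    if len(pre) != len(post):
        raise ValueError("pre and post must be paired per participant")
    return statistics.fmean(b - a for a, b in zip(pre, post))


def placebo_adjusted_change(treated_pre, treated_post, placebo_pre, placebo_post):
    """Change in the treated group minus change in the comparison (placebo) group."""
    return mean_change(treated_pre, treated_post) - mean_change(placebo_pre, placebo_post)


# Made-up log2 levels for two treated and two placebo participants.
treated_pre, treated_post = [10.0, 11.0], [11.5, 12.0]
placebo_pre, placebo_post = [10.0, 10.5], [10.25, 10.75]

print(mean_change(treated_pre, treated_post))  # pre-post change only
print(placebo_adjusted_change(treated_pre, treated_post, placebo_pre, placebo_post))
```

This prints 1.25 and then 1.0: part of the raw pre-post increase (0.25 on the log2 scale) is also seen without treatment, which is why a placebo-controlled or untreated-control contrast can give a smaller estimate than a pre-post comparison alone.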

How is association with age determined?

This is evaluated primarily via longitudinal models that assess whether a biomarker shows a systematic change (increase or decrease) over time. Make sure to also check whether the biomarker is treatment-responsive; if it is, its change over time in DMD may reflect treatment exposure rather than age alone.

Why use adjusted p-values? Why not use raw p-values given that multiple papers are being compared?

For the most part, the manuscripts associated with these datasets used adjusted p-values to control Type 1 errors. We recommend considering the consistency of direction, the fold change, and both raw and adjusted p-values together (raw p-values are provided when available).
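To show what an adjusted p-value is, here is a minimal Python sketch of the Benjamini-Hochberg (false discovery rate) adjustment. This FAQ does not state which correction each original manuscript used, so Benjamini-Hochberg is only an illustrative, commonly used choice; it should give the same values as R’s p.adjust(p, method = "BH"). The raw p-values are made up.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]
print([round(p, 4) for p in benjamini_hochberg(raw)])
```

This prints [0.04, 0.0533, 0.0533, 0.2]. Every adjusted value is at least as large as its raw counterpart, and raw p-values of 0.03 and 0.04 both end up above 0.05 with only four tests; with thousands of aptamers or probes the inflation is much larger, which is why consistent fold changes across studies remain informative even when adjusted p-values are not significant.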

Are there p-values for the correlations with clinical outcomes?

P-values are not currently provided for the correlations with clinical outcomes because of multiple testing concerns (many outcomes and many biomarkers).

What are the different technologies used to quantify biomarkers that are currently included on the application?

Somalogic Somascan
Tandem mass tag mass spectrometry (TMT)
Affymetrix human genome arrays
RT-PCR
We plan to add ELISA, MRM MS, Luminex, etc. in the future.

Which treatments are included in the treatment response subpanel?

Currently, only glucocorticoid treatments (e.g., prednisone, deflazacort) are included. The measurements underlying the contrasts/analyses may come from patients on different dose regimens, with different lifetime exposures, etc. Some important attributes are listed on the Data Curation page with citations to the original manuscripts, which provide complete details. More treatments will be added in the future.