Risk-conscious correction of batch effects: maximising information extraction from high-throughput genomic datasets

Background
Batch effects are a persistent and pervasive form of measurement noise that undermines the scientific utility of high-throughput genomic datasets. At their most benign, they reduce the power of statistical tests, so that real effects go undetected. At their worst, they act as confounds and render datasets useless. Any attempt to remove batch effects will also remove some of the biologically meaningful component of the measurement (i.e. signal). We present and benchmark a novel technique called Harman, which maximises the removal of batch noise under the constraint that the risk of also losing the biologically meaningful component of the measurement is kept to a user-set fraction.

Results
Analyses of three independent, publicly available datasets show that Harman simultaneously removes more batch noise and preserves more signal than the current leading technique. The results also show that Harman can identify and remove batch effects regardless of their size relative to other sources of variation in the dataset. Of particular advantage for meta-analyses and data integration is Harman’s superior consistency in achieving comparable trade-offs between noise suppression and signal preservation across multiple datasets with differing numbers of treatments, replicates and processing batches.

Conclusion
Harman’s ability to remove more batch noise while preserving more biologically meaningful signal within a single study, and to maintain the user-set trade-off between batch noise rejection and signal preservation across different studies, makes it an effective alternative method for dealing with batch effects in high-throughput genomic datasets. Harman is flexible in terms of the data types it can process.
It is publicly available as an R package (https://bioconductor.org/packages/release/bioc/html/Harman.html) and as a compiled Matlab package (http://www.bioinformatics.csiro.au/harman/) that does not require a Matlab license to run.

Electronic supplementary material
The online version of this article (doi:10.1186/s12859-016-1212-5) contains supplementary material, which is available to authorized users.
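The abstract's central idea, removing as much batch noise as possible while capping the risk of also removing signal at a user-set level, can be illustrated on a single variable. The sketch below is a hypothetical toy, not Harman's published algorithm: the function names `batch_f_stat` and `risk_limited_correction`, the use of a one-way ANOVA F statistic, and the permutation null are all assumptions made for illustration. It shrinks each batch's offset from the grand mean by a factor `k` and picks the largest `k` whose corrected batch statistic still sits at or above the `limit` quantile of a permutation null, so a higher `limit` stops earlier (more conservative) and a lower one corrects more aggressively.

```python
import numpy as np


def batch_f_stat(x: np.ndarray, batch: np.ndarray) -> float:
    """One-way ANOVA F statistic: how strongly batch membership explains x."""
    labels = np.unique(batch)
    grand = x.mean()
    ss_between = sum(
        (batch == b).sum() * (x[batch == b].mean() - grand) ** 2 for b in labels
    )
    ss_within = sum(((x[batch == b] - x[batch == b].mean()) ** 2).sum() for b in labels)
    df_between = len(labels) - 1
    df_within = len(x) - len(labels)
    return float((ss_between / df_between) / (ss_within / df_within))


def risk_limited_correction(x, batch, limit=0.95, n_perm=2000, seed=0):
    """Hypothetical toy: remove a fraction k in [0, 1] of every batch offset,
    choosing the largest k whose corrected F statistic is still at or above
    the `limit` quantile of a permutation null. Returns (corrected, k)."""
    rng = np.random.default_rng(seed)
    f_obs = batch_f_stat(x, batch)
    # Null distribution: F statistics after randomly reassigning batch labels.
    null = np.array([batch_f_stat(x, rng.permutation(batch)) for _ in range(n_perm)])
    threshold = np.quantile(null, limit)
    # Subtracting k * offset leaves within-batch variation untouched and
    # scales the between-batch sum of squares by (1 - k)**2, so
    # F(k) = (1 - k)**2 * f_obs and the cut-off has a closed form.
    if f_obs <= threshold:
        k = 0.0  # batch effect not distinguishable from chance: leave data alone
    else:
        k = float(1.0 - np.sqrt(threshold / f_obs))
    grand = x.mean()
    means = {b: x[batch == b].mean() for b in np.unique(batch)}
    offsets = np.array([means[b] - grand for b in batch])
    return x - k * offsets, k
```

With a strong simulated batch shift, `k` lands strictly between 0 and 1: most of the offset is removed, but the correction stops short of forcing the batch means together, which is the toy analogue of refusing to trade away signal beyond the chosen risk level.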

Identity

Description: The Identity category includes attributes that support the identification of the resource.

Field Value
PID https://www.doi.org/10.1186/s12859-016-1212-5
PID pmid:27585881
PID pmc:PMC5009651
URL https://core.ac.uk/display/81279172
URL https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-016-1212-5
URL https://link.springer.com/article/10.1186/s12859-016-1212-5
URL https://doi.org/10.1186/s12859-016-1212-5
URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009651
URL https://dx.doi.org/10.1186/s12859-016-1212-5
URL https://academic.microsoft.com/#/detail/2515569507
URL https://dblp.uni-trier.de/db/journals/bmcbi/bmcbi17.html#OytamSDBOR16
URL http://dx.doi.org/10.1186/s12859-016-1212-5
URL https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1212-5
URL http://link.springer.com/content/pdf/10.1186/s12859-016-1212-5
URL http://europepmc.org/articles/PMC5009651
URL https://paperity.org/p/77813156/risk-conscious-correction-of-batch-effects-maximising-information-extraction-from-high
Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Field Value
Access Right Open Access
Attribution

Description: Authorships and contributors

Field Value
Author Konsta Duesing, 0000-0003-3103-0600
Author Jason Ross, 0000-0003-3268-899X
Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Field Value
Collected From Europe PubMed Central; PubMed Central; ORCID; UnpayWall; Datacite; Crossref; Microsoft Academic Graph; CORE (RIOXX-UK Aggregator)
Hosted By Europe PubMed Central; SpringerOpen; BMC Bioinformatics
Journal BMC Bioinformatics, 17
Publication Date 2016-09-01
Publisher Springer Science and Business Media LLC
Additional Info
Field Value
Language Undetermined
Resource Type Other literature type; Article; UNKNOWN
system:type publication
Management Info
Field Value
Source https://science-innovation-policy.openaire.eu/search/publication?articleId=dedup_wf_001::a87bce503555b1fa4cfda03e9ddba8c9
Author jsonws_user
Last Updated 26 December 2020, 15:07 (CET)
Created 26 December 2020, 15:07 (CET)