Combining location-and-scale batch effect adjustment with data cleaning by latent factor adjustment

Background: In the context of high-throughput molecular data analysis, the observations included in a dataset commonly form distinct groups; for example, they may have been measured at different times, under different conditions, or even in different labs. These groups are generally denoted as batches. Systematic differences between these batches that are not attributable to the biological signal of interest are denoted as batch effects. If ignored when conducting analyses on the combined data, batch effects can lead to distortions in the results. In this paper we present FAbatch, a general, model-based method for correcting for such batch effects in the case of an analysis involving a binary target variable. It is a combination of two commonly used approaches: location-and-scale adjustment and data cleaning by adjustment for distortions due to latent factors. We compare FAbatch extensively to the most commonly applied competitors on the basis of several performance metrics. FAbatch can also be used in the context of prediction modelling to eliminate batch effects from new test data. This important application is illustrated using real and simulated data. We implemented FAbatch and various other functionalities in the R package bapred, available online from CRAN.
Results: FAbatch is seen to be competitive in many cases and above average in others. In our analyses, the only cases where it failed to adequately preserve the biological signal were when there were extremely outlying batches and when the batch effects were very weak compared to the biological signal.
Conclusions: As seen in this paper, batch effect structures found in real datasets are diverse. Current batch effect adjustment methods are often either too simplistic or make restrictive assumptions, which can be violated in real datasets. Due to the generality of its underlying model and its ability to perform well, FAbatch represents a reliable tool for batch effect adjustment for most situations found in practice.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0870-z) contains supplementary material, which is available to authorized users.
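
The location-and-scale component named in the abstract can be illustrated with a minimal sketch. This is not FAbatch and not the bapred API: it only centers and scales each variable within each batch, and it does not show the latent factor step or the handling of the binary target variable. The function name location_scale_adjust, the toy data, and the choice of Python with NumPy are assumptions made purely for illustration, and each batch is assumed to contain at least two observations.

```python
import numpy as np


def location_scale_adjust(x, batch):
    """Center and scale every variable within each batch (illustrative only).

    x     : array of shape (n_observations, n_variables)
    batch : array of length n_observations with one batch label per row
    """
    x = np.asarray(x, dtype=float)
    batch = np.asarray(batch)
    adjusted = np.empty_like(x)
    for label in np.unique(batch):
        rows = batch == label
        mean = x[rows].mean(axis=0)
        sd = x[rows].std(axis=0, ddof=1)
        # Variables that are constant within a batch are centered but not scaled.
        sd = np.where(sd > 0, sd, 1.0)
        adjusted[rows] = (x[rows] - mean) / sd
    return adjusted


# Toy data (hypothetical): two batches with different location and scale.
rng = np.random.default_rng(0)
x = np.vstack([
    rng.normal(0.0, 1.0, size=(5, 3)),
    rng.normal(4.0, 2.0, size=(6, 3)),
])
batch = np.array([0] * 5 + [1] * 6)
adjusted = location_scale_adjust(x, batch)
```

After this adjustment every variable has mean 0 and unit sample variance within each batch, so the batches become comparable in location and scale. Because the sketch ignores the target variable, it may also remove biological signal whenever class frequencies differ between batches.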

Identity

Description: The Identity category includes attributes that support the identification of the resource.

Field Value
PID https://www.doi.org/10.1186/s12859-015-0870-z
PID https://www.doi.org/10.5282/ubm/epub.25331
PID pmc:PMC4710051
PID pmid:26753519
URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4710051
URL https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0870-z
URL https://hal.archives-ouvertes.fr/hal-01286550
URL http://nbn-resolving.de/urn:nbn:de:bvb:19-epub-33092-5
URL https://epub.ub.uni-muenchen.de/25331/
URL https://epub.ub.uni-muenchen.de/33092/1/Hornung_Boulesteix_Causeur_Combining_location-and-scale_batch_effect_adjustment.pdf
URL https://epub.ub.uni-muenchen.de/25331/1/TR.pdf
URL http://dx.doi.org/10.5282/ubm/epub.25331
URL http://europepmc.org/articles/PMC4710051
URL https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-015-0870-z
URL https://dblp.uni-trier.de/db/journals/bmcbi/bmcbi17.html#HornungBC16
URL https://www.mendeley.com/catalogue/d249ab56-d38c-3700-a8fc-8efa2e0ed924/
URL https://link.springer.com/article/10.1186/s12859-015-0870-z
URL https://dx.doi.org/10.1186/s12859-015-0870-z
URL https://epub.ub.uni-muenchen.de/33092/
URL https://doi.org/10.1186/s12859-015-0870-z
URL http://link.springer.com/content/pdf/10.1186/s12859-015-0870-z
URL http://nbn-resolving.de/urn:nbn:de:bvb:19-epub-25331-3
URL http://dx.doi.org/10.1186/s12859-015-0870-z
URL https://academic.microsoft.com/#/detail/1772250503
URL https://epub.ub.uni-muenchen.de/25331/index.html
Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Field Value
Access Right Open Access
Attribution

Description: Authorships and contributors

Field Value
Author Anne-Laure Boulesteix, 0000-0002-2729-0947
Contributor University of Leipzig ; Department of Medica
Contributor AGROCAMPUS OUEST
Contributor Laboratoire de Mathématiques Appliquées Agrocampus (LMA2) ; AGROCAMPUS OUEST
Contributor Institut de Recherche Mathématique de Rennes (IRMAR) ; Université de Rennes 1 (UR1) ; Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-AGROCAMPUS OUEST-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National des Sciences Appliquées (INSA)-Université de Rennes 2 (UR2) ; Université de Rennes (UNIV-RENNES)-Centre National de la Recherche Scientifique (CNRS)
Contributor Universität Leipzig [Leipzig]
Contributor AGROCAMPUS OUEST ; Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)
Contributor Laboratoire de Mathématiques Appliquées Agrocampus (LMA2) ; AGROCAMPUS OUEST ; Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)
Contributor Institut de Recherche Mathématique de Rennes (IRMAR) ; Institut National des Sciences Appliquées - Rennes (INSA Rennes) ; Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Rennes (ENS Rennes)-Université de Rennes 2 (UR2) ; Université de Rennes (UNIV-RENNES)-Université de Rennes 1 (UR1) ; Université de Rennes (UNIV-RENNES)-AGROCAMPUS OUEST ; Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)
Contributor Institut de Recherche Mathématique de Rennes (IRMAR) ; Institut National des Sciences Appliquées - Rennes (INSA Rennes) ; Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Rennes (ENS Rennes)-Université de Rennes 2 (UR2) ; Université de Rennes (UNIV-RENNES)-Université de Rennes 1 (UR1) ; Université de Rennes (UNIV-RENNES)-AGROCAMPUS OUEST
Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Field Value
Collected From HAL-Inserm; ORCID; Datacite; INRIA a CCSD electronic archive server; HAL Descartes; HAL-Rennes 1; Mémoires en Sciences de l'Information et de la Communication; Open Access LMU; Microsoft Academic Graph; CORE (RIOXX-UK Aggregator); Europe PubMed Central; PubMed Central; UnpayWall; HAL-Pasteur; Hyper Article en Ligne; Crossref
Hosted By Europe PubMed Central; SpringerOpen; HAL-Inserm; INRIA a CCSD electronic archive server; HAL Descartes; HAL-Pasteur; HAL-Rennes 1; Hyper Article en Ligne; BMC Bioinformatics; Mémoires en Sciences de l'Information et de la Communication; Open Access LMU
Journal BMC Bioinformatics, 17, 1
Publication Date 2016-12-01
Publisher Springer Nature
Additional Info
Field Value
Country France; Germany
Format application/pdf
Language English
Resource Type Other literature type; Article; UNKNOWN; Research
keyword ddc.ddc:610
keyword ddc.ddc:500
keyword Batch effects, High-dimensional data, Data preparation, Prediction, Latent factors
keyword Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie
system:type publication
Management Info
Field Value
Source https://science-innovation-policy.openaire.eu/search/publication?articleId=dedup_wf_001::96c5db4cd8ac6fe5fe3f426e8d719115
Author jsonws_user
Last Updated 25 December 2020, 13:09 (CET)
Created 25 December 2020, 13:09 (CET)