Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework

Background
In omics data integration studies, it is common, for a variety of reasons, for some individuals to be absent from one or more data tables. Such missing rows are challenging because most statistical methods cannot be applied directly to incomplete datasets. To overcome this issue, we propose a multiple imputation (MI) approach in a multivariate framework. In this study, we focus on multiple factor analysis (MFA) as a tool to compare and integrate multiple layers of information. MI fills the missing rows with plausible values, yielding M completed datasets. MFA is then applied to each completed dataset to produce M different configurations (the matrices of coordinates of the individuals). Finally, the M configurations are combined to yield a single consensus solution.

Results
We assessed the performance of our method, named MI-MFA, on two real omics datasets, from which incomplete artificial datasets with different patterns of missingness were created. MI-MFA was compared with two other approaches, namely regularized iterative MFA (RI-MFA) and mean variable imputation (MVI-MFA). Each configuration produced by these three strategies was assessed against the true MFA configuration obtained from the original data, and a comprehensive graphical comparison was produced to show how the MI-, RI- and MVI-MFA configurations diverge from the true configuration. Two approaches to visualize and assess the uncertainty due to missing values, namely confidence ellipses and convex hulls, were also described. We showed that the areas of the ellipses and convex hulls increased with the number of missing individuals. Free, easy-to-use code implementing the MI-MFA method in the R statistical environment is provided.

Conclusions
We believe that MI-MFA provides a useful and attractive method for estimating the coordinates of individuals on the first MFA components despite missing rows. MI-MFA configurations were close to the true configuration even when many individuals were missing from several data tables. The method accounts for the uncertainty in the MI-MFA configurations induced by the missing rows, thereby allowing the reliability of the results to be evaluated.

Electronic supplementary material
The online version of this article (doi:10.1186/s12859-016-1273-5) contains supplementary material, which is available to authorized users.
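
The three MI-MFA steps summarized in the abstract (fill the missing rows M times, run MFA on each completed dataset, combine the M configurations) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' R code. Missing rows are filled by simple random hot-deck imputation, where each missing row is copied from a randomly drawn individual observed in the same table. MFA is reduced to its core: each column-centred table is divided by its first singular value, and an SVD is taken of the concatenation. The M configurations are combined by orthogonal Procrustes alignment to the first configuration, followed by averaging. The abstract does not state the donor-selection or combination rules used in the paper, and the function names `hot_deck_fill`, `mfa_coordinates`, `procrustes_align` and `mi_mfa` are hypothetical.

```python
import numpy as np


def hot_deck_fill(X, missing, rng):
    """Random hot-deck (assumption): copy each missing row from a randomly
    drawn individual that is observed in the same table."""
    Xf = X.copy()
    donors = np.flatnonzero(~missing)
    rows = np.flatnonzero(missing)
    Xf[rows] = X[rng.choice(donors, size=rows.size, replace=True)]
    return Xf


def mfa_coordinates(tables, ncomp=2):
    """Core of MFA: centre each table, divide it by its first singular value
    so that no single table dominates, then take the SVD of the concatenation."""
    weighted = []
    for X in tables:
        Xc = X - X.mean(axis=0)
        weighted.append(Xc / np.linalg.svd(Xc, compute_uv=False)[0])
    U, s, _ = np.linalg.svd(np.hstack(weighted), full_matrices=False)
    return U[:, :ncomp] * s[:ncomp]  # coordinates of individuals


def procrustes_align(Y, ref):
    """Orthogonal Procrustes: rotate/reflect Y to best match ref (least squares)."""
    U, _, Vt = np.linalg.svd(Y.T @ ref)
    return Y @ (U @ Vt)


def mi_mfa(tables, masks, M=20, ncomp=2, seed=0):
    """Return the consensus configuration and the M aligned configurations."""
    rng = np.random.default_rng(seed)
    configs = []
    for _ in range(M):
        completed = [hot_deck_fill(X, m, rng) for X, m in zip(tables, masks)]
        configs.append(mfa_coordinates(completed, ncomp))
    # SVD signs/rotations are arbitrary, so align before averaging.
    aligned = np.stack([procrustes_align(F, configs[0]) for F in configs])
    return aligned.mean(axis=0), aligned


# Toy example: 30 individuals, two tables, 6 individuals missing from table 2.
rng = np.random.default_rng(1)
n = 30
X1 = rng.normal(size=(n, 5))
X2 = rng.normal(size=(n, 8))
miss2 = np.zeros(n, dtype=bool)
miss2[:6] = True
X2_obs = X2.copy()
X2_obs[miss2] = np.nan  # rows absent from table 2
consensus, aligned = mi_mfa([X1, X2_obs], [np.zeros(n, dtype=bool), miss2], M=10)
print(consensus.shape, aligned.shape)  # (30, 2) (10, 30, 2)
```

Comparing `consensus` with the configuration of the complete tables (`mfa_coordinates([X1, X2])`, after Procrustes alignment) mirrors the comparison with the true configuration described above. The spread of `aligned[:, i, :]` around `consensus[i]` reflects the imputation uncertainty for individual `i`. Aligning to a single reference is the simplest choice; generalized Procrustes analysis (iterating alignment to the running mean) would avoid privileging the first imputation.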

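To quantify the uncertainty for one individual, its M positions on the first two MFA components (one per imputed dataset, expressed in a common orientation) can be summarized by the area of a confidence ellipse or of their convex hull. The sketch below assumes a normal-theory 95% ellipse built from the sample covariance of the M positions, and computes the convex hull with Andrew's monotone-chain algorithm. These are illustrative choices, not necessarily the constructions used in the paper, and the function names `ellipse_area` and `convex_hull_area` are hypothetical.

```python
import numpy as np


def ellipse_area(points, level=0.95):
    """Area of the normal-theory confidence ellipse for 2-D points:
    {x : (x - mean)' S^-1 (x - mean) <= q}, with q the chi-square(2) quantile."""
    S = np.cov(np.asarray(points, dtype=float), rowvar=False)
    q = -2.0 * np.log(1.0 - level)  # exact chi-square quantile for 2 df
    return np.pi * q * np.sqrt(np.linalg.det(S))


def convex_hull_area(points):
    """Area of the 2-D convex hull (monotone chain + shoelace formula)."""
    pts = sorted(set(map(tuple, np.asarray(points, dtype=float))))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = np.array(lower[:-1] + upper[:-1])  # vertices in counter-clockwise order
    x, y = hull[:, 0], hull[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


# Simulated positions of one individual over M = 20 imputations: a tight
# cloud (few missing rows) versus a wider one (many missing rows).
rng = np.random.default_rng(0)
tight = rng.normal(scale=0.1, size=(20, 2))
wide = rng.normal(scale=0.5, size=(20, 2))
for name, cloud in [("tight", tight), ("wide", wide)]:
    print(name, round(ellipse_area(cloud), 3), round(convex_hull_area(cloud), 3))
```

Computed per individual, larger areas flag individuals whose coordinates are less reliable. The ellipse assumes roughly elliptical (normal) scatter of the imputed positions, whereas the hull makes no distributional assumption but is sensitive to a single outlying imputation.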

Identity

Description: The Identity category includes attributes that support the identification of the resource.

Field Value
PID https://www.doi.org/10.1186/s12859-016-1273-5
PID pmc:PMC5048483
PID pmid:27716030
URL http://dx.doi.org/10.1186/s12859-016-1273-5
URL https://link.springer.com/article/10.1186/s12859-016-1273-5
URL https://academic.microsoft.com/#/detail/2529746041
URL https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-016-1273-5
URL https://hal.inrae.fr/hal-02636549
URL https://dx.doi.org/10.1186/s12859-016-1273-5
URL https://doi.org/10.1186/s12859-016-1273-5
URL https://hal.inrae.fr/hal-02636549/document
URL https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1273-5
URL http://europepmc.org/articles/PMC5048483
URL https://www.ncbi.nlm.nih.gov/pubmed/27716030
URL https://core.ac.uk/display/81889253
URL http://prodinra.inra.fr/record/377761
URL https://hal.archives-ouvertes.fr/hal-01957574
URL https://paperity.org/p/77860301/handling-missing-rows-in-multi-omics-data-integration-multiple-imputation-in-multiple
URL http://europepmc.org/abstract/MED/27716030
URL https://hal.inrae.fr/hal-02636549/file/2016_Voillet_pdf_1
URL https://dblp.uni-trier.de/db/journals/bmcbi/bmcbi17.html#VoilletBLCG16
URL http://link.springer.com/content/pdf/10.1186/s12859-016-1273-5.pdf

Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Field Value
Access Right Open Access

Attribution

Description: Authorships and contributors

Field Value
Author Valentin Voillet
Author Philippe Besse
Author Laurence Liaubet, 0000-0003-0201-0264
Author Magali San Cristobal
Author Ignacio González, 0000-0003-3432-6909
Contributor Génétique Physiologie et Systèmes d'Elevage (GenPhySE ) ; Institut National de la Recherche Agronomique (INRA)-Ecole Nationale Vétérinaire de Toulouse (ENVT) ; Institut National Polytechnique (Toulouse) (Toulouse INP) ; Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Institut National Polytechnique (Toulouse) (Toulouse INP) ; Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-École nationale supérieure agronomique de Toulouse [ENSAT]
Contributor Institut de Mathématiques de Toulouse UMR5219 (IMT) ; Institut National des Sciences Appliquées - Toulouse (INSA Toulouse) ; Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université Toulouse 1 Capitole (UT1)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3) ; Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)
Contributor UE 1372 Génétique, Expérimentation et Système Innovants ; Institut National de la Recherche Agronomique (INRA)-Génétique animale (G.A.)-Physiologie Animale et Systèmes d'Elevage (PHASE) ; Institut National de la Recherche Agronomique (INRA)-Génétique, Expérimentation et Système Innovants (GenESI)
Contributor Génétique Physiologie et Systèmes d'Elevage (GenPhySE ) ; École nationale supérieure agronomique de Toulouse [ENSAT]-Institut National de la Recherche Agronomique (INRA)-Ecole Nationale Vétérinaire de Toulouse (ENVT) ; Institut National Polytechnique (Toulouse) (Toulouse INP) ; Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Institut National Polytechnique (Toulouse) (Toulouse INP) ; Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées
Contributor Institut National Polytechnique (Toulouse) (Toulouse INP) ; Université Fédérale Toulouse Midi-Pyrénées
Contributor Unité de Mathématiques et Informatique Appliquées de Toulouse (MIAT INRA) ; Institut National de la Recherche Agronomique (INRA)
Contributor Génétique, Expérimentation et Système Innovants (GenESI) ; Institut National de la Recherche Agronomique (INRA)

Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Field Value
Collected From HAL-Inserm; ORCID; Datacite; INRIA a CCSD electronic archive server; Hal-Diderot; HAL - UPEC / UPEM; Mémoires en Sciences de l'Information et de la Communication; Microsoft Academic Graph; CORE (RIOXX-UK Aggregator); Europe PubMed Central; PubMed Central; UnpayWall; HAL-INSA Toulouse; HAL-Pasteur; ProdInra; Crossref; Hyper Article en Ligne
Hosted By HAL-Inserm; INRIA a CCSD electronic archive server; Hal-Diderot; HAL - UPEC / UPEM; BMC Bioinformatics; Mémoires en Sciences de l'Information et de la Communication; Europe PubMed Central; SpringerOpen; HAL-INSA Toulouse; HAL-Pasteur; ProdInra; Hyper Article en Ligne
Publication Date 2016-12-01
Publisher HAL CCSD

Additional Info
Field Value
Country France
Format application/pdf
Language English
Resource Type Article; UNKNOWN
keyword multiple omics data integration; multivariate factor analysis; missing individual; multiple imputation; hot-deck imputation; canonical correlation-analysis; fully conditional specification; discrete; package

Management Info
Field Value
Source https://science-innovation-policy.openaire.eu/search/publication?articleId=dedup_wf_001::b84e8c82bec5ad85752892e63f3257c3
Author jsonws_user
Last Updated 23 December 2020, 00:57 (CET)
Created 23 December 2020, 00:57 (CET)