"A scalability study of phylogenetic network inference methods using empirical datasets and simulations involving a single reticulation"

Background Branching events in phylogenetic trees reflect bifurcating and/or multifurcating speciation and splitting events. In the presence of gene flow, a phylogeny cannot be described by a tree but is instead a directed acyclic graph known as a phylogenetic network. Both phylogenetic trees and networks are typically reconstructed using computational analysis of multi-locus sequence data. The advent of high-throughput sequencing technologies has brought about two main scalability challenges: (1) dataset size in terms of the number of taxa and (2) the evolutionary divergence of the taxa in a study. The impact of both dimensions of scale on phylogenetic tree inference has been well characterized by recent studies; in contrast, the scalability limits of phylogenetic network inference methods are largely unknown.

Results In this study, we quantify the performance of state-of-the-art phylogenetic network inference methods on large-scale datasets using empirical data sampled from natural mouse populations and a range of simulations using model phylogenies with a single reticulation. We find that, as in the case of phylogenetic tree inference, the performance of leading network inference methods is negatively impacted by both dimensions of dataset scale. In general, we found that topological accuracy degrades as the number of taxa increases; a similar effect was observed with increased sequence mutation rate. The most accurate methods were probabilistic inference methods which maximize either likelihood under coalescent-based models or pseudo-likelihood approximations to the model likelihood. The improved accuracy obtained with probabilistic inference methods comes at a computational cost in terms of runtime and main memory usage, which become prohibitive as dataset size grows past 25 taxa. None of the probabilistic methods completed analyses of datasets with 30 taxa or more after many weeks of CPU runtime.

Conclusions We conclude that the state of the art of phylogenetic network inference lags well behind the scope of current phylogenomic studies. New algorithmic development is critically needed to address this methodological gap.

Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1277-1) contains supplementary material, which is available to authorized users.

Identity

Description: The Identity category includes attributes that support the identification of the resource.

Field Value
PID https://www.doi.org/10.1186/s12859-016-1277-1
PID pmid:27737628
PID pmc:PMC5064893
URL http://dx.doi.org/10.1186/s12859-016-1277-1
URL https://link.springer.com/article/10.1186/s12859-016-1277-1
URL https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-016-1277-1
URL https://link.springer.com/content/pdf/10.1186%2Fs12859-016-1277-1.pdf
URL http://link.springer.com/article/10.1186/s12859-016-1277-1/fulltext.html
URL https://dx.doi.org/10.1186/s12859-016-1277-1
URL https://dblp.uni-trier.de/db/journals/bmcbi/bmcbi17.html#HejaseL16
URL https://core.ac.uk/display/81632183
URL https://doi.org/10.1186/s12859-016-1277-1
URL https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1277-1
URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5064893
URL https://academic.microsoft.com/#/detail/2531578752
URL https://paperity.org/p/78316854/a-scalability-study-of-phylogenetic-network-inference-methods-using-empirical-datasets
URL http://link.springer.com/content/pdf/10.1186/s12859-016-1277-1.pdf
URL http://europepmc.org/articles/PMC5064893
Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Field Value
Access Right Open Access
Attribution

Description: Authorships and contributors

Field Value
Author Hejase, Hussein A.
Author Liu, Kevin J.
Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Field Value
Collected From Europe PubMed Central; PubMed Central; Datacite; UnpayWall; Crossref; Microsoft Academic Graph; CORE (RIOXX-UK Aggregator)
Hosted By Europe PubMed Central; SpringerOpen; BMC Bioinformatics
Publication Date 2016-10-13
Publisher Springer Nature
Additional Info
Field Value
Language Undetermined
Resource Type Other literature type; Article; UNKNOWN
system:type publication
Management Info
Field Value
Source https://science-innovation-policy.openaire.eu/search/publication?articleId=dedup_wf_001::243ea0158cae4d8b253e6a4e8847b3fa
Author jsonws_user
Last Updated 26 December 2020, 05:37 (CET)
Created 26 December 2020, 05:37 (CET)