Assessment of k-mer spectrum applicability for metagenomic dissimilarity analysis

Background: A rapidly increasing flow of genomic data requires the development of efficient methods for obtaining its compact representation. Feature extraction facilitates classification, clustering and model analysis for testing and refining biological hypotheses. A “shotgun” metagenome is an analytically challenging type of genomic data, containing sequences of all genes from the totality of a complex microbial community. Recently, researchers have started to analyze metagenomes using reference-free methods based on the oligonucleotide (k-mer) frequency spectrum, an approach previously applied to isolated genomes. However, little is known about how these methods correlate with existing approaches for metagenomic feature extraction, or about the limits of their applicability. Here we evaluated a metagenomic pairwise dissimilarity measure based on the short k-mer spectrum, using the example of human gut microbiota, a biomedically significant object of study.

Results: We developed a method for calculating pairwise dissimilarity (beta-diversity) of “shotgun” metagenomes based on short k-mer spectra (5≤k≤11). The method was validated on simulated metagenomes and further applied to a large collection of human gut metagenomes from the populations of the world (n=281). The k-mer spectrum-based measure was found to behave similarly to one based on mapping to a reference gene catalog, but differently from one using a genome catalog. This difference turned out to be associated with a significant presence of viral reads in a number of metagenomes. Simulations showed a limited impact of bacterial genetic variability and of sequencing errors on k-mer spectra. Specific differences between the datasets from individual populations were identified.

Conclusions: Our approach allows rapid estimation of pairwise dissimilarity between metagenomes. Though we applied this technique to gut microbiota, it should be useful for arbitrary metagenomes, even metagenomes with novel microbiota. A dissimilarity measure based on the k-mer spectrum provides a wider perspective than measures based on alignment against reference sequence sets. At the early stages of bioinformatic analysis, it helps avoid missing unusual features of metagenomic composition, particularly those related to the presence of unknown bacteria, viruses or eukaryotes, as well as technical artifacts (sample contamination, reads of non-biological origin, etc.). Our method is complementary to reference-based approaches and can be easily integrated into metagenomic analysis pipelines.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0875-7) contains supplementary material, which is available to authorized users.
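The idea of a k-mer spectrum-based pairwise dissimilarity can be illustrated with a small sketch. This is not the authors' implementation: the function names (`canonical`, `kmer_spectrum`, `bray_curtis`) are hypothetical, and two choices are assumptions that the abstract does not state, namely merging each k-mer with its reverse complement and using the Bray-Curtis formula as the dissimilarity. Only the range 5≤k≤11 comes from the abstract.

```python
from collections import Counter

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID_BASES = frozenset("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement.

    Assumption: both strands are merged, since shotgun reads come from either strand.
    """
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_spectrum(reads, k):
    """Return normalized canonical k-mer frequencies for an iterable of reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if VALID_BASES.issuperset(kmer):
                counts[canonical(kmer)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two frequency dictionaries (0 = identical)."""
    keys = set(p) | set(q)
    numerator = sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)
    denominator = sum(p.get(x, 0.0) + q.get(x, 0.0) for x in keys)
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Toy reads standing in for two metagenomes (made-up sequences).
    sample_a = ["ACGTACGTACGGATCCA", "TTGACCGTAGGCTAACG"]
    sample_b = ["ACGTACGTACGGATCCA", "GGGCCCAAATTTGGGCC"]
    k = 5  # lower end of the 5..11 range mentioned in the abstract
    d = bray_curtis(kmer_spectrum(sample_a, k), kmer_spectrum(sample_b, k))
    print(f"dissimilarity at k={k}: {d:.3f}")
```

Because each spectrum is normalized to sum to 1, the value lies between 0 (identical spectra) and 1 (no shared k-mers), and it does not depend on sequencing depth. For real data the reads would be streamed from FASTQ files rather than held in lists.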

Identity

Description: The Identity category includes attributes that support the identification of the resource.

Field Value
PID https://www.doi.org/10.1186/s12859-015-0875-7
PID pmc:PMC4715287
PID pmid:26774270
URL https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0875-7
URL http://europepmc.org/articles/PMC4715287
URL https://academic.microsoft.com/#/detail/2254734290
URL https://dblp.uni-trier.de/db/journals/bmcbi/bmcbi17.html#DubinkinaIUTA16
URL https://0-bmcbioinformatics-biomedcentral-com.brum.beds.ac.uk/articles/10.1186/s12859-015-0875-7
URL https://link.springer.com/article/10.1186/s12859-015-0875-7
URL https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-015-0875-7
URL http://dx.doi.org/10.1186/s12859-015-0875-7
URL https://paperity.org/p/75170720/assessment-of-k-mer-spectrum-applicability-for-metagenomic-dissimilarity-analysis
URL https://dx.doi.org/10.1186/s12859-015-0875-7
URL http://link.springer.com/content/pdf/10.1186/s12859-015-0875-7
URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4715287/
URL https://core.ac.uk/display/81203597
Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Field Value
Access Right Open Access
Attribution

Description: Authorships and contributors

Field Value
Author Vladimir Ulyantsev, 0000-0003-0802-830X
Author Alexander Tyakht, 0000-0002-7358-2537
Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Field Value
Collected From Europe PubMed Central; PubMed Central; ORCID; Datacite; UnpayWall; Crossref; Microsoft Academic Graph; CORE (RIOXX-UK Aggregator)
Hosted By Europe PubMed Central; SpringerOpen; BMC Bioinformatics
Journal BMC Bioinformatics, 17,
Publication Date 2016-01-16
Publisher BioMed Central
Additional Info
Field Value
Language English
Resource Type Article; UNKNOWN
system:type publication
Management Info
Field Value
Source https://science-innovation-policy.openaire.eu/search/publication?articleId=dedup_wf_001::3ab039e2a5a1eaff34b7c8eeab64a6d1
Author jsonws_user
Last Updated 26 December 2020, 18:57 (CET)
Created 26 December 2020, 18:57 (CET)