"A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies"

Background: Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are a number of excellent tools for performing this task, most scale poorly when faced with hundreds of genome sequences, and many require extensive manual curation.

Results: We have developed a de-novo genome analysis pipeline (DeNoGAP) for the automated, iterative and high-throughput analysis of data from comparative genomics projects involving hundreds of whole genome sequences. The pipeline is designed to perform reference-assisted and de novo gene prediction, homolog protein family assignment, ortholog prediction, functional annotation, and pan-genome analysis using a range of proven tools and databases. While most existing methods scale quadratically with the number of genomes because they rely on pairwise comparisons among predicted protein sequences, DeNoGAP scales linearly because its homology assignment is based on iteratively refined hidden Markov models. This iterative clustering strategy enables DeNoGAP to handle a very large number of genomes using minimal computational resources. Moreover, the modular structure of the pipeline permits easy updates as new analysis programs become available.

Conclusion: DeNoGAP integrates bioinformatics tools and databases for comparative analysis of a large number of genomes. The pipeline offers tools and algorithms for annotation and analysis of completed and draft genome sequences. The pipeline was developed using Perl, BioPerl and SQLite on Ubuntu Linux version 12.04 LTS. Currently, the software package includes a script for automated installation of the necessary external programs on Ubuntu Linux; however, the pipeline should also be compatible with other Linux and Unix systems once the necessary external programs are installed. DeNoGAP is freely available at https://sourceforge.net/projects/denogap/.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1142-2) contains supplementary material, which is available to authorized users.
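
The scaling argument in the abstract can be illustrated with a toy sketch. This is not DeNoGAP's implementation: the k-mer sets below are a deliberately simple stand-in for profile hidden Markov models, and the function names, the Jaccard similarity, and the 0.5 threshold are all assumptions made for illustration. The point it shows is only structural: when genomes are added one at a time and compared against a growing set of family models, each genome is scanned once, whereas an all-vs-all strategy must visit every pair of genomes.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of k-mers in a sequence (toy stand-in for a profile HMM)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b):
    """Jaccard similarity between two k-mer sets (assumed scoring, not HMM scoring)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def iterative_clustering(genomes, threshold=0.5):
    """Add genomes one at a time, scanning each against the current family models.

    genomes: dict mapping genome name -> list of protein sequences.
    Returns (families, scans); each family holds a 'model' (k-mer set)
    and its 'members' as (genome name, sequence) pairs.
    """
    families = []
    scans = 0
    for name, proteins in genomes.items():
        scans += 1  # one pass over the model set per genome
        for seq in proteins:
            profile = kmers(seq)
            best, best_score = None, 0.0
            for fam in families:
                score = similarity(profile, fam["model"])
                if score > best_score:
                    best, best_score = fam, score
            if best is not None and best_score >= threshold:
                best["members"].append((name, seq))
                best["model"] |= profile  # refine the model with the new member
            else:
                # No sufficiently similar family: seed a new one.
                families.append({"model": set(profile), "members": [(name, seq)]})
    return families, scans


def pairwise_genome_comparisons(n):
    """Number of genome pairs an all-vs-all strategy has to compare."""
    return sum(1 for _ in combinations(range(n), 2))


# Hypothetical toy input: three genomes with two short "proteins" each.
genomes = {
    "g1": ["MKTAYIAKQR", "GGSWLLPVNA"],
    "g2": ["MKTAYIAKQR", "HHHHCCCCDD"],
    "g3": ["MKTAYIAKQK", "GGSWLLPVNA"],
}
families, scans = iterative_clustering(genomes)
print(len(families), "families from", scans, "genome scans")
print(pairwise_genome_comparisons(100), "genome pairs for 100 genomes")
```

For 100 genomes, the all-vs-all count is 100 × 99 / 2 = 4,950 genome pairs, against 100 scans in the iterative scheme. The cost of each scan still depends on how many family models exist at that point, so the sketch illustrates the growth in the number of genome-level comparisons rather than total runtime.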


Identity

Description: The Identity category includes attributes that support the identification of the resource.

Field Value
PID https://www.doi.org/10.1186/s12859-016-1142-2
PID pmid:27363390
PID pmc:PMC4929753
URL https://dblp.uni-trier.de/db/journals/bmcbi/bmcbi17.html#ThakurG16
URL https://link.springer.com/content/pdf/10.1186%2Fs12859-016-1142-2.pdf
URL http://link.springer.com/content/pdf/10.1186/s12859-016-1142-2
URL https://0-bmcbioinformatics-biomedcentral-com.brum.beds.ac.uk/articles/10.1186/s12859-016-1142-2
URL https://core.ac.uk/display/81062562
URL https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1142-2
URL https://link.springer.com/article/10.1186/s12859-016-1142-2
URL http://europepmc.org/articles/PMC4929753
URL https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-016-1142-2
URL http://pubmed.cn/27363390
URL http://dx.doi.org/10.1186/s12859-016-1142-2
URL https://tspace.library.utoronto.ca/bitstream/1807/83115/1/12859_2016_Article_1142.pdf
URL https://tspace.library.utoronto.ca/handle/1807/83115
URL https://paperity.org/p/77299933/a-de-novo-genome-analysis-pipeline-denogap-for-large-scale-comparative-prokaryotic
URL https://dx.doi.org/10.1186/s12859-016-1142-2
URL https://academic.microsoft.com/#/detail/2474345237
URL https://www.ncbi.nlm.nih.gov/pubmed/27363390
Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Field Value
Access Right Open Access
Attribution

Description: Authorships and contributors

Field Value
Author Shalabh Thakur
Author David S. Guttman, 0000-0001-8479-3869
Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Field Value
Collected From Europe PubMed Central; PubMed Central; Datacite; UnpayWall; Crossref; Microsoft Academic Graph; CORE (RIOXX-UK Aggregator)
Hosted By Europe PubMed Central; SpringerOpen; BMC Bioinformatics
Publication Date 2016-06-01
Publisher Springer Nature
Additional Info
Field Value
Language Undetermined
Resource Type Article; UNKNOWN
system:type publication
Management Info
Field Value
Source https://science-innovation-policy.openaire.eu/search/publication?articleId=dedup_wf_001::6b7ea6a00e8eeca15198e2020faa90a9
Author jsonws_user
Last Updated 25 December 2020, 19:52 (CET)
Created 25 December 2020, 19:52 (CET)