gapFinisher: A reliable gap filling pipeline for SSPACE-LongRead scaffolder output

Unknown sequences, or gaps, are present in many published genomes across public databases. Gap filling is an important finishing step in de novo genome assembly, especially for large genomes. The gap filling problem is nontrivial, and while many computational tools partially solve it, several have shortcomings in the reliability and correctness of their output, i.e. the gap-filled draft genome. SSPACE-LongRead is a scaffolding tool that uses long reads from multiple third-generation sequencing platforms to find links between contigs and combine them. The long reads potentially contain the sequence information needed to fill the gaps created during scaffolding, but SSPACE-LongRead currently lacks this functionality. We present an automated pipeline called gapFinisher that processes SSPACE-LongRead output to fill gaps after scaffolding. gapFinisher is based on the controlled use of a previously published gap filling tool, FGAP, and works on all standard Linux/UNIX command lines. We compare the performance of gapFinisher against two other published gap filling tools, PBJelly and GMcloser. We conclude that gapFinisher can fill gaps in draft genomes quickly and reliably. In addition, the serial design of gapFinisher lets it scale well from prokaryote genomes to larger genomes with no increase in the computational footprint.
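
As a rough illustration of what a gap looks like in practice: unknown sequence in a scaffold is conventionally written as a run of N characters, so gaps can be located by scanning for such runs. The Python sketch below is not part of gapFinisher, FGAP, or SSPACE-LongRead; the function name find_gaps, its min_len parameter, and the example scaffold are hypothetical and serve only to make the notion of a gap concrete.

```python
import re


def find_gaps(sequence, min_len=1):
    """Return (start, end) pairs for every run of N/n in the sequence.

    Coordinates are 0-based and the end is exclusive. Runs shorter than
    min_len are ignored. This is an illustrative helper, not a function
    from any of the tools named in the abstract.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


# Hypothetical scaffold with two gaps of 5 and 2 unknown bases.
scaffold = "ACGTNNNNNACGTTTnnACG"
gaps = find_gaps(scaffold)
total_gap_bases = sum(end - start for start, end in gaps)
```

Counting gaps and gap bases this way before and after a gap filling run is one simple way to summarize how much unknown sequence was closed, although it says nothing about whether the filled sequence is correct.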

Identity

Description: The Identity category includes attributes that support the identification of the resource.

Field  Value
PID    https://www.doi.org/10.1371/journal.pone.0216885
URL    https://figshare.com/articles/gapFinisher_A_reliable_gap_filling_pipeline_for_SSPACE-LongRead_scaffolder_output/9786293
URL    http://dx.doi.org/10.1371/journal.pone.0216885

Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Field         Value
Access Right  Open Access

Attribution

Description: Authorships and contributors

Field   Value
Author  Kammonen, Juhana I.
Author  Smolander, Olli-Pekka
Author  Paulin, Lars
Author  Pereira, Pedro A. B.
Author  Laine, Pia
Author  Koskinen, Patrik
Author  Jernvall, Jukka
Author  Auvinen, Petri

Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Field             Value
Collected From    figshare
Hosted By         figshare
Publication Date  2019-01-01
Publisher         Figshare

Additional Info
Field          Value
Language       UNKNOWN
Resource Type  Dataset
system:type    dataset

Management Info
Field         Value
Source        https://science-innovation-policy.openaire.eu/search/dataset?datasetId=r37980778c78::c95cf8d4c75a7a30a5961c8a387fc042
Author        jsonws_user
Last Updated  16 December 2020, 02:40 (CET)
Created       16 December 2020, 02:40 (CET)