ClinQC: a tool for quality control and cleaning of Sanger and NGS data in clinical research

Background Traditional Sanger sequencing has been the gold standard for genetic testing in the clinic, where it is used for single-gene tests. It is, however, cumbersome and expensive for testing several genes in heterogeneous diseases such as cancer. Next Generation Sequencing (NGS) technologies, which produce data at unprecedented speed and in a cost-effective manner, have overcome this limitation of Sanger sequencing. For efficient and affordable genetic testing, NGS has therefore been used as a complementary method to Sanger sequencing for identifying and confirming disease-causing mutations in clinical research. To identify potential disease-causing mutations with high sensitivity and specificity, however, it is essential to ensure high-quality sequencing data. Integrated software tools that can analyze Sanger and NGS data together, eliminate platform-specific sequencing errors and low-quality reads, and support the analysis of data sets from several samples or patients in a single run are lacking.

Results We have developed ClinQC, a flexible and user-friendly pipeline for format conversion, quality control, trimming and filtering of raw sequencing data generated by Sanger sequencing and three NGS platforms: Illumina, 454 and Ion Torrent. First, ClinQC converts input read files from their native formats to a common FASTQ format and removes adapters and PCR primers. Next, it splits bar-coded samples, filters duplicates, contamination and low-quality sequences, and generates a QC report. ClinQC outputs high-quality reads in FASTQ format with Sanger quality encoding, which can be used directly in downstream analysis. It can analyze data from hundreds of samples or patients in a single run and generates unified output files for both Sanger and NGS data. Our tool is expected to be very useful for quality control and format conversion of Sanger and NGS data, facilitating improved downstream analysis and mutation screening.

Conclusions ClinQC is a powerful and easy-to-use pipeline for quality control and trimming in clinical research. It is written in Python with multiprocessing capability, runs on all major operating systems and is available at https://sourceforge.net/projects/clinqc.

Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-0915-y) contains supplementary material, which is available to authorized users.
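The kind of processing described in the Results can be illustrated with a minimal Python sketch. It is not ClinQC's own code: the function names, the quality threshold of 20 and the minimum read length are assumptions chosen for illustration only. It shows three of the steps named above in their simplest form: re-encoding Phred+64 qualities to Sanger (Phred+33) encoding, trimming low-quality bases from the 3' end, and dropping short or duplicate reads.

```python
# Illustrative sketch only; all names and thresholds here are hypothetical
# and are not taken from ClinQC.

def illumina64_to_sanger(qual):
    """Re-encode a Phred+64 quality string as Phred+33 (Sanger encoding)."""
    return "".join(chr(ord(c) - 64 + 33) for c in qual)


def trim_3prime(seq, qual, min_q=20):
    """Remove bases from the 3' end while their Phred+33 score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - 33 < min_q:
        end -= 1
    return seq[:end], qual[:end]


def clean_reads(reads, min_q=20, min_len=3):
    """Trim each (name, seq, qual) read, then drop short and duplicate reads.

    Duplicates are detected by exact sequence identity after trimming,
    which is a simplifying assumption.
    """
    seen = set()
    kept = []
    for name, seq, qual in reads:
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) < min_len or seq in seen:
            continue
        seen.add(seq)
        kept.append((name, seq, qual))
    return kept


if __name__ == "__main__":
    reads = [
        ("r1", "ACGTAC", "IIII##"),  # last two bases have Phred score 2
        ("r2", "ACGT", "IIII"),      # duplicate of r1 after trimming
        ("r3", "GG", "II"),          # shorter than min_len
    ]
    for name, seq, qual in clean_reads(reads):
        print("@%s\n%s\n+\n%s" % (name, seq, qual))
```

Run on the three example reads, only `r1` survives, trimmed to its first four bases. A real pipeline would also need adapter and primer removal, barcode splitting and contamination screening, which are omitted here.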

Identity

Description: The Identity category includes attributes that support the identification of the resource.

Field Value
PID https://www.doi.org/10.1186/s12859-016-0915-y
PID pmc:PMC4735967
PID pmid:26830926
URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4735967/
URL https://link.springer.com/article/10.1186%2Fs12859-016-0915-y
URL https://doi.org/10.1186/s12859-016-0915-y
URL https://paperity.org/p/219450268/clinqc-a-tool-for-quality-control-and-cleaning-of-sanger-and-ngs-data-in-clinical
URL https://dblp.uni-trier.de/db/journals/bmcbi/bmcbi17.html#PandeyPKW16
URL https://dx.doi.org/10.1186/s12859-016-0915-y
URL https://academic.microsoft.com/#/detail/2272917943
URL http://dx.doi.org/10.1186/s12859-016-0915-y
URL https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-016-0915-y
URL https://pubmed.ncbi.nlm.nih.gov/26830926/
URL https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-0915-y
URL http://europepmc.org/articles/PMC4735967
URL https://core.ac.uk/display/81051276
URL http://link.springer.com/content/pdf/10.1186/s12859-016-0915-y
Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Field Value
Access Right Open Access
Attribution

Description: Authorships and contributors

Field Value
Author Stephan Pabinger, 0000-0001-9876-5965
Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Field Value
Collected From Europe PubMed Central; PubMed Central; ORCID; UnpayWall; Datacite; Crossref; Microsoft Academic Graph; CORE (RIOXX-UK Aggregator)
Hosted By Europe PubMed Central; SpringerOpen; BMC Bioinformatics
Journal BMC Bioinformatics, 17, 1
Publication Date 2016-02-02
Publisher Springer Nature
Additional Info
Field Value
Language English
Resource Type Other literature type; Article; UNKNOWN
system:type publication
Management Info
Field Value
Source https://science-innovation-policy.openaire.eu/search/publication?articleId=dedup_wf_001::9810a573b9b4bc150dfd17601baa8aa7
Author jsonws_user
Last Updated 26 December 2020, 11:38 (CET)
Created 26 December 2020, 11:38 (CET)