TCGA2BED: Extracting, extending, integrating, and querying The Cancer Genome Atlas

Background: Data extraction and integration methods are becoming essential to effectively access and take advantage of the huge amounts of heterogeneous genomic and clinical data increasingly available. In this work, we focus on The Cancer Genome Atlas (TCGA), a comprehensive archive of tumoral data containing the results of high-throughput experiments, mainly Next Generation Sequencing, for more than 30 cancer types.

Results: We propose TCGA2BED, a software tool to search and retrieve TCGA data and convert them into the structured BED format for their seamless use and integration. Additionally, it supports conversion into the CSV, GTF, JSON, and XML standard formats. Furthermore, TCGA2BED extends TCGA data with information extracted from other genomic databases (i.e., NCBI Entrez Gene, HGNC, UCSC, and miRBase). We also provide and maintain an automatically updated data repository with publicly available Copy Number Variation, DNA-methylation, DNA-seq, miRNA-seq, and RNA-seq (V1, V2) experimental data of TCGA converted into the BED format, and their associated clinical and biospecimen metadata in attribute-value text format.

Conclusions: The availability of the valuable TCGA data in BED format reduces the time needed to take advantage of them: it is possible to deal efficiently and effectively with huge amounts of cancer genomic data in an integrated way, and to search, retrieve, and extend them with additional information. The BED format helps investigators perform several knowledge discovery analyses on all tumor types in TCGA, with the final aim of understanding pathological mechanisms and aiding cancer treatments.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1419-5) contains supplementary material, which is available to authorized users.

Identity

Description: The Identity category includes attributes that support the identification of the resource.

Field Value
PID https://www.doi.org/10.1186/s12859-016-1419-5
PID pmid:28049410
PID pmc:PMC5210259
URL https://dblp.uni-trier.de/db/journals/bmcbi/bmcbi18.html#CumboFCMW17
URL https://dx.doi.org/10.1186/s12859-016-1419-5
URL https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1419-5
URL https://re.public.polimi.it/handle/11311/1013720
URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210259/
URL http://link.springer.com/content/pdf/10.1186/s12859-016-1419-5.pdf
URL https://0-bmcbioinformatics-biomedcentral-com.brum.beds.ac.uk/articles/10.1186/s12859-016-1419-5
URL https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-016-1419-5
URL https://core.ac.uk/display/80336304
URL https://paperity.org/p/78682851/tcga2bed-extracting-extending-integrating-and-querying-the-cancer-genome-atlas
URL https://re.public.polimi.it/bitstream/11311/1013720/1/art%253A10.1186%252Fs12859-016-1419-5.pdf
URL https://academic.microsoft.com/#/detail/2566169553
URL https://doi.org/10.1186/s12859-016-1419-5
URL https://link.springer.com/article/10.1186/s12859-016-1419-5
URL http://hdl.handle.net/11311/1013720
URL http://dx.doi.org/10.1186/s12859-016-1419-5
URL http://europepmc.org/articles/PMC5210259
Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Field Value
Access Right Open Access
Attribution

Description: Authorships and contributors

Field Value
Author Emanuel Weitschek, 0000-0002-8045-2925
Author Fabio Cumbo, 0000-0003-2920-5838
Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Field Value
Collected From Europe PubMed Central; PubMed Central; ORCID; UnpayWall; Datacite; RE.PUBLIC@POLIMI Research Publications at Politecnico di Milano; Crossref; Microsoft Academic Graph; Sygma; CORE (RIOXX-UK Aggregator)
Hosted By Europe PubMed Central; SpringerOpen; RE.PUBLIC@POLIMI Research Publications at Politecnico di Milano; BMC Bioinformatics
Publication Date 2017-01-03
Additional Info
Field Value
Country Italy
Language English
Resource Type Article; UNKNOWN
keyword INF; bioinformatics
keyword Cancer; Data extraction; Data integration; Knowledge extraction; Structural Biology; Biochemistry; Molecular Biology; Computer Science Applications; Computer Vision and Pattern Recognition; Applied Mathematics
system:type publication
Management Info
Field Value
Source https://science-innovation-policy.openaire.eu/search/publication?articleId=dedup_wf_001::1610db10a0899f01e827189086d0ea11
Author jsonws_user
Last Updated 25 December 2020, 09:59 (CET)
Created 25 December 2020, 09:59 (CET)