r37980778c78--89287807c0558934d6d495a01f92839e

Contamination in a genome assembly can lead to wrong or confusing results when that genome is used as a reference in sequence comparison. Although bacterial contamination is well known, the problem of human-originated contamination has received little attention. In this study we surveyed 45,735 available genome assemblies for evidence of human contamination. We used lineage specificity to distinguish between contamination and conservation. We found that 154 genome assemblies contain fragments that, with high confidence, originate from contaminating human DNA. The majority of the contaminating human sequences have been present in the reference human genome assembly for over a decade. We recommend that existing contaminated genomes be revised to remove the contaminating sequence, and that new assemblies be thoroughly checked for the presence of human DNA before they are submitted to public databases.

Data and Resources

This item has no data

Identity

Description: The Identity category includes attributes that support the identification of the resource.

Field Value
PID https://www.doi.org/10.1371/journal.pone.0162424
URL http://dx.doi.org/10.1371/journal.pone.0162424
URL https://figshare.com/articles/Human_Contamination_in_Public_Genome_Assemblies/3818196
Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Field Value
Access Right Open Access
Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Field Value
Collected From figshare
Hosted By figshare
Publication Date 2016-09-10
Additional Info
Field Value
Language UNKNOWN
Resource Type Dataset
system:type dataset
Management Info
Field Value
Source https://science-innovation-policy.openaire.eu/search/dataset?datasetId=r37980778c78::89287807c0558934d6d495a01f92839e
Author jsonws_user
Last Updated 27 December 2020, 18:33 (CET)
Created 27 December 2020, 18:33 (CET)