Ambiguity of non-systematic chemical identifiers within and between small-molecule databases

Background: A wide range of chemical compound databases are currently available for pharmaceutical research. To retrieve compound information, including structures, researchers can query these databases using non-systematic identifiers. These are source-dependent identifiers (e.g., brand names, generic names), which are usually assigned to the compound at the point of registration. The correctness of non-systematic identifiers (i.e., whether an identifier matches the associated structure) can only be assessed manually, which is cumbersome, but their ambiguity (i.e., whether an identifier matches more than one structure) can be checked automatically. In this study we quantified the ambiguity of non-systematic identifiers within and between eight widely used chemical databases. We also studied the effect of chemical structure standardization on reducing the ambiguity of non-systematic identifiers.

Results: The ambiguity of non-systematic identifiers within databases varied from 0.1 to 15.2 % (median 2.5 %). Standardization reduced the ambiguity only to a small extent for most databases. A wide range of ambiguity existed for non-systematic identifiers that are shared between databases (17.7–60.2 %, median 40.3 %). Removing stereochemistry information provided the largest reduction in ambiguity across databases (median reduction 13.7 percentage points).

Conclusions: Ambiguity of non-systematic identifiers within chemical databases is generally low, but ambiguity of non-systematic identifiers that are shared between databases is high. Chemical structure standardization reduces the ambiguity to a limited extent. Our findings can help to improve database integration, curation, and maintenance.

Electronic supplementary material: The online version of this article (doi:10.1186/s13321-015-0102-6) contains supplementary material, which is available to authorized users.
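The ambiguity measure described in the abstract (the share of identifiers that match more than one structure) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the toy records, the pooling rule used for identifiers shared between two databases, and in particular `strip_stereo`, which is a crude string-level stand-in for a real structure standardizer and not the procedure used in the study.

```python
from collections import defaultdict


def ambiguity(records, standardize=lambda s: s):
    """Fraction of identifiers linked to more than one distinct structure.

    `records` is an iterable of (identifier, structure) pairs; `standardize`
    is applied to each structure before distinct structures are counted.
    """
    structures = defaultdict(set)
    for identifier, structure in records:
        structures[identifier].add(standardize(structure))
    if not structures:
        return 0.0
    ambiguous = sum(1 for s in structures.values() if len(s) > 1)
    return ambiguous / len(structures)


def shared_ambiguity(db_a, db_b, standardize=lambda s: s):
    """Ambiguity over identifiers that occur in both databases (assumed pooling rule)."""
    shared = {i for i, _ in db_a} & {i for i, _ in db_b}
    pooled = [(i, s) for i, s in db_a + db_b if i in shared]
    return ambiguity(pooled, standardize)


def strip_stereo(smiles):
    """Hypothetical, crude stereo removal: drops SMILES stereo markers as text.

    A real pipeline would use a cheminformatics toolkit and canonical structures.
    """
    return smiles.replace("@", "").replace("/", "").replace("\\", "")


# Toy data: the two databases disagree only on the stereocentre of "alanine".
db_a = [
    ("alanine", "C[C@H](N)C(=O)O"),
    ("ethanol", "CCO"),
    ("aspirin", "CC(=O)Oc1ccccc1C(=O)O"),
]
db_b = [
    ("alanine", "C[C@@H](N)C(=O)O"),
    ("ethanol", "CCO"),
    ("glycine", "NCC(=O)O"),
]

within_a = ambiguity(db_a)
between = shared_ambiguity(db_a, db_b)
between_no_stereo = shared_ambiguity(db_a, db_b, strip_stereo)
print(within_a, between, between_no_stereo)
```

In this toy case each database is unambiguous on its own, half of the shared identifiers are ambiguous across the pair, and discarding stereochemistry removes that ambiguity. It mirrors the direction of the reported findings only qualitatively; the numbers carry no meaning beyond the example.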

Identity

Description: The Identity category includes attributes that support the identification of the resource.

Field Value
PID https://www.doi.org/10.1186/s13321-015-0102-6
PID urn:urn:NBN:nl:ui:15-1765/79177
PID handle:1765/79177
PID pmc:PMC4646925
PID pmid:26579214
URL https://jcheminf.biomedcentral.com/articles/10.1186/s13321-015-0102-6
URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4646925/
URL http://europepmc.org/articles/PMC4646925
URL https://jcheminf.biomedcentral.com/track/pdf/10.1186/s13321-015-0102-6
URL https://academic.microsoft.com/#/detail/2172533829
URL https://doi.org/10.1186/s13321-015-0102-6
URL http://link.springer.com/article/10.1186/s13321-015-0102-6/fulltext.html
URL https://link.springer.com/article/10.1186/s13321-015-0102-6
URL http://hdl.handle.net/1765/79177
URL https://core.ac.uk/display/86328975
URL http://jcheminf.springeropen.com/articles/10.1186/s13321-015-0102-6
URL https://paperity.org/p/74893645/ambiguity-of-non-systematic-chemical-identifiers-within-and-between-small-molecule
URL http://dx.doi.org/10.1186/s13321-015-0102-6
URL http://link.springer.com/content/pdf/10.1186/s13321-015-0102-6.pdf
URL https://dblp.uni-trier.de/db/journals/jcheminf/jcheminf7.html#AkhondiMWK15
URL https://growkudos.com/publications/10.1186%252Fs13321-015-0102-6/reader
URL https://dx.doi.org/10.1186/s13321-015-0102-6
URL http://link.springer.com/content/pdf/10.1186/s13321-015-0102-6
URL https://repub.eur.nl/pub/79177
URL https://www.narcis.nl/publication/RecordID/oai:repub.eur.nl:79177
Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Field Value
Access Right Open Access
Attribution

Description: Authorships and contributors

Field Value
Author Antony Williams, 0000-0002-2668-4821
Contributor Department of Medical Informatics
Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Field Value
Collected From Europe PubMed Central; PubMed Central; ORCID; Datacite; UnpayWall; OpenAIRE; NARCIS; Crossref; Microsoft Academic Graph; CORE (RIOXX-UK Aggregator)
Hosted By Erasmus University Institutional Repository; Europe PubMed Central; SpringerOpen; NARCIS; Journal of Cheminformatics
Publication Date 2015-11-16
Publisher Springer Science and Business Media LLC
Additional Info
Field Value
Country Netherlands
Language UNKNOWN
Resource Type Other literature type; Article; UNKNOWN
system:type publication
Management Info
Field Value
Source https://science-innovation-policy.openaire.eu/search/publication?articleId=dedup_wf_001::c767ce8fc271698788441889b3b882f1
Author jsonws_user
Last Updated 26 December 2020, 23:25 (CET)
Created 26 December 2020, 23:25 (CET)