Modelling of ready biodegradability based on combined public and industrial data sources

The European REACH (Registration, Evaluation, Authorization and Restriction of Chemicals) Regulation requires marketed chemicals to be evaluated for Ready Biodegradability (RB). In-silico prediction is a valid alternative to expensive and time-consuming experimental testing. However, currently available models may not be suitable for predicting compounds of industrial interest, because of limited accuracy and restricted applicability domains. In this work we present a new and extended RB dataset (2830 compounds), obtained by merging several public data sources. It was used to train classification models, which were externally validated and benchmarked against existing tools on a set of 316 compounds from the industrial context. The new models showed good performance in terms of predictive power (balanced accuracy, BA = 0.74–0.79) and data coverage (83–91 %). The Generative Topographic Mapping approach was employed to compare the chemical space of the various data sources: several chemotypes and structural motifs unique to the industrial dataset were identified, highlighting the chemical classes for which currently available models may give less reliable predictions. Finally, public and industrial data were merged into a Global dataset containing 3146 compounds, including a significant subset of compounds from the industrial context. This is the largest dataset reported in the literature so far, and it covers some chemotypes absent from the public data. Thus, the predictive model developed on the Global dataset has a much larger applicability domain than related models built on publicly available data. The developed model is available to users on the Laboratory of Chemoinformatics website.

The dataset deposited here is only the "All-Public" set, since the industrial compounds cannot be disclosed. This update contains additional entries from [J. Chem. Inf. Model. 52 (2012), pp. 655–669] and [J. Chem. Inf. Model. 53 (2013), pp. 867–878].
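The description quotes two figures of merit, BA and data coverage, without defining them. The following is a minimal sketch assuming the usual definitions: balanced accuracy as the mean of sensitivity and specificity, and coverage as the fraction of compounds falling inside a model's applicability domain. The function names and the toy labels are hypothetical and are not taken from the deposited dataset or from the authors' code.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = RB, 0 = not RB)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present to compute balanced accuracy")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def coverage(in_domain):
    """Fraction of compounds that fall inside the applicability domain."""
    return sum(1 for flag in in_domain if flag) / len(in_domain)


# Hypothetical toy data: experimental labels, model predictions, and
# applicability-domain flags for eight compounds.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
in_domain = [True, True, True, True, True, True, False, False]

# Assumption: accuracy is scored only on compounds inside the domain.
kept = [(t, p) for t, p, ok in zip(y_true, y_pred, in_domain) if ok]
ba = balanced_accuracy([t for t, _ in kept], [p for _, p in kept])
cov = coverage(in_domain)
print(f"BA = {ba:.2f}, coverage = {cov:.0%}")
```

Scoring only the in-domain compounds is one common convention; it explains why predictive power and coverage are reported as a pair, since tightening the domain tends to raise one at the expense of the other.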

Identity

Description: The Identity category includes attributes that support the identification of the resource.

Field Value
PID https://www.doi.org/10.5281/zenodo.3466619
PID https://www.doi.org/10.5281/zenodo.3540701
PID https://www.doi.org/10.5281/zenodo.3466618
PID https://www.doi.org/10.6084/m9.figshare.11417067
URL https://zenodo.org/record/3540701
URL https://figshare.com/articles/Modelling_of_ready_biodegradability_based_on_combined_public_and_industrial_data_sources/11536887
URL https://dx.doi.org/10.6084/m9.figshare.11417067
URL https://figshare.com/articles/Modelling_of_ready_biodegradability_based_on_combined_public_and_industrial_data_sources/11474097
URL http://dx.doi.org/10.5281/zenodo.3466619
URL https://zenodo.org/record/3466619
URL http://dx.doi.org/10.5281/zenodo.3466618
URL http://dx.doi.org/10.5281/zenodo.3540701

Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Field Value
Access Right Open Access

Attribution

Description: Authorships and contributors

Field Value
Author Lunghini, Filippo
Author Marcou, Gilles, 0000-0003-1676-6708
Author Gantzer, Philippe
Author Azam, Philippe
Author Horvath, Dragos, 0000-0003-0173-5714
Author Van Miert, Erik
Author Varnek, Alexandre, 0000-0003-1886-925X

Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Field Value
Collected From Zenodo; figshare; Datacite
Hosted By Zenodo; figshare
Publication Date 2019-10-01
Publisher Figshare

Additional Info

Field Value
Language UNKNOWN
Resource Type Dataset
system:type dataset

Management Info

Field Value
Source https://science-innovation-policy.openaire.eu/search/dataset?datasetId=dedup_wf_001::aa8c6f044bfa413178060d83b33b563f
Author jsonws_user
Last Updated 13 January 2021, 15:10 (CET)
Created 13 January 2021, 15:10 (CET)