Experimental Error, Kurtosis, Activity Cliffs, and Methodology: What Limits the Predictivity of Quantitative Structure–Activity Relationship Models?

Given a particular descriptor/method combination, some quantitative structure–activity relationship (QSAR) datasets yield highly predictive models under random-split cross-validation while others do not. Recent literature on modelability suggests that predictivity is limited by the data rather than by the QSAR methodology, and specifically by activity cliffs. Here, using in-house data, we investigate how useful experimental error, the distribution of the activities, and activity cliff metrics are for estimating how predictive a dataset is likely to be. We include unmodified in-house datasets, datasets that should be perfectly predictable from the chemical structure alone, datasets in which the distribution of activities is manipulated, and datasets to which a known amount of noise has been added. We find that, whatever the type of dataset, activity cliff metrics indicate predictivity better than the other metrics we investigated, consistent with the modelability literature. However, such metrics cannot distinguish real activity cliffs from apparent ones caused by large uncertainties in the activities. We also show that a number of modern QSAR methods, and some alternative descriptors, perform equally poorly at predicting the activities of compounds on activity cliffs, consistent with the assumptions behind “modelability.” Finally, we relate time-split predictivity to random-split predictivity and show that differences in the coverage of chemical space are at least as important as uncertainty in activity and/or activity cliffs in limiting predictivity.
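
The abstract names three candidate limits on predictivity (the distribution of the activities, activity cliffs, and experimental noise) and contrasts random-split with time-split validation. The sketch below shows one common way to quantify each; it is not the authors' code, and the abstract does not say which activity cliff metrics they used. Assumptions: SALI (structure–activity landscape index) stands in for an activity cliff metric, fingerprints are represented as sets of on-bit indices, added noise is Gaussian, and all function names are hypothetical.

```python
# Hypothetical helpers; SALI is a stand-in activity cliff metric,
# not necessarily the one used in the study.
import random
from itertools import combinations


def excess_kurtosis(values):
    """Excess kurtosis from population moments (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints stored as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def sali_scores(fps, activities, eps=1e-6):
    """SALI = |A_i - A_j| / (1 - sim_ij) for every compound pair.

    eps caps the score when two fingerprints are identical (sim_ij = 1).
    """
    scores = {}
    for i, j in combinations(range(len(fps)), 2):
        sim = tanimoto(fps[i], fps[j])
        scores[(i, j)] = abs(activities[i] - activities[j]) / max(1.0 - sim, eps)
    return scores


def add_noise(activities, sd, seed=0):
    """Copy of the activities with Gaussian noise (standard deviation sd) added."""
    rng = random.Random(seed)
    return [a + rng.gauss(0.0, sd) for a in activities]


def time_split(dates, frac_train=0.75):
    """Indices of the earliest frac_train of compounds (train) and the rest (test)."""
    order = sorted(range(len(dates)), key=lambda i: dates[i])
    cut = int(len(order) * frac_train)
    return order[:cut], order[cut:]


def random_split(n, frac_train=0.75, seed=0):
    """Random train/test partition of n compound indices, for comparison."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * frac_train)
    return idx[:cut], idx[cut:]


if __name__ == "__main__":
    fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}]
    pic50 = [6.0, 8.5, 5.0]
    scores = sali_scores(fps, pic50)
    print(max(scores, key=scores.get))  # (0, 1): similar pair, 2.5 log-unit gap
    print(excess_kurtosis(pic50))
```

Dividing by 1 - similarity is what makes SALI large for pairs that are structurally close but differ strongly in activity, the kind of compound the abstract reports as hard for every method tested. The score itself cannot tell whether a large gap is real or an artifact of measurement error; rescoring after add_noise is one way to explore that sensitivity.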


Identity

Description: The Identity category includes attributes that support the identification of the resource.

Field Value
PID https://www.doi.org/10.1021/acs.jcim.9b01067.s004
PID https://www.doi.org/10.1021/acs.jcim.9b01067.s003
URL http://dx.doi.org/10.1021/acs.jcim.9b01067.s003
URL http://dx.doi.org/10.1021/acs.jcim.9b01067.s004
URL https://figshare.com/articles/Experimental_Error_Kurtosis_Activity_Cliffs_and_Methodology_What_Limits_the_Predictivity_of_Quantitative_Structure_Activity_Relationship_Models_/12133218
URL https://figshare.com/articles/Experimental_Error_Kurtosis_Activity_Cliffs_and_Methodology_What_Limits_the_Predictivity_of_Quantitative_Structure_Activity_Relationship_Models_/12133215
Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Field Value
Access Right Open Access
Attribution

Description: Authorships and contributors

Field Value
Author Sheridan, Robert P.
Author Karnachi, Prabha
Author Tudor, Matthew
Author Xu, Yuting
Author Liaw, Andy
Author Shah, Falgun
Author Cheng, Alan C.
Author Joshi, Elizabeth
Author Glick, Meir
Author Alvarez, Juan
Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Field Value
Collected From figshare
Hosted By figshare
Publication Date 2020-01-01
Publisher Figshare
Additional Info
Field Value
Language UNKNOWN
Resource Type Dataset
Management Info
Field Value
Source https://science-innovation-policy.openaire.eu/search/dataset?datasetId=dedup_wf_001::1435bf72166641da80d9ddc7e451eb05
Author jsonws_user
Last Updated 13 January 2021, 16:53 (CET)
Created 13 January 2021, 16:53 (CET)