Drug-target interaction prediction via class imbalance-aware ensemble learning

Background: Multiple computational methods for predicting drug-target interactions have been developed to facilitate the drug discovery process. These methods use available data on known drug-target interactions to train classifiers that predict new, undiscovered interactions. However, a key challenge in this data that these methods have not yet addressed, namely class imbalance, potentially degrades prediction performance. Class imbalance can be divided into two sub-problems. Firstly, the number of known interacting drug-target pairs is much smaller than that of non-interacting drug-target pairs. This imbalance ratio between interacting and non-interacting drug-target pairs is referred to as the between-class imbalance. Between-class imbalance degrades prediction performance because prediction results are biased towards the majority class (i.e. the non-interacting pairs), leading to more prediction errors in the minority class (i.e. the interacting pairs). Secondly, there are multiple types of drug-target interactions in the data, and some types have relatively fewer members (i.e. are less represented) than others. This variation in the representation of the different interaction types leads to another kind of imbalance referred to as the within-class imbalance. In within-class imbalance, prediction results are biased towards the better represented interaction types, leading to more prediction errors in the less represented interaction types.

Results: We propose an ensemble learning method that incorporates techniques to address the issues of between-class imbalance and within-class imbalance. Experiments show that the proposed method improves results over four state-of-the-art methods. In addition, we simulated cases for new drugs and targets, that is, those for which no prior interactions are known, to see how our method would perform in predicting their interactions. Our method displayed satisfactory prediction performance and was able to predict many of the interactions successfully.

Conclusions: Our proposed method has improved the prediction performance over the existing work, thus demonstrating the importance of addressing problems pertaining to class imbalance in the data.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1377-y) contains supplementary material, which is available to authorized users.
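As an illustration of the between-class imbalance described in the abstract, the following is a minimal sketch of one common remedy: training each ensemble member on all minority-class examples plus an equally sized random sample of the majority class, then averaging the members' votes. This is not the method of the paper, whose details are not given in this record. The nearest-centroid base learner, the function names (`train_balanced_ensemble`, `predict_score`), and the toy two-dimensional features are all assumptions made for illustration, and the sketch does not address within-class imbalance.

```python
import random


def centroid(rows):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_balanced_ensemble(positives, negatives, n_members=10, seed=0):
    """Hypothetical helper: each member sees all positives plus an
    equally sized random undersample of the negatives."""
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    pos_centroid = centroid(positives)
    members = []
    for _ in range(n_members):
        sampled_negatives = rng.sample(negatives, k)
        members.append((pos_centroid, centroid(sampled_negatives)))
    return members


def predict_score(members, x):
    """Fraction of members that vote 'interacting' for feature vector x."""
    votes = sum(
        1 for pos_c, neg_c in members if sq_dist(x, pos_c) < sq_dist(x, neg_c)
    )
    return votes / len(members)


# Toy data (assumed): 5 interacting pairs versus 200 non-interacting pairs.
rng = random.Random(42)
positives = [[1 + rng.gauss(0, 0.1), 1 + rng.gauss(0, 0.1)] for _ in range(5)]
negatives = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(200)]

members = train_balanced_ensemble(positives, negatives)
score_near_positives = predict_score(members, [0.9, 1.1])
score_near_negatives = predict_score(members, [0.0, 0.1])
print(score_near_positives, score_near_negatives)
```

Because every member is trained on a balanced subsample, no single model is dominated by the non-interacting pairs, while the ensemble as a whole still draws on many different majority-class examples.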


Identity

Description: The Identity category includes attributes that support the identification of the resource.

PID: https://www.doi.org/10.1186/s12859-016-1377-y
PID: pmid:28155697
PID: pmc:PMC5259867
URL: http://europepmc.org/articles/PMC5259867
URL: https://www.ncbi.nlm.nih.gov/pubmed/28155697
URL: http://link.springer.com/content/pdf/10.1186/s12859-016-1377-y.pdf
URL: https://dx.doi.org/10.1186/s12859-016-1377-y
URL: https://dblp.uni-trier.de/db/journals/bmcbi/bmcbi17S.html#EzzatW0K16
URL: https://core.ac.uk/display/81072637
URL: https://dr.ntu.edu.sg/handle/10220/46173
URL: https://academic.microsoft.com/#/detail/2563206593
URL: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1377-y
URL: https://link.springer.com/article/10.1186/s12859-016-1377-y
URL: https://doi.org/10.1186/s12859-016-1377-y
URL: http://dx.doi.org/10.1186/s12859-016-1377-y

Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Access Right: Open Access

Attribution

Description: Authorships and contributors

Author: Ali Ezzat (ORCID 0000-0002-7426-3888)

Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Collected From: Europe PubMed Central; PubMed Central; ORCID; Datacite; UnpayWall; Crossref; Microsoft Academic Graph; CORE (RIOXX-UK Aggregator)
Hosted By: Europe PubMed Central; SpringerOpen; BMC Bioinformatics
Publication Date: 2016-12-22
Publisher: Springer Science and Business Media LLC

Additional Info

Language: UNKNOWN
Resource Type: Other literature type; Article; UNKNOWN
system:type: publication

Management Info

Source: https://science-innovation-policy.openaire.eu/search/publication?articleId=dedup_wf_001::c035a84c51e437ce12d974c605411824
Author: jsonws_user
Last Updated: 24 December 2020, 21:00 (CET)
Created: 24 December 2020, 21:00 (CET)