An integrated framework for the identification of potential miRNA-disease association based on novel negative samples extraction strategy

MicroRNAs (miRNAs) play an important role in the prevention, diagnosis and treatment of complex human diseases. Predicting potential miRNA-disease associations can provide important prior information for medical researchers. Reliable computational models are therefore expected to be an effective supplement for inferring associations between miRNAs and diseases. In this study, we developed a novel computational model named Negative Samples Extraction based MiRNA-Disease Association prediction (NSEMDA). NSEMDA filtered reliable negative samples using two positive-unlabeled learning techniques, Spy and Rocchio, and calculated similarity weights for ambiguous samples. The positive samples, reliable negative samples and ambiguous samples with similarity weights were then used to construct a Support Vector Machine-Similarity Weight model to predict miRNA-disease associations. By training the prediction model on ambiguous samples with similarity weights, NSEMDA improved the credibility of the negative samples and reduced the impact of noisy samples. As a result, NSEMDA achieved an AUC of 0.8899 in global leave-one-out cross validation (LOOCV) and an AUC of 0.8353 in local LOOCV. Over 100 runs of 5-fold cross validation, NSEMDA obtained an average AUC of 0.8878 with a standard deviation of 0.0014. These AUCs are higher than those of many classical models. We also carried out three kinds of case studies to evaluate the performance of NSEMDA. Among the top 50 potentially related miRNAs predicted by NSEMDA for esophageal neoplasms, lung neoplasms and hepatocellular carcinoma, 46, 50 and 45 miRNAs, respectively, were verified by experimental evidence to be associated with the investigated disease. NSEMDA is therefore a reliable computational model for inferring miRNA-disease associations.
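
The abstract names the Rocchio technique as one of the two positive-unlabeled filters but gives no implementation details. The following is a minimal sketch of a generic Rocchio-style reliable-negative extraction step, not the authors' code. The function name `rocchio_reliable_negatives`, the toy feature vectors, and the weights `alpha=16` and `beta=4` (values commonly used with Rocchio in the text-classification literature) are all assumptions made for illustration; the abstract does not specify how miRNA-disease pairs are encoded as features.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def rocchio_reliable_negatives(positives, unlabeled, alpha=16.0, beta=4.0):
    """Return indices of unlabeled samples treated as reliable negatives.

    The unlabeled set is provisionally treated as negative. A prototype is
    built for each class, and an unlabeled sample is kept as a reliable
    negative when it is more similar to the negative prototype than to the
    positive one. alpha and beta are assumed values, not from the abstract.
    """
    p_mean = mean_vector([normalize(v) for v in positives])
    u_mean = mean_vector([normalize(v) for v in unlabeled])
    pos_proto = [alpha * p - beta * u for p, u in zip(p_mean, u_mean)]
    neg_proto = [alpha * u - beta * p for p, u in zip(p_mean, u_mean)]
    return [
        idx
        for idx, v in enumerate(unlabeled)
        if cosine(v, neg_proto) > cosine(v, pos_proto)
    ]


# Hypothetical 2-D feature vectors for miRNA-disease pairs.
known_associations = [[1.0, 0.1], [0.9, 0.2]]
unlabeled_pairs = [[0.95, 0.15], [0.1, 1.0], [0.2, 0.9]]
reliable = rocchio_reliable_negatives(known_associations, unlabeled_pairs)
print(reliable)
```

In this toy input, the first unlabeled pair resembles the known associations and is left out, which mirrors the idea of setting aside ambiguous samples rather than labeling them negative outright. A weighted SVM could then consume the positives, the reliable negatives, and the remaining samples with per-sample weights.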

Identity

Description: The Identity category includes attributes that support the identification of the resource.

Field Value
PID https://www.doi.org/10.6084/m9.figshare.7593017
PID https://www.doi.org/10.6084/m9.figshare.7593017.v1
URL https://figshare.com/articles/An_integrated_framework_for_the_identification_of_potential_miRNA-disease_association_based_on_novel_negative_samples_extraction_strategy/7593017
URL https://dx.doi.org/10.6084/m9.figshare.7593017
URL https://dx.doi.org/10.6084/m9.figshare.7593017.v1
URL http://dx.doi.org/10.6084/m9.figshare.7593017
URL http://dx.doi.org/10.6084/m9.figshare.7593017.v1
Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Field Value
Access Right Open Access
Attribution

Description: Authorships and contributors

Field Value
Author Wang, Chun-Chun
Author Chen, Xing
Author Yin, Jun
Author Qu, Jia
Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Field Value
Collected From Datacite; figshare
Hosted By figshare
Publication Date 2019-01-16
Publisher Figshare
Additional Info
Field Value
Language Undetermined
Resource Type Dataset
keyword FOS: Chemical sciences
keyword FOS: Biological sciences
keyword FOS: Computer and information sciences
keyword FOS: Clinical medicine
system:type dataset
Management Info
Field Value
Source https://science-innovation-policy.openaire.eu/search/dataset?datasetId=dedup_wf_001::d3b7399909ab1acdeb0224c29cd37cf1
Author jsonws_user
Last Updated 14 January 2021, 13:07 (CET)
Created 14 January 2021, 13:07 (CET)