Imbalanced target prediction with pattern discovery on clinical data repositories

Background: Clinical data repositories (CDRs) have great potential to improve outcome prediction and risk modeling. However, most clinical studies require careful study design, dedicated data collection efforts, and sophisticated modeling techniques before a hypothesis can be tested. We aim to bridge this gap, so that clinical domain users can perform first-hand prediction on existing repository data without complicated handling, and obtain insightful patterns of imbalanced targets before a formal study is conducted. We specifically target interpretability for domain users, so that the model can be conveniently explained and applied in clinical practice.
Methods: We propose an interpretable pattern model that tolerates noise (missing values) in practice data. To address the challenge of imbalanced targets of interest in clinical research, e.g., deaths accounting for less than a few percent of cases, we employ the geometric mean of sensitivity and specificity (G-mean) as the optimization criterion and develop a simple but effective heuristic algorithm for it.
Results: We compared pattern discovery to clinically interpretable methods on two retrospective clinical datasets. The thoracic dataset contains 14.9% deaths within 1 year and the cardiac dataset contains 9.1% deaths. Despite the imbalance, which challenges the other methods, pattern discovery consistently shows competitive cross-validated prediction performance. Compared to logistic regression, Naïve Bayes, and decision tree, pattern discovery achieves statistically significantly better (p-values < 0.01, Wilcoxon signed rank test) average test G-means and F1-scores (harmonic mean of precision and sensitivity). Without requiring sophisticated technical processing or tweaking of the data, the prediction performance of pattern discovery is consistently comparable to the best achievable performance.
Conclusions: Pattern discovery has been demonstrated to be robust and valuable for target prediction on existing clinical data repositories with imbalance and noise. The prediction results and interpretable patterns can provide insights for potential formal studies in an agile and inexpensive way.
Electronic supplementary material: The online version of this article (doi:10.1186/s12911-017-0443-3) contains supplementary material, which is available to authorized users.
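
The two evaluation measures named in the abstract, G-mean and F1-score, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the synthetic cohort are assumptions made here for illustration. Only the 9.1% death rate is taken from the abstract; the counts of 60 detected deaths and 200 false alarms are hypothetical.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0

def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

def f1_score(tp, fp, tn, fn):
    """Harmonic mean of precision and sensitivity, i.e. 2*tp / (2*tp + fp + fn)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Synthetic cohort: 1000 patients, 91 deaths (9.1%, as in the cardiac dataset).
y_true = [1] * 91 + [0] * 909

# Trivial model that predicts "survives" for everyone.
baseline = confusion_counts(y_true, [0] * 1000)

# Hypothetical model: flags 60 of the 91 deaths and raises 200 false alarms.
y_flagged = [1] * 60 + [0] * 31 + [1] * 200 + [0] * 709
flagged = confusion_counts(y_true, y_flagged)

for name, counts in (("baseline", baseline), ("flagged", flagged)):
    print(name,
          "accuracy=%.3f" % accuracy(*counts),
          "g_mean=%.3f" % g_mean(*counts),
          "f1=%.3f" % f1_score(*counts))
```

In this sketch the trivial model has the higher accuracy yet a G-mean and F1-score of zero, because it never identifies a death. This is the kind of behavior on imbalanced targets that motivates optimizing G-mean rather than accuracy.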


Identity

Description: The Identity category includes attributes that support the identification of the resource.

Field Value
PID https://www.doi.org/10.1186/s12911-017-0443-3
PID pmc:PMC5399417
PID pmid:28427384
URL https://bmcmedinformdecismak.biomedcentral.com/track/pdf/10.1186/s12911-017-0443-3
URL https://academic.microsoft.com/#/detail/2606358176
URL https://link.springer.com/article/10.1186/s12911-017-0443-3
URL https://core.ac.uk/display/155747772
URL http://europepmc.org/articles/PMC5399417
URL http://dx.doi.org/10.1186/s12911-017-0443-3
URL http://link.springer.com/content/pdf/10.1186/s12911-017-0443-3.pdf
URL https://dblp.uni-trier.de/db/journals/midm/midm17.html#ChanLCZJH17
URL http://link.springer.com/article/10.1186/s12911-017-0443-3
URL https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-017-0443-3
URL https://doaj.org/toc/1472-6947
URL https://dx.doi.org/10.1186/s12911-017-0443-3
URL https://www.ncbi.nlm.nih.gov/pubmed/28427384
Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Field Value
Access Right Open Access
Attribution

Description: Authorships and contributors

Field Value
Author Tak-Ming Chan
Author Yuxi Li
Author Choo-Chiap Chiau
Author Jane Zhu
Author Jie Jiang
Author Yong Huo
Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Field Value
Collected From Europe PubMed Central; PubMed Central; Datacite; UnpayWall; DOAJ-Articles; Crossref; Microsoft Academic Graph
Hosted By Europe PubMed Central; BMC Medical Informatics and Decision Making
Journal BMC Medical Informatics and Decision Making, 17,
Publication Date 2017-04-20
Publisher BioMed Central
Additional Info
Field Value
Language English
Resource Type Other literature type; Article; UNKNOWN
system:type publication
Management Info
Field Value
Source https://science-innovation-policy.openaire.eu/search/publication?articleId=dedup_wf_001::b497a99308cf50dc083da6ed23c74e43
Author jsonws_user
Last Updated 23 December 2020, 03:51 (CET)
Created 23 December 2020, 03:51 (CET)