Predicting comorbidities of epilepsy patients using big data from Electronic Health Records combined with biomedical knowledge
- Subject Areas: Bioinformatics, Data Mining and Machine Learning, Data Science
- Keywords: Electronic Health Records, Big Data, Machine Learning, Data Mining
- Copyright: © 2017 Gerlach et al.
- Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article: 2017. Predicting comorbidities of epilepsy patients using big data from Electronic Health Records combined with biomedical knowledge. PeerJ Preprints 5:e3228v1 https://doi.org/10.7287/peerj.preprints.3228v1
Abstract
Epilepsy is a complex brain disorder characterized by recurrent seizures. Epilepsy patients often suffer from various severe physical and psychological comorbidities. While general comorbidity prevalence and incidence can be estimated from epidemiological data, such an approach does not take into account that the actual patient-specific risk can depend on various individual factors, including medication. This motivates the development of a machine learning approach for predicting individual comorbidities. To this end, we used big data from electronic health records (~100 million raw observations), which provide a time-resolved view of an individual's disease and medication history. A specific contribution of this work is the integration of these data with information from 14 biomedical sources (DisGeNET, TTD, KEGG, WikiPathways, DrugBank, SIDER, Gene Ontology, Human Protein Atlas, ...) to capture putative biological effects of observed diseases and applied medications. From this, we extracted >165,000 features describing the longitudinal patient journey of >10,000 adult epilepsy patients. We used maximum-relevance-minimum-redundancy feature selection in combination with Random Survival Forests (RSF) to predict the risk of 9 major comorbidities after the first epilepsy diagnosis, achieving high cross-validated C-indices of 76-89%, and analyzed the influence of medications on the risk of developing specific comorbidities. Altogether, we see our work as a first step towards earlier detection and better prevention of common comorbidities in epilepsy patients.
Supplementary material: https://drive.google.com/file/d/0B4OhgVPeWvGTeUNFQVJLai1HRlk/view?usp=sharing
Code: https://github.com/thomasmooon/GCB2017
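To make the reported C-indices easier to interpret, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is an illustration only: the abstract does not state which estimator or tie handling the authors used, and the function name `concordance_index` and the toy data are hypothetical. A value of 1.0 means that every comparable pair of patients is ranked correctly by predicted risk, and 0.5 corresponds to random ranking.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (simplified sketch).

    times       -- observed follow-up time per patient
    events      -- 1 if the comorbidity was observed, 0 if the patient was censored
    risk_scores -- predicted risk; a higher score means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk was assigned to the earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the second one is censored.
times = [5, 10, 15, 20]
events = [1, 0, 1, 1]
risk_scores = [0.9, 0.4, 0.1, 0.6]
print(concordance_index(times, events, risk_scores))
```

In this toy example, four pairs are comparable and three of them are ranked correctly, so the index is 0.75. A C-index of 76-89% as reported in the abstract therefore means that roughly three quarters to nine tenths of comparable patient pairs are ordered correctly by the predicted risk.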
Author Comment
This article has been accepted for the GCB 2017 Conference. It was peer-reviewed by the conference program committee.