Predicting comorbidities of epilepsy patients using big data from Electronic Health Records combined with biomedical knowledge

UCB Biosciences GmbH, Monheim, Germany
Life Science Data Analytics & Algorithmic Bioinformatics, Rheinische Friedrich-Wilhelms Universität Bonn, Bonn, Germany
UCB Ltd., Raleigh, USA
DOI
10.7287/peerj.preprints.3228v1
Subject Areas
Bioinformatics, Data Mining and Machine Learning, Data Science
Keywords
Electronic Health Records, Big Data, Machine Learning, Data Mining
Copyright
© 2017 Gerlach et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Gerlach T, Lu C, Fröhlich H. 2017. Predicting comorbidities of epilepsy patients using big data from Electronic Health Records combined with biomedical knowledge. PeerJ Preprints 5:e3228v1

Abstract

Epilepsy is a complex brain disorder characterized by recurrent seizures. Epilepsy patients often suffer from a variety of severe physical and psychological comorbidities. While the general prevalence and incidence of comorbidities can be estimated from epidemiological data, such an approach does not take into account that patient-specific risks can depend on various individual factors, including medication. This motivates the development of a machine learning approach for predicting the comorbidities of individual patients. To this end, we used big data from electronic health records (~100 million raw observations), which provide a time-resolved view of an individual's disease and medication history. A specific contribution of this work is the integration of these data with information from 14 biomedical sources (DisGeNET, TTD, KEGG, WikiPathways, DrugBank, SIDER, Gene Ontology, Human Protein Atlas, ...) to capture putative biological effects of observed diseases and applied medications. As a result, we extracted >165,000 features describing the longitudinal patient journey of >10,000 adult epilepsy patients. We used maximum-relevance-minimum-redundancy feature selection in combination with Random Survival Forests (RSF) to predict the risk of 9 major comorbidities after the first epilepsy diagnosis, achieving high cross-validated C-indices of 76-89%, and analyzed the influence of medications on the risk of developing specific comorbidities. Altogether, we see our work as a first step towards earlier detection and better prevention of common comorbidities in epilepsy patients.
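The abstract reports model quality as a C-index. As a reading aid, the following is a minimal sketch of how a concordance index of the Harrell type is commonly computed for right-censored data. It is not taken from the authors' code; the function name `concordance_index`, its arguments, and the handling of tied event times (they are simply skipped) are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Illustrative Harrell-type C-index (hypothetical helper, not the authors' code).

    times       : observed follow-up time per patient
    events      : 1/True if the comorbidity was observed, 0/False if censored
    risk_scores : predicted risk per patient (higher = event expected earlier)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Assumption: pairs with identical observed times are skipped.
        if times[i] == times[j]:
            continue
        # Let a be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is only comparable if the earlier patient had an event.
        if not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk for the earlier event: correct order
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Under these assumptions, a value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking of patients by risk, so the reported 76-89% would mean that roughly 76 to 89 out of 100 comparable patient pairs are ordered correctly.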

Supplementary material: https://drive.google.com/file/d/0B4OhgVPeWvGTeUNFQVJLai1HRlk/view?usp=sharing
Code: https://github.com/thomasmooon/GCB2017

Author Comment

This article has been accepted for the GCB 2017 Conference. It was peer-reviewed by a conference program committee.

Supplemental Information

Supplementary Text

DOI: 10.7287/peerj.preprints.3228v1/supp-1