Deep learning for predicting disease status using genomic data

Qianfan Wu; Adel Boueiz; Alican Bozkurt; Arya Masoomi; Allan Wang; Dawn L DeMeo; Scott T Weiss; Weiliang Qiu

doi:10.7287/peerj.preprints.27123v1


Qianfan Wu¹, Adel Boueiz²,³, Alican Bozkurt⁴, Arya Masoomi⁴, Allan Wang⁵, Dawn L DeMeo², Scott T Weiss², Weiliang Qiu²

1 Questrom School of Business, Boston University, Boston, USA

2 Brigham and Women's Hospital/Harvard Medical School, Boston, USA

3 Pulmonary and Critical Care Division, Brigham and Women's Hospital/Harvard Medical School, Boston, USA

4 Department of Computer Science, Northeastern University, Boston, USA

5 Belmont High School, Boston, USA


Published: 2018-08-16
Accepted: 2018-08-16

Subject Areas: Bioinformatics, Computational Biology, Genomics, Data Mining and Machine Learning, Data Science
Keywords: Artificial Neural Networks, autoencoder, low-dimensional representation, disease prediction, gene transcripts, next generation sequencing

Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.

Cite this article: Wu Q, Boueiz A, Bozkurt A, Masoomi A, Wang A, DeMeo DL, Weiss ST, Qiu W. 2018. Deep learning for predicting disease status using genomic data. PeerJ Preprints 6:e27123v1 https://doi.org/10.7287/peerj.preprints.27123v1

Abstract

Predicting disease status for a complex human disease using genomic data is an important, yet challenging, step in personalized medicine. Among many challenges, the so-called curse of dimensionality leads to unsatisfactory performance in many state-of-the-art machine learning algorithms. A major recent advance in machine learning is the rapid development of deep learning algorithms, which can efficiently extract meaningful features from high-dimensional and complex datasets through a stacked, hierarchical learning process. Deep learning has shown breakthrough performance in several areas, including image recognition, natural language processing, and speech recognition. However, the performance of deep learning in predicting disease status from genomic datasets is still not well studied. In this article, we reviewed the four relevant articles that we identified through a thorough literature search. All four articles used autoencoders to project high-dimensional genomic data onto a low-dimensional space and then applied state-of-the-art machine learning algorithms to predict disease status from the low-dimensional representations. This deep learning approach outperformed existing prediction approaches, such as prediction based on probe-wise screening and prediction based on principal component analysis. The limitations of the current deep learning approach and possible improvements are also discussed.
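The abstract describes a two-stage pipeline: an autoencoder compresses high-dimensional genomic data into a low-dimensional representation, and a separate classifier then predicts disease status from that representation. The sketch below illustrates the idea in Python with NumPy only. It is not taken from any of the reviewed articles: the synthetic data, the single tanh hidden layer, the layer sizes, the learning rates, and the logistic-regression classifier are all illustrative assumptions, and the helper names (`train_autoencoder`, `encode`, `fit_logistic`) are hypothetical.

```python
import numpy as np


def train_autoencoder(X, n_hidden, lr=0.02, epochs=3000, seed=0):
    """Hypothetical one-hidden-layer autoencoder (tanh encoder, linear decoder),
    trained by full-batch gradient descent on mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(p), (p, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, p))
    b2 = np.zeros(p)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # encode: p features -> n_hidden codes
        err = H @ W2 + b2 - X             # decode and compare with the input
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size      # d(loss) / d(reconstruction)
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)
    return W1, b1, losses


def encode(X, W1, b1):
    """Map samples to the low-dimensional representation learned above."""
    return np.tanh(X @ W1 + b1)


def fit_logistic(Z, y, lr=0.5, epochs=1000):
    """Illustrative classifier: logistic regression fit by gradient descent."""
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(epochs):
        prob = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        grad = prob - y
        w -= lr * Z.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


# Hypothetical synthetic data: 200 samples x 50 "genes" driven by 3 latent
# factors; disease status (y) shifts the first factor.
rng = np.random.default_rng(1)
n, p = 200, 50
y = rng.integers(0, 2, size=n)
latent = rng.normal(size=(n, 3))
latent[:, 0] += 2.0 * y
X = latent @ rng.normal(size=(3, p)) + 0.5 * rng.normal(size=(n, p))

# Standardize each gene with training-set statistics only.
train, test = np.arange(150), np.arange(150, 200)
mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
X_train, X_test = (X[train] - mu) / sd, (X[test] - mu) / sd

# Stage 1: unsupervised dimension reduction (50 -> 5).
W1, b1, losses = train_autoencoder(X_train, n_hidden=5)
Z_train, Z_test = encode(X_train, W1, b1), encode(X_test, W1, b1)

# Stage 2: predict disease status from the low-dimensional codes.
w, b = fit_logistic(Z_train, y[train])
pred = (Z_test @ w + b > 0).astype(int)
accuracy = float(np.mean(pred == y[test]))
print(f"reconstruction MSE {losses[0]:.3f} -> {losses[-1]:.3f}, "
      f"held-out accuracy {accuracy:.2f}")
```

The autoencoder is unsupervised, so it never sees the labels; both it and the standardization are fit on the training samples only, and the held-out samples are encoded with the learned weights. Replacing `encode` with a projection onto the leading principal components would give a PCA-based baseline of the kind the abstract compares against.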

Author Comment

This is a submission to PeerJ for review.
