Haplotype based genetic risk estimation for complex diseases
- Published
- Accepted
- Subject Areas
- Bioinformatics, Computational Biology, Genomics, Epidemiology, Statistics
- Keywords
- GWAS, Machine learning, Random Forest, Haplotype, Phase information, Heritability, Genetic risk
- Copyright
- © 2016 Balazard
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2016. Haplotype based genetic risk estimation for complex diseases. PeerJ Preprints 4:e2074v1 https://doi.org/10.7287/peerj.preprints.2074v1
Abstract
Genome-wide association studies (GWAS) have uncovered thousands of associations between genetic variants and diseases. The same datasets can also be used to attempt prediction of disease risk. Phase information is an important biological structure that has seldom been used in that setting. We propose a multi-step machine learning method that aims to use this information. Our method captures local interactions in short haplotypes and combines the results linearly. We show that it outperforms standard linear models on some GWAS datasets. However, a variation of our method that does not use phase information obtains similar performance. Regarding the missing heritability problem, we remark that interactions in short haplotypes contribute to additive heritability. Source code is available on GitHub at https://github.com/FelBalazard/Prediction-with-Haplotypes.
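To make concrete what phase information adds over unphased genotypes, here is a minimal Python sketch. It is an illustration only and is not taken from the linked repository: the function names `genotype` and `window_haplotypes`, the 0/1 allele coding, and the two toy individuals are all assumptions introduced for this example. It shows two individuals who are heterozygous at the same two SNPs, and so have identical genotypes, yet carry different short haplotypes.

```python
from collections import Counter


def genotype(hap_pair):
    """Unphased genotype: per-SNP count of the alternate allele (0, 1 or 2)."""
    first, second = hap_pair
    return tuple(a + b for a, b in zip(first, second))


def window_haplotypes(hap_pair, start, size):
    """Multiset of the two short haplotypes carried in [start, start + size)."""
    return Counter(tuple(h[start:start + size]) for h in hap_pair)


# Two hypothetical individuals, two SNPs, alleles coded 0/1 (toy data).
individual_a = ((1, 1), (0, 0))  # alternate alleles on the same chromosome
individual_b = ((1, 0), (0, 1))  # alternate alleles on opposite chromosomes

same_genotype = genotype(individual_a) == genotype(individual_b)
same_haplotypes = (
    window_haplotypes(individual_a, 0, 2) == window_haplotypes(individual_b, 0, 2)
)
print(same_genotype, same_haplotypes)  # prints "True False"
```

A model that sees only genotypes cannot tell these two individuals apart, whereas one that takes short haplotypes as input can. This is the kind of local structure the abstract refers to.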
Author Comment
This is a submission to PeerJ for review.