Haplotype based genetic risk estimation for complex diseases

Sorbonne Universités, UPMC Univ Paris 06, CNRS, Paris, France
INSERM U1169, Université Paris Sud (Paris XI), Kremlin-Bicêtre, France
DOI
10.7287/peerj.preprints.2074v1
Subject Areas
Bioinformatics, Computational Biology, Genomics, Epidemiology, Statistics
Keywords
GWAS, Machine learning, Random Forest, Haplotype, Phase information, Heritability, Genetic risk
Copyright
© 2016 Balazard
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Balazard F. 2016. Haplotype based genetic risk estimation for complex diseases. PeerJ Preprints 4:e2074v1

Abstract

Genome-wide association studies (GWAS) have uncovered thousands of associations between genetic variants and diseases. The same datasets can also be used to attempt prediction of disease risk. Phase information is an important biological structure that has seldom been used in that setting. We propose here a multi-step machine learning method that aims to use this information. Our method captures local interactions in short haplotypes and combines the results linearly. We show that it outperforms standard linear models on some GWAS datasets. However, a variation of our method that does not use phase information obtains similar performance. Regarding the missing heritability problem, we remark that interactions in short haplotypes contribute to additive heritability. Source code is available on GitHub at https://github.com/FelBalazard/Prediction-with-Haplotypes.
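
The two-stage idea in the abstract (model short haplotypes locally, then combine the local results linearly) can be illustrated with a minimal sketch. This is not the code from the repository above. It swaps in a smoothed per-haplotype log-odds table as the local learner, treats each phased haplotype as one labelled row, and uses an unweighted sum as the linear combination. The function names, the `alpha` smoothing constant and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def windows(n_snps, size):
    """Yield (start, end) bounds of consecutive non-overlapping SNP windows."""
    for start in range(0, n_snps, size):
        yield start, min(start + size, n_snps)


def fit_window_scores(haplotypes, labels, size, alpha=1.0):
    """Stage 1 (assumed local learner): per window, give every observed
    short haplotype its own smoothed case-vs-control log-odds. One
    parameter per haplotype lets a window express non-additive effects."""
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    tables = []
    for start, end in windows(len(haplotypes[0]), size):
        counts = defaultdict(lambda: [0, 0])  # haplotype -> [controls, cases]
        for hap, y in zip(haplotypes, labels):
            counts[tuple(hap[start:end])][y] += 1
        table = {
            key: math.log((case + alpha) / (n_case + alpha))
            - math.log((ctrl + alpha) / (n_ctrl + alpha))
            for key, (ctrl, case) in counts.items()
        }
        tables.append((start, end, table))
    return tables


def risk_score(hap, tables):
    """Stage 2 (assumed combination): sum the window scores linearly.
    Haplotypes never seen in training contribute 0."""
    return sum(t.get(tuple(hap[s:e]), 0.0) for s, e, t in tables)


# Hypothetical toy data: four phased haplotypes over four SNPs, 1 = case.
haplotypes = [(0, 1, 1, 0), (0, 1, 0, 0), (1, 0, 1, 0), (1, 0, 0, 1)]
labels = [1, 1, 0, 0]
tables = fit_window_scores(haplotypes, labels, size=2)
scores = [risk_score(h, tables) for h in haplotypes]
```

On this toy input the two case haplotypes receive higher scores than the two control haplotypes. A real analysis would score held-out individuals rather than the training rows and would learn the combination weights instead of summing.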

Author Comment

This is a submission to PeerJ for review.

Supplemental Information

Results for different values of Nb, Nleaf and window size on the CD dataset

DOI: 10.7287/peerj.preprints.2074v1/supp-1

Results for PH, PwoH and PHd on all datasets for different values of window size

DOI: 10.7287/peerj.preprints.2074v1/supp-2

Results for lasso with pre-selection as a function of N, the number of pre-selected SNPs

DOI: 10.7287/peerj.preprints.2074v1/supp-3