Associating disease-related genetic variants in intergenic regions to the genes they impact

Department of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
Centre for Neural Engineering, The University of Melbourne, Melbourne, Australia
Department of Computer Science, Australian National University, Canberra, Australia
Machine Learning Research Group, NICTA Canberra Research Laboratory, Canberra, Australia
Department of Electrical and Electronic Engineering, The University of Melbourne, Melbourne, Australia
Health and Biomedical Informatics Centre, The University of Melbourne, Australia
DOI
10.7287/peerj.preprints.507v1
Subject Areas
Bioinformatics, Computational Biology, Genomics, Computational Science
Keywords
text mining, eQTL, HiC, non-coding variants, data integration
Copyright
© 2014 Macintyre et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
Macintyre G, Jimeno Yepes A, Ong CS, Verspoor K. 2014. Associating disease-related genetic variants in intergenic regions to the genes they impact. PeerJ PrePrints 2:e507v1

Abstract

We present a method to assist in interpreting the functional impact of intergenic disease-associated SNPs that is not limited to searching the region proximal to the SNP. The method builds on two sources of external knowledge: the growing understanding of three-dimensional spatial relationships in the genome, and the substantial body of information about relationships among genetic variants, genes, and diseases captured in the published biomedical literature. We integrate chromatin conformation capture data (HiC) with literature support to rank putative target genes of intergenic disease-associated SNPs. On a small test set of expression quantitative trait loci (eQTLs), this hybrid method outperforms a genomic distance baseline as well as either source of evidence used on its own. In addition, we show the potential of the method to uncover relationships between intergenic SNPs and target genes across chromosomes. As more extensive chromatin conformation capture data become readily available, this method offers a way towards functional interpretation of SNPs in the context of the three-dimensional structure of the genome in the nucleus.
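To make the idea of ranking putative target genes from two evidence sources concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how HiC and literature evidence are combined, so the min-max scaling, the weighted sum, the `hic_weight` parameter, the `Candidate` fields, and the gene names and numbers are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    gene: str
    hic_contacts: float  # hypothetical HiC contact strength between SNP region and gene
    lit_support: float   # hypothetical literature support linking gene and disease


def _scale(values):
    """Min-max scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_targets(candidates, hic_weight=0.5):
    """Return (gene, score) pairs sorted by combined score, best first.

    The weighted sum is an assumed combination rule, chosen for simplicity.
    """
    if not candidates:
        return []
    hic = _scale([c.hic_contacts for c in candidates])
    lit = _scale([c.lit_support for c in candidates])
    scored = [
        (c.gene, hic_weight * h + (1.0 - hic_weight) * s)
        for c, h, s in zip(candidates, hic, lit)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up candidates for a single intergenic SNP (illustrative values only).
candidates = [
    Candidate("GENE_A", hic_contacts=12.0, lit_support=0.0),
    Candidate("GENE_B", hic_contacts=8.0, lit_support=5.0),
    Candidate("GENE_C", hic_contacts=0.0, lit_support=1.0),
]
ranking = rank_targets(candidates)
hic_only = rank_targets(candidates, hic_weight=1.0)
```

With equal weights, a gene that has moderate support from both sources (`GENE_B`) ranks above a gene with strong HiC contact but no literature support (`GENE_A`). Setting `hic_weight=1.0` recovers a HiC-only ranking. Nothing in the sketch depends on genomic distance, which is why candidates on other chromosomes could be scored in the same way.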

Author Comment

This is a submission to PeerJ for review.