Associating disease-related genetic variants in intergenic regions to the genes they impact

Geoff Macintyre; Antonio Jimeno Yepes; Cheng Soon Ong; Karin Verspoor

doi:10.7287/peerj.preprints.507v1

Geoff Macintyre^1,2, Antonio Jimeno Yepes^1, Cheng Soon Ong^3,4,5, Karin Verspoor^1,6

1 Department of Computing and Information Systems, The University of Melbourne, Melbourne, Australia

2 Centre for Neural Engineering, The University of Melbourne, Melbourne, Australia

3 Department of Computer Science, Australian National University, Canberra, Australia

4 Machine Learning Research Group, NICTA Canberra Research Laboratory, Canberra, Australia

5 Department of Electrical and Electronic Engineering, The University of Melbourne, Melbourne, Australia

6 Health and Biomedical Informatics Centre, The University of Melbourne, Australia

Published: 2014-09-21
Accepted: 2014-09-21

Subject Areas: Bioinformatics, Computational Biology, Genomics, Computational Science
Keywords: text mining, eQTL, HiC, non-coding variants, data integration

Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.

Cite this article: Macintyre G, Jimeno Yepes A, Ong CS, Verspoor K. 2014. Associating disease-related genetic variants in intergenic regions to the genes they impact. PeerJ PrePrints 2:e507v1 https://doi.org/10.7287/peerj.preprints.507v1

Abstract

We present a method to assist in interpreting the functional impact of intergenic disease-associated SNPs that is not limited to search strategies proximal to the SNP. The method builds on two sources of external knowledge: the growing understanding of three-dimensional spatial relationships in the genome, and the substantial body of information about relationships among genetic variants, genes, and diseases captured in the published biomedical literature. We integrate chromatin conformation capture data (HiC) with literature support to rank putative target genes of intergenic disease-associated SNPs. On a small test set of expression quantitative trait loci (eQTLs), this hybrid method outperforms both a genomic distance baseline and either component method on its own. In addition, we show the potential of this method to uncover relationships between intergenic SNPs and target genes across chromosomes. With more extensive chromatin conformation capture data becoming readily available, this method provides a way forward for the functional interpretation of SNPs in the context of the three-dimensional structure of the genome in the nucleus.
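
To make the ranking idea concrete, the following is a minimal sketch of how two evidence sources could be combined and compared against a distance baseline. The abstract does not state how the HiC and literature scores are integrated, so the rank-averaging scheme here is an assumption, not the authors' method. The `Candidate` fields, the function names, the gene names, and all numbers are hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One putative target gene for a single intergenic SNP (hypothetical schema)."""
    gene: str
    hic_score: float         # assumed: strength of HiC contact between SNP region and gene
    literature_score: float  # assumed: strength of literature support linking gene and disease
    distance_bp: int         # genomic distance from SNP to gene, used only by the baseline


def average_ranks(values, descending=True):
    """Return 1-based ranks for `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def rank_hybrid(candidates):
    """Order genes by the mean of their HiC rank and literature rank (assumed scheme)."""
    hic = average_ranks([c.hic_score for c in candidates])
    lit = average_ranks([c.literature_score for c in candidates])
    combined = [(h + l) / 2 for h, l in zip(hic, lit)]
    paired = sorted(zip(combined, candidates), key=lambda pair: pair[0])
    return [c.gene for _, c in paired]


def rank_by_distance(candidates):
    """Genomic distance baseline: the nearest gene is ranked first."""
    return [c.gene for c in sorted(candidates, key=lambda c: c.distance_bp)]


# Made-up example values, not data from the study.
example = [
    Candidate("GENE_A", hic_score=2.0, literature_score=0.0, distance_bp=10_000),
    Candidate("GENE_B", hic_score=9.0, literature_score=5.0, distance_bp=800_000),
    Candidate("GENE_C", hic_score=5.0, literature_score=1.0, distance_bp=250_000),
]
hybrid_order = rank_hybrid(example)
baseline_order = rank_by_distance(example)
```

Rank averaging is used here only because it avoids having to put the two scores on a common scale; a weighted sum or a learned combination would be equally plausible readings of "integrate".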

Author Comment

This is a submission to PeerJ for review.