Background
Identifying genetic interactions in data obtained from genome-wide association studies (GWASs) can help in understanding the genetic basis of complex diseases. However, the large number of single nucleotide polymorphisms (SNPs) in GWASs makes the identification of genetic interactions computationally challenging. We developed the Bayesian Combinatorial Method (BCM), which can identify pairs of SNPs that in combination have a high statistical association with disease.
Results
We applied BCM to two late-onset Alzheimer’s disease (LOAD) GWAS datasets to identify SNP-SNP interactions between a set of known SNP associations and the dataset SNPs. For evaluation, we compared our results with those from logistic regression, as implemented in PLINK. Gene Ontology analysis of genes from the top 200 dataset SNPs in both GWAS datasets showed overrepresentation of LOAD-related terms. Four genes were common to both datasets: APOE and APOC1, which have well-established associations with LOAD, and CAMK1D and FBXL13, which had not previously been linked to LOAD but for which there is evidence of involvement in LOAD. Supporting evidence was also found for additional genes from the top 30 dataset SNPs.
Conclusion
BCM performed well in identifying several SNPs with evidence of involvement in the pathogenesis of LOAD that would not have been identified by univariate analysis because of their small main effects. These results support applying BCM to identify potential genetic variants, such as SNPs, in high-dimensional GWAS datasets.
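The abstract does not state BCM's scoring function, so the following is only a minimal sketch of the general idea under stated assumptions: it pairs one known SNP with each candidate dataset SNP and scores every pair with a K2 (Cooper-Herskovits) Bayesian marginal likelihood, which is an assumed stand-in and not the authors' BCM implementation. The helper names `k2_log_score` and `rank_partners`, the SNP labels, and the toy genotypes are all hypothetical. The toy data are built so that neither SNP of the interacting pair has a main effect, illustrating why a univariate scan can miss such a pair.

```python
import math
from collections import Counter

# Illustrative sketch only: a K2 (Cooper-Herskovits) Bayesian score is ASSUMED here;
# this is not the authors' BCM implementation. All names and data are hypothetical.

def k2_log_score(parent_columns, status):
    """Log K2 marginal likelihood of a binary phenotype (0 = control, 1 = case)
    given the joint genotype of zero or more parent SNPs (uniform Dirichlet prior)."""
    r = 2  # number of phenotype states
    cell_counts = Counter()    # (joint genotype, phenotype) -> count
    config_totals = Counter()  # joint genotype -> count
    for i, y in enumerate(status):
        config = tuple(col[i] for col in parent_columns)
        cell_counts[(config, y)] += 1
        config_totals[config] += 1
    score = 0.0
    for config, n_j in config_totals.items():
        # log[(r - 1)! / (N_j + r - 1)!], using lgamma(n + 1) = log(n!)
        score += math.lgamma(r) - math.lgamma(n_j + r)
        for y in range(r):
            score += math.lgamma(cell_counts[(config, y)] + 1)  # log(N_jk!)
    return score


def rank_partners(known_snp, candidates, status):
    """Pair a known SNP with each candidate SNP and rank the pairs, best first."""
    scored = [(k2_log_score([known_snp, geno], status), name)
              for name, geno in candidates.items()]
    return sorted(scored, reverse=True)


# Hypothetical toy data: genotypes coded as minor-allele counts (0, 1, 2).
snp_a, snp_b, snp_c, status = [], [], [], []
for a in range(3):
    for b in range(3):
        for rep in range(10):
            snp_a.append(a)
            snp_b.append(b)
            snp_c.append(rep % 3)              # independent of disease status
            status.append(1 if a == b else 0)  # pure interaction: no main effects

print(rank_partners(snp_a, {"snpB": snp_b, "snpC": snp_c}, status))
print("snpB alone:", k2_log_score([snp_b], status), "no SNP:", k2_log_score([], status))
```

In this toy example every genotype of snpB has the same one-third case rate, so on its own it scores below the model with no SNP at all, yet paired with the known SNP it ranks first. Real GWAS data are far noisier, and an exhaustive search over all dataset SNPs is what makes the computation demanding.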
This is version 2. Changes to the text include:
- in the Abstract, a clarification about the method used for comparison (lines 32-33, line 252: "we compared our results with those from logistic regression, as implemented in PLINK");
- a more detailed explanation of how the two APOE SNPs determine the APOE protein polymorphism (lines 143-147).
MF: GO molecular function; CC: GO cellular component; BP: GO biological process; No: number of genes from the list that have the relevant annotation. Italicized entries represent redundant terms; they are placed under their most informative common ancestor (in normal font).
The authors have no competing interests to declare.
The following grant information was disclosed by the authors:
SV was supported in part by NLM grant HHSN276201000030C, and M. Ilyas Kamboh was supported by National Institutes of Health grants AG030653 and AG005133.