Transcriptome association studies of neuropsychiatric traits in African Americans implicate PRMT7 in schizophrenia

Peter N. Fiorica; Heather E. Wheeler

doi:10.7717/peerj.7778

Peter N. Fiorica^1,2, Heather E. Wheeler^2,3,4,5

^1 Department of Chemistry and Biochemistry, Loyola University Chicago, Chicago, IL, USA

^2 Department of Biology, Loyola University Chicago, Chicago, IL, USA

^3 Program in Bioinformatics, Loyola University Chicago, Chicago, IL, USA

^4 Department of Computer Science, Loyola University Chicago, Chicago, IL, USA

^5 Department of Public Health Sciences, Loyola University Chicago, Maywood, IL, USA

Published: 2019-09-26
Accepted: 2019-08-27
Received: 2019-05-13

Academic Editor: Hossein Khiabanian

Subject Areas: Genetics, Genomics, Psychiatry and Psychology
Keywords: GWAS, PrediXcan, Population genetics, Gene expression, Schizophrenia, Bipolar disorder

Copyright: © 2019 Fiorica and Wheeler
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

Cite this article: Fiorica PN, Wheeler HE. 2019. Transcriptome association studies of neuropsychiatric traits in African Americans implicate PRMT7 in schizophrenia. PeerJ 7:e7778 https://doi.org/10.7717/peerj.7778

The authors have chosen to make the review history of this article public.

Abstract

In the past 15 years, genome-wide association studies (GWAS) have provided novel insight into the genetic architecture of various complex traits; however, this insight has been primarily focused on populations of European descent. This emphasis on European populations has left individuals of recent African descent grossly underrepresented in the study of genetics. With African Americans making up less than 2% of participants in neuropsychiatric GWAS, this discrepancy is magnified in diseases such as schizophrenia and bipolar disorder. In this study, we performed GWAS and the gene-based association method PrediXcan for schizophrenia (n = 2,256) and bipolar disorder (n = 1,019) in African American cohorts. In our PrediXcan analyses, we identified PRMT7 (P = 5.5 × 10⁻⁶, local false sign rate = 0.12) as significantly associated with schizophrenia following an adaptive shrinkage multiple testing adjustment. This association was replicated in summary statistics from the much larger, predominantly European, Psychiatric Genomics Consortium schizophrenia GWAS. In addition to the PRMT7 association with schizophrenia, we identified rs10168049 (P = 1.0 × 10⁻⁶) as a potential candidate locus for bipolar disorder with highly divergent allele frequencies across populations, highlighting the need for diversity in genetic studies.

Introduction

Individuals of recent African ancestry have been grossly underrepresented in genomic studies. African American participants make up about 2.0% of all genome-wide association study (GWAS) subjects (Sirugo, Williams & Tishkoff, 2019), and individuals of African ancestry make up only 1.2% of participants in neuropsychiatric GWAS (Quansah & McGregor, 2018). With the advent of polygenic risk scores, accurate disease prediction is critical to the development of precision medicine (Khera et al., 2018); however, the lack of representative diversity in genomics research has reduced the accuracy of genetic risk prediction across diverse populations. Despite similar incidences of schizophrenia across European and African ancestry populations (De Candia et al., 2013; Whiteford et al., 2013), Africans have been predicted to have significantly less disease risk than their European counterparts using current GWAS summary statistics (Martin et al., 2017). Inaccurate risk prediction across populations can widen disparities in the health and treatment of underrepresented populations. To prevent misclassification of genetic risk, further work on the genetics underlying complex traits in African Americans is needed (Manrai et al., 2016). To help address this gap in genetic risk prediction, we performed a series of genetic association tests for schizophrenia and bipolar disorder in two cohorts of African American individuals (Manolio et al., 2007; Suarez et al., 2006; Smith et al., 2009).

Schizophrenia and bipolar disorder are two heritable neuropsychiatric disorders whose genetic components have been attributed to the cumulative effect of thousands of loci across the genome (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014; Li et al., 2017; Ikeda et al., 2017b). Past work shows that the genetic architectures of these two disorders significantly overlap (Bhalala et al., 2018; Allardyce et al., 2018; The International Schizophrenia Consortium, 2009; Stahl et al., 2019). To date, the largest GWAS of schizophrenia and bipolar disorder include hundreds of thousands of individuals, primarily of European descent (Stahl et al., 2019; Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014; Li et al., 2017). While studies of neuropsychiatric diseases in European and Asian ancestry populations continue to grow, studies in African American populations remain scarce (Ikeda et al., 2017a).

To date, one of the largest GWAS of schizophrenia in an African American population was completed by the Genetic Association Information Network (GAIN) (Manolio et al., 2007); however, this study found no genome-wide significant single nucleotide polymorphisms (SNPs) and offered little insight into the potential function of genes in schizophrenia in African Americans. GAIN also performed one of the largest GWAS of bipolar disorder in African Americans (Manolio et al., 2007; Smith et al., 2009). Similar to the GAIN GWAS of schizophrenia, Smith et al. (2009) found no SNPs significantly associated with bipolar disorder. In addition to a traditional GWAS using logistic regression, we performed two gene-level association tests: PrediXcan and MultiXcan (Gamazon et al., 2015; Barbeira et al., 2019). PrediXcan offers several advantages over SNP-level analyses in detecting genetic associations and interpreting their function. First, because tests are conducted at the gene level, PrediXcan has a lower multiple-testing burden than GWAS. Second, functional pathways are more easily constructed for genes than for SNPs. Third, by using gene expression as an intermediate phenotype between genetic variation and complex phenotypes, PrediXcan results can help elucidate genetic mechanisms more directly than GWAS results. We completed our genetic association tests for schizophrenia in a cohort of 2,256 self-identified African American individuals from the genome-wide linkage scan of African American families (Suarez et al., 2006) and GAIN (Manolio et al., 2007). For our study of bipolar disorder, we performed these association tests in 1,019 African American individuals from GAIN. Using these data, we identified one gene significantly associated with schizophrenia and tested it for replication in the Psychiatric Genomics Consortium (PGC) GWAS of schizophrenia (Barbeira et al., 2018; Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014; Psychiatric GWAS Consortium Bipolar Disorder Working Group, 2011).

Methods

Cohorts

The individuals in the cohorts used in this study were all self-identified African Americans. We acquired genotype and phenotype information for these individuals from the National Center for Biotechnology Information database of Genotypes and Phenotypes (dbGaP). The Loyola University Chicago Institutional Review Board confirmed that the project was exempt from federal human subjects regulations under exemption number 4 (project number 2014). Genome-wide genotypes and phenotypic information were acquired from three separate dbGaP accessions (Table 1). All three accessions were genotyped using the Affymetrix Genome-Wide Human SNP Array 6.0, covering 934,940 SNPs. In total, our studies included 2,256 and 1,019 individuals for schizophrenia and bipolar disorder, respectively.

Table 1:

Cohort characteristics.

Three separate dbGaP accessions were combined into two cohorts defined by phenotype. Because the two schizophrenia accessions were merged using PLINK before quality control (QC), their post-QC and post-imputation counts are those of the merged cohort and are therefore identical.

dbGaP accession number	phs000021.v3.p2	phs000167.v1.p1	phs000017.v3.p1
Phenotype	Schizophrenia	Schizophrenia	Bipolar disorder
Total individuals	2,220	120	1,045
Cases	1,241	15	359
Controls	979	105	686
Post-QC individuals	2,256	2,256	1,019
Pre-QC SNPs	845,814	909,622	867,411
Post-QC SNPs	742,015	742,015	721,050
Post-imputation SNPs (r² > 0.8, MAF > 0.01)	12,780,487	12,780,487	12,799,548

DOI: 10.7717/peerj.7778/table-1

Case-control criteria for both phenotypes were determined using the DSM-IV (Diagnostic and Statistical Manual of Mental Disorders) as described previously (Smith et al., 2009). Following the update from DSM-IV to DSM-V, 40 individuals previously identified as cases under DSM-IV were removed from the study because they no longer met the case criteria for DSM-V.

Quality control and imputation

After downloading the data from dbGaP, we separated the genotypes and phenotypes of African Americans from those of European Americans and individuals of unidentified ethnicity in each cohort. We then merged the PLINK binary files from the two schizophrenia studies (Purcell et al., 2007). At this point, 906,425 SNPs were genotyped in 2,256 individuals in the schizophrenia cohort, and 867,411 SNPs were genotyped in 1,019 individuals in the bipolar disorder cohort. Although the schizophrenia cohort included individuals from both GAIN and the genome-wide linkage scan of African American families (Suarez et al., 2006; Manolio et al., 2007), throughout the rest of the paper we refer to the combined cohort as GAIN. In each cohort, we removed SNPs with genotyping call rates below 99% and those that significantly deviated from Hardy–Weinberg equilibrium (P < 1 × 10⁻⁶). We then removed individuals with excess heterozygosity, defined as heterozygosity more than three standard deviations from the mean. We used EIGENSOFT smartpca (Patterson, Price & Reich, 2006) to generate the first 10 principal components, which we used to confirm self-identified ancestry (Figs. S1 and S2). After this quality control, 742,015 SNPs and 2,256 individuals remained in the schizophrenia cohort, and 721,050 SNPs and 1,019 individuals remained in the bipolar disorder cohort.
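The SNP and sample filters described above can be sketched in numpy as follows. This is an illustration under stated assumptions, not the exact PLINK procedure used in this study: genotypes are assumed to be an individuals × SNPs array coded 0/1/2 with NaN for missing calls, the function names are hypothetical, and the Hardy–Weinberg test is a 1-df chi-square approximation rather than PLINK's exact test.

```python
import math
import numpy as np

def snp_call_rate(geno):
    """Fraction of individuals with a non-missing genotype at each SNP (column)."""
    return 1.0 - np.isnan(geno).mean(axis=0)

def hwe_chisq_p(geno):
    """Per-SNP Hardy-Weinberg P-value from a 1-df chi-square goodness-of-fit test.
    Approximation only: PLINK's own HWE filter uses an exact test."""
    pvals = []
    for col in geno.T:
        g = col[~np.isnan(col)]
        n = len(g)
        obs = np.array([(g == 0).sum(), (g == 1).sum(), (g == 2).sum()], dtype=float)
        p = (obs[1] + 2.0 * obs[2]) / (2.0 * n)        # frequency of the counted allele
        exp = n * np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])
        ok = exp > 0                                    # skip empty classes (monomorphic SNPs)
        chi2 = float((((obs - exp) ** 2)[ok] / exp[ok]).sum())
        pvals.append(math.erfc(math.sqrt(chi2 / 2.0)))  # upper tail of chi-square, 1 df
    return np.array(pvals)

def heterozygosity_outliers(geno, n_sd=3.0):
    """Flag individuals whose heterozygosity rate is more than n_sd SDs from the mean.
    Flags both directions; use het > mean + n_sd * sd to flag only excess heterozygosity."""
    called = ~np.isnan(geno)
    het = (geno == 1).sum(axis=1) / called.sum(axis=1)
    return np.abs(het - het.mean()) > n_sd * het.std()

def basic_qc(geno, min_call=0.99, hwe_p=1e-6, n_sd=3.0):
    """SNP filters first (call rate, HWE), then the per-individual heterozygosity filter."""
    keep_snp = (snp_call_rate(geno) >= min_call) & (hwe_chisq_p(geno) >= hwe_p)
    geno = geno[:, keep_snp]
    keep_ind = ~heterozygosity_outliers(geno, n_sd)
    return geno[keep_ind], keep_snp, keep_ind
```

Applying the SNP filters before the heterozygosity filter mirrors the order given in the text; PLINK applies its own internal ordering when several filters are passed in one command.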

The filtered data from each cohort were then uploaded to the University of Michigan Imputation Server for genotype imputation (Das et al., 2016). Genotypes for each cohort were phased with Eagle version 2.3 and imputed using 1000 Genomes Phase 3 version 5 (1000G) as the reference panel (The 1000 Genomes Project Consortium, 2015). We then downloaded the imputed data from the Michigan Imputation Server, converted it to PLINK binary format, and removed SNPs with imputation r² < 0.8 or minor allele frequency (MAF) < 0.01. This left 12,780,487 and 12,799,548 SNPs in the schizophrenia and bipolar disorder cohorts, respectively. We also explored the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA) panel as an alternative imputation reference (Mathias et al., 2016), but 1000G imputed more SNPs meeting our filters, while SNPs imputed with either panel had identical MAFs at our chosen imputation r² and MAF thresholds (Fig. S3).
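A minimal sketch of the post-imputation filter, assuming the imputed dosages (0 to 2) and the per-SNP imputation r² reported by the server have already been loaded into numpy arrays; the loading step and the function names are hypothetical. It keeps SNPs with r² ≥ 0.8 and MAF ≥ 0.01 as worded in the Methods (Table 1 writes the same thresholds as strict inequalities), so a SNP failing either threshold is dropped.

```python
import numpy as np

def dosage_maf(dosage):
    """MAF from imputed dosages (individuals x SNPs, values in [0, 2])."""
    freq = dosage.mean(axis=0) / 2.0       # frequency of the dosage-coded allele
    return np.minimum(freq, 1.0 - freq)     # fold to the minor allele

def post_imputation_filter(dosage, rsq, min_rsq=0.8, min_maf=0.01):
    """Boolean mask of SNPs kept: imputation r^2 AND MAF must both pass."""
    return (rsq >= min_rsq) & (dosage_maf(dosage) >= min_maf)
```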

Genome-wide association study

Using PLINK, we performed a logistic regression of the phenotype using the first 10 genotypic principal components as covariates to account for population structure. We used a significance threshold of P < 5 × 10⁻⁸ to identify significantly associated SNPs. Plots were generated from PLINK results using the web-based tool LocusZoom (Pruim et al., 2011).
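PLINK fits the per-SNP model internally; the numpy sketch below only illustrates that model, logit P(case) = b0 + b1 · dosage + Σ c_k PC_k, fitted by iteratively reweighted least squares with a two-sided Wald test on b1. The function names are hypothetical, and PLINK's numerical details (convergence criteria, handling of separation) may differ.

```python
import math
import numpy as np

def logistic_irls(X, y, n_iter=50, tol=1e-10):
    """Logistic regression by iteratively reweighted least squares.
    Returns coefficients and their standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    cov = np.linalg.inv(X.T @ (X * (mu * (1.0 - mu))[:, None]))
    return beta, np.sqrt(np.diag(cov))

def snp_association(dosage, pcs, case):
    """Wald test for one SNP: logit P(case) ~ intercept + dosage + 10 PCs."""
    X = np.column_stack([np.ones(len(case)), dosage, pcs])
    beta, se = logistic_irls(X, case.astype(float))
    z = beta[1] / se[1]
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal P-value
    return beta[1], se[1], p

GENOME_WIDE = 5e-8   # significance threshold used in this study
```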

PrediXcan

We performed the gene-based association test PrediXcan on both phenotypes, schizophrenia and bipolar disorder, in this study. PrediXcan predicts an individual's genetically regulated gene expression levels using tissue-specific prediction models trained on reference transcriptome data (Gamazon et al., 2015). We tested each phenotype across 55 prediction models. Forty-eight of these models were trained on 48 tissues in GTEx version 7 (Barbeira et al., 2018). Six models were generated from monocyte transcriptomes of individuals in the Multi-Ethnic Study of Atherosclerosis (MESA) cohort. The MESA models, the most diverse set of published predictors to date, were built from genotypes and transcriptomes of self-identified African American, Hispanic, and European individuals (Mogil et al., 2018). These models can be found at http://predictdb.org/. We also used a model built from dorsolateral prefrontal cortex (DLPFC) data from the CommonMind Consortium (Huckins et al., 2019). To impute gene expression levels, we converted the PLINK binary files from each cohort to PrediXcan dosage files using the conversion script provided at https://github.com/hakyimlab/PrediXcan/tree/master/Software. After predicting genetically regulated expression, we tested each predicted expression level for association with the phenotype of interest. Because PrediXcan does not have a flag for performing logistic regression with covariates, we first performed a logistic regression of the phenotype on the first 10 principal components and used its residuals as a residual phenotype to account for population structure. We then performed a linear regression of the residual phenotype on the predicted expression level of each gene.
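The two PrediXcan steps described above (prediction, then association with a residual phenotype) can be sketched as below. This is a minimal illustration, not the PrediXcan software: predicted expression is taken to be the weighted sum of allele dosages for the SNPs in one gene-tissue model, the residuals are assumed to be response residuals (case status minus fitted probability), which the Methods do not specify, and the P-value uses a normal approximation to the t distribution, which is reasonable at this sample size. All function names are hypothetical.

```python
import math
import numpy as np

def predict_expression(dosages, weights):
    """Genetically regulated expression: weighted sum of allele dosages.
    dosages: individuals x SNPs in one gene's model; weights: per-SNP effect sizes."""
    return dosages @ weights

def logistic_fitted(X, y, n_iter=50):
    """Fitted probabilities from a Newton-Raphson logistic regression of y on X."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)
        beta = beta + np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - mu))
    return 1.0 / (1.0 + np.exp(-(X @ beta)))

def residual_phenotype(case, pcs):
    """Case status minus the probability predicted from the PCs alone (assumed residual type)."""
    X = np.column_stack([np.ones(len(case)), pcs])
    return case - logistic_fitted(X, case)

def linear_assoc(resid, expr):
    """Simple linear regression of the residual phenotype on predicted expression."""
    x = expr - expr.mean()
    yc = resid - resid.mean()
    beta = (x @ yc) / (x @ x)
    sigma2 = ((yc - beta * x) @ (yc - beta * x)) / (len(x) - 2)
    se = math.sqrt(sigma2 / (x @ x))
    t = beta / se
    p = math.erfc(abs(t) / math.sqrt(2.0))   # normal approximation to the t distribution
    return beta, se, t, p
```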

Following the PrediXcan association tests, we adjusted for multiple testing using the adaptive shrinkage approach implemented in the R package ashr (Stephens, 2017). Using this package, we calculated the local false sign rate (lfsr) for each test. The lfsr is similar to traditional false discovery rate approaches but takes into account both the effect size and the standard error of each gene-tissue pair (n = 248,605). This empirical Bayes approach also assumes that the distribution of true effects is unimodal with its mode at 0. We set our significance threshold for gene-tissue pairs at lfsr < 0.2.
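To illustrate why lfsr depends on the standard error as well as the P-value, here is a simplified sketch of the adaptive shrinkage idea; it is not the ashr package, which offers several mixture families, grid choices, and penalties. Under the assumptions of this sketch, the prior on true effects is a mixture of zero-centred normals on a fixed grid of standard deviations (σ = 0 is a point mass at zero), mixture weights are estimated by EM, and the lfsr is the smaller of the posterior probabilities that the true effect is ≤ 0 or ≥ 0. Function names and the grid are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])

def norm_cdf(x):
    """Standard normal CDF, elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))

def marginal_lik(beta_hat, se, sigmas):
    """Likelihood under each component: beta_hat ~ N(0, sigma_k^2 + se^2)."""
    sd = np.sqrt(sigmas[None, :] ** 2 + se[:, None] ** 2)
    return np.exp(-0.5 * (beta_hat[:, None] / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def fit_weights(beta_hat, se, sigmas, n_iter=500):
    """EM estimate of mixture proportions on a fixed grid of prior SDs."""
    beta_hat, se = np.asarray(beta_hat, float), np.asarray(se, float)
    lik = marginal_lik(beta_hat, se, np.asarray(sigmas, float))
    pi = np.full(lik.shape[1], 1.0 / lik.shape[1])
    for _ in range(n_iter):
        resp = pi * lik
        resp /= resp.sum(axis=1, keepdims=True)   # posterior component membership
        pi = resp.mean(axis=0)
    return pi

def lfsr(beta_hat, se, pi, sigmas):
    """Local false sign rate of each test given prior weights pi on grid sigmas."""
    beta_hat, se = np.asarray(beta_hat, float), np.asarray(se, float)
    sigmas = np.asarray(sigmas, float)
    w = pi * marginal_lik(beta_hat, se, sigmas)
    w /= w.sum(axis=1, keepdims=True)             # posterior component weights
    sig2, s2 = sigmas[None, :] ** 2, se[:, None] ** 2
    post_mean = beta_hat[:, None] * sig2 / (sig2 + s2)
    post_sd = np.sqrt(sig2 * s2 / (sig2 + s2))
    null = (sigmas == 0)[None, :]
    # P(effect < 0) within each non-null component; the null point mass at 0
    # counts toward both "<= 0" and ">= 0".
    p_neg = np.where(null, 0.0, norm_cdf(-post_mean / np.where(null, 1.0, post_sd)))
    p_le0 = (w * p_neg).sum(axis=1) + (w * null).sum(axis=1)
    p_ge0 = (w * (1.0 - p_neg)).sum(axis=1)       # null term contributes its full weight
    return np.minimum(p_le0, p_ge0)
```

With a single prior component of SD 0.05, an estimate of 0.25 (SE 0.05, |z| = 5.0) receives a larger lfsr than an estimate of 0.10 (SE 0.022, |z| ≈ 4.5), even though the first has the smaller P-value. This mirrors how a gene-tissue pair with a lower P-value but a larger standard error can rank below another by lfsr.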

Because of the dearth of African American neuropsychiatric cohorts, replication could not be completed in an independent African American cohort. To validate our results, we compared our findings to the association results of a meta-analysis of the PGC GWAS summary statistics completed using S-PrediXcan (Barbeira et al., 2018; Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014; Psychiatric GWAS Consortium Bipolar Disorder Working Group, 2011). We performed S-PrediXcan for both phenotypes using predictors from GTEx. The Schizophrenia Working Group of the Psychiatric Genomics Consortium (2014) study originally included 36,989 cases and 113,075 controls composed of about 96.5% European and 3.5% Asian individuals. The Psychiatric GWAS Consortium Bipolar Disorder Working Group (2011) study included 7,481 cases and 9,250 controls of primarily European and Asian descent.
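S-PrediXcan approximates the PrediXcan gene-level z-score from GWAS summary statistics alone as z_g ≈ Σ_l w_l (σ_l / σ_g) z_l, where w_l are the prediction weights, z_l are the GWAS z-scores, σ_l is the standard deviation of SNP l's dosage, and σ_g² = wᵀΓw is the variance of predicted expression computed from a reference SNP covariance matrix Γ (Barbeira et al., 2018). A minimal numpy sketch for one gene-tissue model, assuming the weights, z-scores, and covariance are already aligned to the same SNPs and effect alleles; the function name is hypothetical.

```python
import numpy as np

def s_predixcan_z(weights, snp_z, snp_cov):
    """Approximate gene-level z-score from GWAS summary statistics.
    weights: prediction weights w_l of the SNPs in one gene-tissue model
    snp_z:   GWAS z-scores (beta_l / se_l) for the same SNPs and effect alleles
    snp_cov: covariance matrix of those SNPs' dosages from a reference panel"""
    w = np.asarray(weights, dtype=float)
    z = np.asarray(snp_z, dtype=float)
    cov = np.asarray(snp_cov, dtype=float)
    sigma_l = np.sqrt(np.diag(cov))     # SD of each SNP's dosage
    sigma_g = np.sqrt(w @ cov @ w)      # SD of predicted expression
    return float(np.sum(w * sigma_l / sigma_g * z))
```

For a single-SNP model the gene z-score reduces to the SNP z-score with the sign of its weight; for two independent SNPs with equal weights and variances it is (z_1 + z_2)/√2.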

MultiXcan

Following imputation of gene expression levels, we performed MultiXcan (Barbeira et al., 2019), a gene-based association test that combines information across multiple tissues while taking their correlation into account. Using the predicted expression levels in 48 tissues across GTEx, we performed MultiXcan on both of the disease phenotypes.
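MultiXcan regresses the phenotype jointly on predicted expression from all tissues; because predicted expression is often highly correlated across tissues, it works with principal components of the tissue-by-individual expression matrix and uses an F-test (Barbeira et al., 2019). The sketch below is a simplified, hypothetical version for one gene: it takes the residual phenotype from the PrediXcan step as the outcome, and the singular-value cutoff `max_condition = 30` is an assumed choice that may not match MultiXcan's own default. A P-value would come from the F distribution with (k, n − k − 1) degrees of freedom, for example `scipy.stats.f.sf(f_stat, k, n - k - 1)`.

```python
import numpy as np

def multixcan_f(expr_by_tissue, resid, max_condition=30.0):
    """MultiXcan-style joint test of one gene across tissues.
    expr_by_tissue: individuals x tissues matrix of predicted expression.
    resid: residual phenotype (covariates already regressed out).
    Returns the F statistic and the number of principal components kept."""
    X = expr_by_tissue - expr_by_tissue.mean(axis=0)
    sd = X.std(axis=0)
    X = X[:, sd > 0] / sd[sd > 0]                  # standardize; drop constant tissues
    u, s, _ = np.linalg.svd(X, full_matrices=False)
    keep = s >= s[0] / max_condition               # hypothetical cutoff on singular values
    pcs = u[:, keep] * s[keep]                     # principal component scores
    k = pcs.shape[1]
    y = resid - resid.mean()
    coef, *_ = np.linalg.lstsq(pcs, y, rcond=None)
    fitted = pcs @ coef
    rss = float(((y - fitted) ** 2).sum())
    ess = float((fitted ** 2).sum())
    dof = len(y) - k - 1                           # one extra df for the (centred) intercept
    return (ess / k) / (rss / dof), k
```

When predicted expression is identical across tissues, only one component survives the cutoff, so redundant tissues do not inflate the degrees of freedom of the test.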

Results

Schizophrenia gene-based association study

To better understand the genetic architecture of schizophrenia in African Americans, we performed transcriptome-wide association studies using 55 gene expression prediction models. In the GAIN cohort of 2,256 individuals (969 controls and 1,287 cases), we predicted gene expression using models for 48 GTEx tissues, six MESA monocyte models, and the CommonMind DLPFC model (Barbeira et al., 2018; Wheeler et al., 2016; Mogil et al., 2018; Huckins et al., 2019).

PRMT7 in heart atrial appendage was the most significantly associated gene-tissue pair, with an lfsr of 0.119 and a P-value of 5.49 × 10⁻⁶ (Table 2; Figs. 1 and 2). Increased predicted expression of PRMT7 was associated with schizophrenia in 32 of the 33 GTEx tissues tested (Figs. 2 and 3). Effect sizes were also positive for PRMT7 associations with schizophrenia in all 42 tissues tested in our S-PrediXcan application to the PGC data (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014) (Fig. 3). Of these 42 tissues, 30 associations were statistically significant (P < 0.0012) after Bonferroni adjustment for the number of tissues tested, and all 42 had P < 0.05 (Table S1). We found no significant gene-tissue associations using the MESA or DLPFC models. While PRMT7 in atrial appendage had the lowest lfsr across all models, RP11-646C24.5 had lower P-values (Fig. 1) but high lfsr in both pancreas (lfsr = 0.860) and sigmoid colon (lfsr = 0.851). Notably, the standard errors in both of these tissues were more than twice that of PRMT7. Unlike more traditional false discovery rate approaches such as Benjamini–Hochberg, lfsr is calculated in an empirical Bayes framework that uses both the effect size and its standard error, so the gene with the lowest P-value may not be the gene with the lowest lfsr (Stephens, 2017). We also found no significant associations with the cross-tissue MultiXcan model.
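The tissue-level Bonferroni threshold used for the PGC replication follows from dividing the nominal 0.05 level by the 42 tissues in which PRMT7 expression was predicted:

```latex
\alpha_{\mathrm{Bonferroni}} = \frac{0.05}{42} \approx 1.19 \times 10^{-3} \approx 0.0012
```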

Table 2:

Top PrediXcan results for schizophrenia in African Americans (GAIN cohort) by local false sign rate (lfsr).

PRMT7 makes up five of the top eight gene-tissue pairs ranked by lfsr. BP is the base-pair position of each gene's transcription start site.

Gene	Beta	t	P	se(beta)	Tissue (predictor)	lfsr	CHR	BP
PRMT7	0.10	4.56	5.49 × 10⁻⁶	0.022	Heart_Atrial_Appendage	0.119	16	68392457
PRMT7	0.09	4.22	2.50 × 10⁻⁵	0.020	Cells_Transformed_fibroblasts	0.225	16	68392457
PRMT7	0.08	4.00	6.63 × 10⁻⁵	0.019	Heart_Left_Ventricle	0.353	16	68392457
PRMT7	0.11	4.23	2.47 × 10⁻⁵	0.025	Adrenal_Gland	0.365	16	68392457
TBC1D2	0.12	4.24	2.34 × 10⁻⁵	0.028	Brain_Putamen_basal_ganglia	0.468	9	100961311
NPC1	0.07	3.83	1.29 × 10⁻⁴	0.019	Thyroid	0.484	18	21086148
EIF2S2P3	0.08	3.81	1.41 × 10⁻⁴	0.022	Brain_Amygdala	0.558	10	94428502
PRMT7	0.08	3.73	1.97 × 10⁻⁴	0.020	Cells_EBV-transformed_lymphocytes	0.578	16	68392457

DOI: 10.7717/peerj.7778/table-2

Figure 1: PrediXcan association results for schizophrenia in GAIN African Americans.
Each point on the Manhattan (A) and quantile-quantile (B) plots represents one gene-tissue test for association with schizophrenia using GTEx version 7 gene expression prediction models. *PRMT7* expression in heart atrial appendage is labeled in both plots because it had the lowest lfsr of all gene-tissue pairs (lfsr = 0.119). The two points with lower P-values than *PRMT7* are predicted *RP11-646C24.5* expression in pancreas and sigmoid colon, but lfsr was greater than 0.8 for each of these associations. Unlike more traditional false discovery rate approaches such as Benjamini–Hochberg, the gene with the lowest P-value may not be the gene with the lowest lfsr, especially if the standard error of the effect size estimate is high (Stephens, 2017).

DOI: 10.7717/peerj.7778/fig-1

Figure 2: Predicted *PRMT7* expression is higher in schizophrenia cases than controls in GAIN.
The violin plot represents the differences in density of predicted gene expression levels of *PRMT7* between cases (SCZ) and controls in heart atrial appendage from GTEx (P = 5.49 × 10⁻⁶).

DOI: 10.7717/peerj.7778/fig-2

Figure 3: *PRMT7* PrediXcan discovery (GAIN) and replication (PGC) results across tissue models.
In each bubble plot, the radius of the bubble is representative of the significance of *PRMT7* association with SCZ. The color of the bubble represents the test statistic with blue representing a positive direction of effect and red representing a negative direction of effect.

DOI: 10.7717/peerj.7778/fig-3

Bipolar disorder gene-based association study

To develop a better understanding of the genetic mechanisms governing bipolar disorder in African Americans, we performed PrediXcan in a cohort of 1,019 individuals (671 controls and 348 cases). As in our gene-based association study of schizophrenia, we performed our tests across the same 55 gene expression prediction models.

In the GAIN cohort of 1,019 African American individuals, no genes were identified to be significantly associated with bipolar disorder. Increased predicted expression of GREM2 in testis was the most associated (P = 2.20 × 10⁻⁵) gene-tissue pair with bipolar disorder (Fig. 4). KCNMB3 had the lowest lfsr at 0.919. We also found no significant associations with the MultiXcan, cross-tissue model.

Figure 4: PrediXcan association results for bipolar disorder in GAIN.
Each point on the Manhattan (A) and quantile-quantile (B) plots represents a gene-tissue association test for bipolar disorder using GTEx models. *GREM2* on chromosome 1 was the gene most associated with bipolar disorder in our study. All gene association tests across the 48 GTEx tissues are plotted in (A) and (B). The 95% confidence interval is depicted by gold dotted lines in (B).

DOI: 10.7717/peerj.7778/fig-4

Schizophrenia SNP-level association test

We performed a GWAS of more than 12 million imputed SNPs to help elucidate the role specific SNPs play in the genetics of schizophrenia in African Americans. We used the first 10 principal components as covariates in our logistic regression to adjust for population stratification in the cohort. We found no genome-wide significant SNPs; however, one of the most strongly associated SNPs, rs8063446 (P = 2.66 × 10⁻⁶), is located at the PRMT7 locus (Fig. 5). The most strongly associated SNP in our study, though not genome-wide significant, was rs112845369 (P = 1.094 × 10⁻⁶) on chromosome 15.

Figure 5: LocusZoom plot of the *PRMT7* locus in the GAIN GWAS for schizophrenia.
rs8063446 lies in *SLC7A6OS*, 514 bp upstream of *PRMT7*. In our PrediXcan analyses, increased predicted expression of *PRMT7* was associated with schizophrenia. rs8063446 lies in a linkage disequilibrium (LD) block with other SNPs associated with schizophrenia when LD is calculated from the 1000G African (AFR) population.

DOI: 10.7717/peerj.7778/fig-5

Bipolar disorder SNP-level association test

We also performed a logistic regression GWAS of over 12 million SNPs to understand the role specific SNPs play in the genetics of bipolar disorder. We similarly used the first 10 principal components to adjust for population stratification in the bipolar disorder cohort. As in our schizophrenia GWAS, we identified no SNPs significantly associated with bipolar disorder after adjusting for multiple tests. rs10168049 on chromosome 2 was the most strongly associated SNP (P = 1.04 × 10⁻⁶). While not significantly associated with bipolar disorder, rs10168049 has a MAF of 0.465 in African 1000G populations, compared with 0.042 and 0.097 in European and East Asian populations, respectively (Fig. 6) (Sherry, 2001; The 1000 Genomes Project Consortium, 2015). All PrediXcan and GWAS summary statistics for both diseases are available at https://github.com/WheelerLab/Neuropsychiatric-Phenotypes.

Figure 6: rs10168049 frequency across 1000G populations.
Representative minor allele frequencies (MAFs) of rs10168049, the SNP most strongly associated with bipolar disorder in GAIN (P = 1.04 × 10⁻⁶), in different 1000G populations. The global MAF of this SNP in 1000G is 0.187; however, the frequency of this globally minor allele reaches 0.541 in the YRI (Yoruba in Ibadan, Nigeria) population of 1000G. This figure was generated using the Geography of Genetic Variants Browser (Marcus & Novembre, 2017).

DOI: 10.7717/peerj.7778/fig-6

Discussion

We performed gene-level (PrediXcan) and SNP-level association studies for schizophrenia and bipolar disorder in African Americans from GAIN (Suarez et al., 2006; Manolio et al., 2007). We used summary statistics from the predominantly European PGC to replicate our findings (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014; Psychiatric GWAS Consortium Bipolar Disorder Working Group, 2011).

A potential role for PRMT7 in schizophrenia

PRMT7 was significantly associated with schizophrenia in our PrediXcan analyses, and one of the most strongly associated SNPs in our GWAS lies just upstream of the gene (Figs. 1 and 5). Schizophrenia was associated with increased predicted expression of PRMT7 in 32 of the 33 tissues in which it was predicted in the GAIN cohort. When S-PrediXcan was applied to the PGC summary statistics, increased expression of PRMT7 was associated with schizophrenia in all 42 tissues in which expression was predicted (Fig. 3). PRMT7 made up five of the eight most associated gene-tissue pairs (Table 2). PRMT7 has previously been associated with SBIDDS syndrome, an intellectual disability syndrome (Agolini et al., 2018). Its function in the disorder remains unclear, but PRMT7 has a functional role in neuronal differentiation, which could be a potential mechanism to explore further (Dhar et al., 2012).

While PRMT7 was not significantly associated with schizophrenia in brain tissues, its associations in adrenal gland and heart tissues highlight the sharing of eQTLs across tissues. Recently, genes in both colon and adrenal gland were identified as significantly associated with schizophrenia (Gamazon et al., 2019). Gamazon et al. highlight the opportunity to better understand the genetic mechanisms of neuropsychiatric diseases outside the context of the central nervous system. A larger sample size would be needed to elucidate expression correlations and potential co-expression networks underlying African American neuropsychiatric traits.

Growing need for diversity in GWAS and genetic prediction models

In our GWAS of bipolar disorder, rs10168049 was the most strongly associated SNP. This SNP has not been implicated in previous studies, and its higher MAF in African populations than in European and Asian populations illustrates how GWAS in African ancestry populations may tag key loci missed in European-only studies (Fig. 6). In addition to not being reported in any published GWAS in the GWAS Catalog (MacArthur et al., 2017), rs10168049 is not present in any of the 55 prediction models we used to impute gene expression levels. As a result, this SNP did not contribute to predicted expression levels in our gene-based association tests. The association of this SNP with bipolar disorder needs to be replicated in larger studies; however, its absence from other GWAS in the GWAS Catalog may also reflect ascertainment bias arising from the field's focus on European populations (Lachance & Tishkoff, 2014).

The GTEx version 7 predictors we used were built exclusively from individuals of European descent. These models are not optimal for predicting expression in African American cohorts (Mogil et al., 2018; Mikhaylova & Thornton, 2019). While they offer power driven by sample size, they do not include African ancestry-specific alleles that might affect susceptibility to neuropsychiatric traits in this population.

Prediction models from MESA monocytes offer the advantage of more similar ancestry, but at the cost of nearly half the sample size of many GTEx tissues. To predict expression optimally in African American cohorts, prediction models built in more tissues from African ancestry reference transcriptomes are needed. Moreover, future ancestry-specific models will not only increase the accuracy of expression prediction, but will also create opportunities for other methods, such as local ancestry mapping, to be applied to expression prediction by accounting for recent admixture within African American cohorts (Zhong, Perera & Gamazon, 2019).

Conclusion

This study provides promising insight into the genetic architecture of gene expression underlying two neuropsychiatric disorders in African Americans. Our results were limited by the small sample size of GAIN. With 2,256 individuals in our study of schizophrenia and 1,019 individuals in our study of bipolar disorder, our analyses had limited power compared with European studies nearly two orders of magnitude larger (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014; Stahl et al., 2019).

The size and diversity of our prediction models further hindered our ability to identify novel genes associated with these disorders. The MESA models, the most diverse of our predictors, were still limited in tissue type (monocytes only) and in sample size, with 233 African American, 352 Hispanic, and 578 European individuals (Mogil et al., 2018). The GTEx and CommonMind predictors we used, while generated from larger sample sizes than the MESA African American cohort for many tissues, were built exclusively from European individuals (GTEx Consortium, 2017; Huckins et al., 2019). These key limitations highlight the need to increase the number of individuals from diverse populations in the study of neuropsychiatric genomics. To best characterize the molecular mechanisms that govern complex traits in diverse populations, diverse models, reference panels, and study subjects need to be included in genomics research.

Supplemental Information

Principal component analysis of schizophrenia genotype data.

We performed principal component analysis on the GAIN cohort merged with three populations from version three of the HapMap Project. Each point on the plot represents one individual in the study plotted across axes for their first and second principal components. The three HapMap populations plotted are Chinese in Beijing and Japanese in Tokyo (ASN), European ancestry in Utah (CEU), and Yoruba people in Ibadan, Nigeria (YRI).

DOI: 10.7717/peerj.7778/supp-1

Principal component analysis of bipolar disorder genotype data.

DOI: 10.7717/peerj.7778/supp-2

Comparison of minor allele frequencies (MAFs) across imputation reference panels.

We imputed genotypes with the University of Michigan Imputation Server using either 1000G or CAAPA as the reference panel. (A–D) depict the MAFs of SNPs from the GAIN schizophrenia study; we saw a similar pattern of MAFs in the GAIN bipolar disorder data. (A) depicts the MAFs of SNPs at the intersection of the two reference panels before filtering by r² > 0.8 and MAF > 0.01. (B) depicts the MAFs of SNPs in 1000G and CAAPA from (A) that passed the filters of r² > 0.8 and MAF > 0.01 and were included in the GTEx prediction models across 44 tissues. (C) shows the MAFs of filtered SNPs from 1000G and CAAPA found in the MESA predictors. (D) shows the MAFs of filtered SNPs from 1000G and CAAPA that were included in our GWAS.

DOI: 10.7717/peerj.7778/supp-3

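This reference-panel comparison amounts to three steps: intersecting the SNPs imputed with each panel, applying the r² > 0.8 and MAF > 0.01 thresholds, and optionally restricting the result to the SNPs used by a given predictor or GWAS. The sketch below illustrates those steps under several assumptions. Per-SNP MAF and imputation r² values are assumed to be loaded already into dictionaries keyed by SNP ID, and a SNP is assumed to need to pass both thresholds under both panels. These choices, along with the names `ImputedSnp`, `passes_filters`, and `paired_mafs`, are hypothetical and not taken from the study's code.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class ImputedSnp(NamedTuple):
    maf: float  # minor allele frequency in the imputed cohort
    r2: float   # estimated imputation quality (r^2)


def passes_filters(snp: ImputedSnp, min_r2: float = 0.8, min_maf: float = 0.01) -> bool:
    """Apply the caption's thresholds: r^2 > 0.8 and MAF > 0.01."""
    return snp.r2 > min_r2 and snp.maf > min_maf


def paired_mafs(
    panel_a: Dict[str, ImputedSnp],
    panel_b: Dict[str, ImputedSnp],
    filtered: bool = True,
    restrict_to: Optional[Set[str]] = None,
) -> List[Tuple[str, float, float]]:
    """Return (snp_id, maf_a, maf_b) for SNPs imputed with both panels.

    With filtered=True, a SNP is kept only if it passes the thresholds under
    both panels (an assumption). restrict_to optionally limits the output to
    SNPs present in a given predictor or GWAS.
    """
    shared = panel_a.keys() & panel_b.keys()  # intersection of the two panels
    if restrict_to is not None:
        shared &= restrict_to
    pairs = []
    for snp_id in sorted(shared):
        a, b = panel_a[snp_id], panel_b[snp_id]
        if filtered and not (passes_filters(a) and passes_filters(b)):
            continue
        pairs.append((snp_id, a.maf, b.maf))
    return pairs


# Hypothetical placeholder SNPs and values, for illustration only.
kg = {"snp1": ImputedSnp(0.20, 0.95), "snp2": ImputedSnp(0.005, 0.99),
      "snp3": ImputedSnp(0.30, 0.70)}
caapa = {"snp1": ImputedSnp(0.22, 0.97), "snp2": ImputedSnp(0.006, 0.98),
         "snp3": ImputedSnp(0.31, 0.85), "snp4": ImputedSnp(0.40, 0.90)}
print(paired_mafs(kg, caapa, filtered=False))  # all three shared SNPs
print(paired_mafs(kg, caapa))                  # [('snp1', 0.2, 0.22)]
```

In this toy example, snp2 fails the MAF threshold and snp3 fails the r² threshold under the first panel. Only snp1 would therefore appear in a filtered panel.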

S-PrediXcan results of PGC schizophrenia data for PRMT7.

The table reports the results for PRMT7 from our S-PrediXcan analysis of the PGC schizophrenia GWAS summary statistics across the 42 tissues in which PRMT7 expression was predicted.

DOI: 10.7717/peerj.7778/supp-4

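S-PrediXcan (Barbeira et al., 2018) tests a gene using only GWAS summary statistics. For a gene g with prediction weights w_l over its model SNPs l, the gene-level z-score is approximated as z_g ≈ Σ_l w_l (σ_l / σ_g) (β̂_l / se(β̂_l)). In this formula, σ_l² is the genotype variance of SNP l, σ_g² = wᵀΓw is the variance of predicted expression, and Γ is the SNP covariance matrix from an LD reference panel. The numpy sketch below simply evaluates that expression. It is not the MetaXcan software, and it omits the allele alignment and quality checks that a real analysis needs. The function name `spredixcan_z` and its inputs are hypothetical.

```python
import numpy as np


def spredixcan_z(weights, gwas_z, snp_cov) -> float:
    """Approximate a gene-level z-score from SNP summary statistics.

    weights: prediction weights w_l for the gene's model SNPs
    gwas_z:  GWAS z-scores (beta_hat / se) for the same SNPs, same allele coding
    snp_cov: covariance matrix of those SNPs from an LD reference panel
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    cov = np.asarray(snp_cov, dtype=float)
    sigma_l = np.sqrt(np.diag(cov))  # per-SNP genotype standard deviation
    sigma_g = np.sqrt(w @ cov @ w)   # standard deviation of predicted expression
    return float(np.sum(w * (sigma_l / sigma_g) * z))


# Single-SNP model: the gene z-score reduces to the SNP z-score,
# with its sign set by the sign of the weight.
print(spredixcan_z([-0.5], [3.0], [[0.2]]))
```

With one SNP, σ_g = |w|σ_l, so the gene-level statistic equals the SNP's GWAS z-score multiplied by the sign of its weight. With several SNPs, the LD structure in Γ determines how their z-scores combine.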

[1] Agolini E, Dentici ML, Bellacchio E, Alesi V, Radio FC, Torella A, Musacchia F, Tartaglia M, Dallapiccola B, Nigro V, Digilio MC, Novelli A. 2018. Expanding the clinical and molecular spectrum of PRMT7 mutations: 3 additional patients and review. Clinical Genetics 93(3):675-681

[2] Allardyce J, Leonenko G, Hamshere M, Pardiñas AF, Forty L, Knott S, Gordon-Smith K, Porteous DJ, Haywood C, Di Florio A, Jones L, McIntosh AM, Owen MJ, Holmans P, Walters JTR, Craddock N, Jones I, O’Donovan MC, Escott-Price V. 2018. Association between schizophrenia-related polygenic liability and the occurrence and level of mood-incongruent psychotic symptoms in bipolar disorder. JAMA Psychiatry 75(1):28-35

[3] Barbeira AN, Dickinson SP, Bonazzola R, Zheng J, Wheeler HE, Torres JM, Torstenson ES, Shah KP, Garcia T, Edwards TL, Stahl EA, Huckins LM, GTEx Consortium, Nicolae DL, Cox NJ, Im HK. 2018. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nature Communications 9(1):1825

[4] Barbeira AN, Pividori MD, Zheng J, Wheeler HE, Nicolae DL, Im HK. 2019. Integrating predicted transcriptome from multiple tissues improves association detection. PLOS Genetics 15(1):e1007889

[5] Bhalala OG, Nath AP, UK Brain Expression Consortium, Inouye M, Sibley CR. 2018. Identification of expression quantitative trait loci associated with schizophrenia and affective disorders in normal brain tissue. PLOS Genetics 14(8):e1007607

[6] Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, Vrieze SI, Chew EY, Levy S, McGue M, Schlessinger D, Stambolian D, Loh P-R, Iacono WG, Swaroop A, Scott LJ, Cucca F, Kronenberg F, Boehnke M, Abecasis GR, Fuchsberger C. 2016. Next-generation genotype imputation service and methods. Nature Genetics 48(10):1284-1287

[7] De Candia TR, Lee SH, Yang J, Browning BL, Gejman PV, Levinson DF, Mowry BJ, Hewitt JK, Goddard ME, O’Donovan MC, Purcell SM, Posthuma D, Visscher PM, Wray NR, Keller MC. 2013. Additive genetic variation in schizophrenia risk is shared by populations of African and European descent. American Journal of Human Genetics 93(3):463-470

[8] Dhar SS, Lee SH, Kan PY, Voigt P, Ma L, Shi X, Reinberg D, Lee MG. 2012. Trans-tail regulation of MLL4-catalyzed H3K4 methylation by H4R3 symmetric dimethylation is mediated by a tandem PHD of MLL4. Genes & Development 26(24):2749-2762

[9] Gamazon ER, Wheeler HE, Shah KP, Mozaffari SV, Aquino-Michaels K, Carroll RJ, Eyler AE, Denny JC, Nicolae DL, Cox NJ, Im HK. 2015. A gene-based association method for mapping traits using reference transcriptome data. Nature Genetics 47(9):1091-1098

[10] Gamazon ER, Zwinderman AH, Cox NJ, Denys D, Derks EM. 2019. Multi-tissue transcriptome analyses identify genetic mechanisms underlying neuropsychiatric traits. Nature Genetics 51(6):933-940

[11] GTEx Consortium. 2017. Genetic effects on gene expression across human tissues. Nature 550(7675):204-213

[12] Huckins LM, Dobbyn A, Ruderfer DM, Hoffman G, Wang W, Pardinas AF, Rajagopal VM, Als TD, Hoang HT, Girdhar K, Boocock J, Roussos P, Fromer M, Kramer R, Domenici E, Gamazon E, Purcell SM, Demontis D, Borglum AD, Walters J, O’Donovan M, Sullivan P, Owen M, Devlin B, Sieberts SK, Cox N, Im HK, Sklar P, Stahl EA, CommonMind Consortium, the Schizophrenia Working Group of the Psychiatric Genomics Consortium, iPSYCH-GEMS Schizophrenia Working Group. 2019. Gene expression imputation across multiple brain regions reveals schizophrenia risk throughout development. Nature Genetics 51:659-674

[13] Ikeda M, Saito T, Kondo K, Iwata N. 2017a. Genome-wide association studies of bipolar disorder: a systematic review of recent findings and their clinical implications. Psychiatry and Clinical Neurosciences 72(2):52-63

[14] Ikeda M, Takahashi A, Kamatani Y, Okahisa Y, Kunugi H, Mori N, Sasaki T, Ohmori T, Okamoto Y, Kawasaki H, Shimodera S, Kato T, Yoneda H, Yoshimura R, Iyo M, Matsuda K, Akiyama M, Ashikawa K, Kashiwase K, Tokunaga K, Kondo K, Saito T, Shimasaki A, Kawase K, Kitajima T, Matsuo K, Itokawa M, Someya T, Inada T, Hashimoto R, Inoue T, Akiyama K, Tanii H, Arai H, Kanba S, Ozaki N, Kusumi I, Yoshikawa T, Kubo M, Iwata N. 2017b. A genome-wide association study identifies two novel susceptibility loci and trans population polygenicity associated with bipolar disorder. Molecular Psychiatry 23(3):639-647

[15] Khera AV, Chaffin M, Aragam KG, Haas ME, Roselli C, Choi SH, Natarajan P, Lander ES, Lubitz SA, Ellinor PT, Kathiresan S. 2018. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nature Genetics 50(9):1219-1224

[16] Lachance J, Tishkoff SA. 2014. SNP ascertainment bias in population genetic analyses: why it is important, and how to correct it. BioEssays 35(9):780-786

[17] Li Z, Chen J, Yu H, He L, Xu Y, Zhang D, Yi Q, Li C, Li X, Shen J, Song Z, Ji W, Wang M, Zhou J, Chen B, Liu Y, Wang J, Wang P, Yang P, Wang Q, Feng G, Liu B, Sun W, Li B, He G, Li W, Wan C, Xu Q, Li W, Wen Z, Liu K, Huang F, Li J, Ripke S, Yue W, Sullivan PF, O’Donovan MC, Shi Y. 2017. Genome-wide association analysis identifies 30 new susceptibility loci for schizophrenia. Nature Genetics 49(11):1576-1583

[18] MacArthur J, Bowler E, Cerezo M, Gil L, Hall P, Hastings E, Junkins H, McMahon A, Milano A, Morales J, Pendlington ZM, Welter D, Burdett T, Hindorff L, Flicek P, Cunningham F, Parkinson H. 2017. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Research 45(D1):D896-D901

[19] Manolio TA, Rodriguez LL, Brooks L, Abecasis G, Ballinger D, Daly M, Donnelly P, Faraone SV, Frazer K, Gabriel S, Gejman P, Guttmacher A, Harris EL, Insel T, Kelsoe JR, Lander E, McCowin N, Mailman MD, Nabel E, Ostell J, Pugh E, Sherry S, Sullivan PF, Thompson JF, Warram J, Wholley D, Milos PM, Collins FS. 2007. New models of collaboration in genome-wide association studies: the Genetic Association Information Network. Nature Genetics 39(9):1045-1051

[20] Manrai AK, Funke BH, Rehm HL, Olesen MS, Maron BA, Szolovits P, Margulies DM, Loscalzo J, Kohane IS. 2016. Genetic misdiagnoses and the potential for health disparities. New England Journal of Medicine 375(7):655-665

[21] Marcus JH, Novembre J. 2017. Visualizing the geography of genetic variants. Bioinformatics 33(4):594-595

[22] Martin AR, Gignoux CR, Walters RK, Wojcik GL, Neale BM, Gravel S, Daly MJ, Bustamante CD, Kenny EE. 2017. Human demographic history impacts genetic risk prediction across diverse populations. American Journal of Human Genetics 100(4):635-649

[23] Mathias RA, Taub MA, Gignoux CR, Fu W, Musharoff S, O’Connor TD, Vergara C, Torgerson DG, Pino-Yanes M, Shringarpure SS, Huang L, Rafaels N, Boorgula MP, Johnston HR, Ortega VE, Levin AM, Song W, Torres R, Padhukasahasram B, Eng C, Mejia-Mejia D-A, Ferguson T, Qin ZS, Scott AF, Yazdanbakhsh M, Wilson JG, Marrugo J, Lange LA, Kumar R, Avila PC, Williams LK, Watson H, Ware LB, Olopade C, Olopade O, Oliveira R, Ober C, Nicolae DL, Meyers D, Mayorga A, Knight-Madden J, Hartert T, Hansel NN, Foreman MG, Ford JG, Faruque MU, Dunston GM, Caraballo L, Burchard EG, Bleecker E, Araujo MI, Herrera-Paz EF, Gietzen K, Grus WE, Bamshad M, Bustamante CD, Kenny EE, Hernandez RD, Beaty TH, Ruczinski I, Akey J, Barnes KC. 2016. A continuum of admixture in the Western Hemisphere revealed by the African Diaspora genome. Nature Communications 7(1):12522

[24] Mikhaylova AV, Thornton TA. 2019. Accuracy of gene expression prediction from genotype data with PrediXcan varies across and within continental populations. Frontiers in Genetics 10:261

[25] Mogil LS, Andaleon A, Badalamenti A, Dickinson SP, Guo X, Rotter JI, Johnson WC, Im HK, Liu Y, Wheeler HE. 2018. Genetic architecture of gene expression traits across diverse populations. PLOS Genetics 14(8):e1007586

[26] Patterson N, Price AL, Reich D. 2006. Population structure and eigenanalysis. PLOS Genetics 2(12):e190

[27] Pruim RJ, Welch RP, Sanna S, Teslovich TM, Chines PS, Gliedt TP, Boehnke M, Abecasis GR, Willer CJ, Frishman D. 2011. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26(18):2336-2337

[28] Psychiatric GWAS Consortium Bipolar Disorder Working Group. 2011. Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nature Genetics 43(10):977-983

[29] Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, De Bakker PIW, Daly MJ, Sham PC. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics 81(3):559-575

[30] Quansah E, McGregor NW. 2018. Towards diversity in genomics: the emergence of neurogenomics in Africa? Genomics 110(1):1-9

[31] Schizophrenia Working Group of the Psychiatric Genomics Consortium. 2014. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511(7510):421-427

[32] Sherry ST. 2001. dbSNP: the NCBI database of genetic variation. Nucleic Acids Research 29(1):308-311

[33] Sirugo G, Williams SM, Tishkoff SA. 2019. The missing diversity in human genetic studies. Cell 177(1):26-31

[34] Smith EN, Bloss CS, Badner JA, Barrett T, Belmonte PL, Berrettini W, Byerley W, Coryell W, Craig D, Edenberg HJ, Eskin E, Foroud T, Gershon E, Greenwood TA, Hipolito M, Koller DL, Lawson WB, Liu C, Lohoff F, McInnis MG, McMahon FJ, Mirel DB, Murray SS, Nievergelt C, Nurnberger J, Nwulia EA, Paschall J, Potash JB, Rice J, Schulze TG, Scheftner W, Panganiban C, Zaitlen N, Zandi PP, Zöllner S, Schork NJ, Kelsoe JR. 2009. Genome-wide association study of bipolar disorder in European American and African American individuals. Molecular Psychiatry 14(8):755-763

[36] Stephens M. 2017. False discovery rates: a new deal. Biostatistics 18(2):275-294

[37] Suarez BK, Duan J, Sanders AR, Hinrichs AL, Jin CH, Hou C, Buccola NG, Hale N, Weilbaecher AN, Nertney DA, Olincy A, Green S, Schaffer AW, Smith CJ, Hannah DE, Rice JP, Cox NJ, Martinez M, Mowry BJ, Amin F, Silverman JM, Black DW, Byerley WF, Crowe RR, Freedman R, Cloninger CR, Levinson DF, Gejman PV. 2006. Genomewide linkage scan of 409 European-ancestry and African American families with schizophrenia: suggestive evidence of linkage at 8p23.3-p21.2 and 11p13.1-q14.1 in the combined sample. American Journal of Human Genetics 78(2):315-333