Genetic variations in the drug metabolizing enzyme, CYP2E1, among various ethnic populations of Pakistan

Genetic polymorphism in cytochrome P450 (CYP) monooxygenase genes is an important source of interindividual variability in drug response. Such polymorphisms may change CYP enzyme activity, which in turn may affect drug metabolism. This can alter the severity and frequency of adverse effects and contribute to the non-responder phenomenon. CYP2E1, a member of the CYP superfamily, affects the metabolism of several clinically important drugs such as halothane and paracetamol. Genetic variation in CYP2E1 is known to cause significant interindividual differences in drug response and adverse effects, and the degree of genetic variation differs among populations around the world. The frequencies of two important polymorphisms, CYP2E1*7C, NC_000010.10:g.135340548A>G (rs2070672) and CYP2E1, NC_000010.10:g.135339244G>C (rs3813865), are not known in the Pakistani population. In the present investigation, 636 healthy human volunteers were screened for these two single nucleotide polymorphisms. Our results indicate that about 18% (rs2070672) and 28% (rs3813865) of the Pakistani population has a genotype containing at least one low activity allele. Significant interethnic variation in the frequencies of both polymorphisms was observed. These results suggest that pharmacogenetic screening for low activity genotypes would be a helpful tool for clinicians prescribing medications metabolized by CYP2E1, as a significant fraction of the Pakistani population is expected to have a variable response to these drugs.

Selected polymorphisms in this gene are associated with a higher risk of certain diseases such as cancer (Barry et al., 2011; Zhang et al., 2009). For example, the single nucleotide polymorphisms CYP2E1*7C, NC_000010.10:g.135340548A>G (rs2070672) and CYP2E1, NC_000010.10:g.135339244G>C (rs3813865), are significantly associated with high altitude polycythemia risk (Xu et al., 2015), with nasopharyngeal carcinoma risk in Cantonese (Jia et al., 2009), and with poorer cancer-specific survival in head and neck cancer (Hakenewerth et al., 2013). These polymorphisms are also associated with interindividual differences in drug response and adverse effects, especially liver injury with ethanol, and their frequency varies among ethnic groups around the world (Bonifaz-Pena et al., 2014; Huang et al., 2012; Kim et al., 2015). Because the frequencies of genetic polymorphisms are associated with interethnic differences in drug response and add significant risk for certain diseases, they are an important subject of investigation. For this reason, the frequencies of these CYP2E1 polymorphisms have been described for various populations.
Pakistan is a culturally diverse country, but little is known about the distribution of CYP2E1 genetic polymorphisms in this country of over 200 million people. Different parts of the country have their own lifestyle, genetic background, dietary habits, culture, and geographical environment. More than 100 single nucleotide polymorphisms have been found in CYP2E1, in addition to some copy number variants. Among them, rs2031920, rs3813867, rs6413432, rs6413420, rs72559710, rs55897648, rs2070673, rs3813865 and rs2070672 are well known. However, only a few might alter the enzyme activity or be associated with certain diseases. Therefore, we investigated samples drawn from six of Pakistan's most populous ethnic groups, located in distinct geographical regions, determined the frequencies of two important polymorphisms (rs3813865 and rs2070672), and compared them with previous findings in other populations.

Sample collection and DNA extraction
This study was approved by the Institutional Review Board and Ethics Committee of Shifa Tameer-e-Millat University, Islamabad, Pakistan (ref: IRB# 990-265-2018). Written informed consent was obtained from all participating individuals. The study cohort comprised 636 healthy human individuals from six major ethnicities of Pakistan: Punjabi, Pathan, Sindhi, Balochi, Seraiki, and Urdu speaking. Ethnicity was self-reported. Five ml of venous blood was drawn into sterile tubes containing EDTA as an anticoagulant and stored at 4 °C. Genomic DNA was isolated using the Gene Jet Genomic DNA extraction Kit (Thermo Fisher, Waltham, MA, USA) and was quantified using 1% agarose gel electrophoresis. Isolated genomic DNA was stored at −20 °C until further processing.
Genotyping
CYP2E1 polymorphisms (rs3813865, G>C; rs2070672, A>G) were genotyped using the Amplification Refractory Mutation System-Polymerase Chain Reaction (ARMS-PCR) with a pair of outer primers and a pair of inner primers, as shown in Table 1. PCR for the two SNPs was performed separately in a total reaction volume of 25 µl containing 12.5 µl of 2X Dream Taq Mastermix (Thermo Fisher, Waltham, MA, USA), 0.18 mM of both OF and OR primers, 0.36 mM of both IF and IR primers, 7.7 µl of sterile PCR water and 3 µl of template DNA (20-50 ng/ml). The thermal profile was as follows: initial denaturation at 95 °C for 2 min, followed by 35 cycles of denaturation at 95 °C for 30 s, primer annealing for 30 s at 54 °C for rs3813865 and 58 °C for rs2070672, and extension at 72 °C for 1 min, with a final extension at 72 °C for 2 min. For visualization, 12 µl of PCR product was loaded directly onto a 3% agarose gel. For rs3813865, the homozygous wild type GG genotype had 499 bp and 303 bp fragments, the homozygous mutant CC genotype had 499 bp and 236 bp fragments, and the heterozygous GC genotype had three fragments: 499 bp, 303 bp, and 236 bp. For rs2070672, homozygous wild type GG genotype had 455 bp and 277 bp fragments, homozygous mutant AA genotype had 455 bp and 218 bp fragments, and heterozygous GA genotype had three fragments: 455 bp, 277 bp, and 218 bp. Selected samples were sent for Sanger sequencing, and the results obtained were in agreement with the results from our laboratory.
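The mapping from gel band pattern to genotype described above for rs3813865 can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in the study: the function name `call_genotype` and the 10 bp size tolerance are assumptions introduced here, while the fragment sizes (499, 303 and 236 bp) are those given in the text.

```python
# Fragment sizes (bp) for rs3813865 as stated in the text.
CONTROL_BP = 499                      # outer-primer band, expected in every sample
ALLELE_BANDS = {303: "G", 236: "C"}   # allele-specific inner-primer bands

def call_genotype(bands_bp, tolerance=10):
    """Return 'GG', 'GC' or 'CC' from observed band sizes, or None if uncallable.

    `tolerance` (bp) is a hypothetical allowance for sizing error on the gel.
    """
    def present(target):
        return any(abs(b - target) <= tolerance for b in bands_bp)

    # Without the control band the reaction is treated as failed.
    if not present(CONTROL_BP):
        return None

    alleles = [allele for size, allele in ALLELE_BANDS.items() if present(size)]
    if not alleles:
        return None
    if len(alleles) == 1:
        return alleles[0] * 2         # homozygous: 'GG' or 'CC'
    return "".join(alleles)           # heterozygous: 'GC'

print(call_genotype([499, 303]))       # wild type pattern
print(call_genotype([499, 303, 236]))  # heterozygous pattern
print(call_genotype([499, 236]))       # homozygous mutant pattern
```

Requiring the control band before making a call mirrors the role of the outer primers, whose product should appear regardless of genotype.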

Statistical analysis
Data were compiled according to the genotype and allele frequencies estimated from the observed numbers of each specific allele. The frequency of each allele and genotype in our samples is given together with the 95% confidence interval. The confidence interval for proportions was calculated using the formula CI = p ± (1.96 × SE), with SE = sqrt[p(1 − p)/n], where p is the proportion and n is the sample size. Chi-squared test and p values were calculated using observed and expected frequencies as per the Hardy-Weinberg equation.
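The two calculations described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' analysis script: the function names are hypothetical, the genotype counts in the example are invented for illustration and are not study data, and the p value assumes a chi-squared distribution with one degree of freedom, which is the usual choice for a biallelic Hardy-Weinberg test.

```python
import math

def wald_ci(p, n, z=1.96):
    """95% confidence interval for a proportion: p ± z × sqrt(p(1 − p)/n)."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

def hwe_chi_square(n_hom_major, n_het, n_hom_minor):
    """Chi-squared test of observed genotype counts against Hardy-Weinberg
    expectations (p^2, 2pq, q^2). Returns (chi2, p_value) for 1 degree of freedom."""
    n = n_hom_major + n_het + n_hom_minor
    p = (2 * n_hom_major + n_het) / (2 * n)   # major allele frequency
    q = 1 - p
    observed = (n_hom_major, n_het, n_hom_minor)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Classes with zero expected count are skipped to avoid division by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical example: 100 individuals with 81, 18 and 1 genotypes.
low, high = wald_ci(0.10, 200)                # minor allele frequency, 200 chromosomes
chi2, p_value = hwe_chi_square(81, 18, 1)
print(f"CI: {low:.4f} to {high:.4f}; chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

Whether n is the number of individuals (for genotype frequencies) or the number of chromosomes (for allele frequencies) is left to the caller, since the text defines n only as the sample size.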

RESULTS
Frequencies of CYP2E1 (rs2070672) alleles in the Pakistani population are shown in Table 2, while a representative agarose gel image of the experiments is shown in Fig. S1. The frequency of the major allele was 89.62% and that of the minor allele was 10.37%. The major allele was slightly less prevalent in the Punjabi and Urdu populations, at 87.18% and 84.14% respectively, than in the Pathan, Sindhi, Seraiki and Baloch populations, where its prevalence was slightly higher (Fig. 1; Table 2). The frequency of the AA genotype was 82.70%, AG was 13.83% and GG was 3.45% in the Pakistani population. The Punjabi and Urdu populations showed a slightly lower frequency of the wild type genotype, at 79.83% and 78.04% respectively, while the Pathan, Sindhi, and Seraiki populations had a slightly higher prevalence of the wild type genotype. The Baloch population showed the highest frequency of the wild type genotype at 92.68%. No homozygous GG genotype was found in the Pathan, Seraiki and Baloch populations (Table 3).
Frequencies of CYP2E1 (rs3813865) alleles in the Pakistani population are shown in Table 4, while a representative agarose gel image of the experiments is shown in Fig. S2. The frequency of the minor allele for this polymorphism was higher in the Pakistani population than that of rs2070672, at 14.8% (Table 4). In the Punjabi and Urdu populations, the minor allele was even more prevalent, at 18.49% and 18.83% respectively. In the Sindhi population, the frequency of the minor allele was the lowest at 1.81%. The frequency of the GG genotype was 72.47%, GC was 23.58% and CC was 3.93% in the Pakistani population. In the Punjabi and Urdu populations, the wild type genotype (GG) was slightly less prevalent, at 63.47% and 67.53% respectively. The Baloch population showed a higher frequency of the wild type genotype at 83.60%. The highest prevalence of the wild type genotype was found in the Sindhi population at 96.36%. Sindhi was also the only population in which no homozygous CC genotype was observed. All other ethnic groups showed the CC genotype, albeit at varying frequencies (Table 5).
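Allele frequencies follow from genotype frequencies by gene counting: each homozygote contributes fully to one allele and each heterozygote contributes half to each. The short Python sketch below applies this to the rs2070672 genotype percentages reported above; the function name is a hypothetical one chosen for illustration. The result agrees with the reported major and minor allele frequencies to within rounding.

```python
def allele_frequencies(hom_major, het, hom_minor):
    """Gene counting on genotype frequencies (any consistent unit, here percent).

    Returns (major_allele_frequency, minor_allele_frequency).
    """
    major = hom_major + het / 2
    minor = hom_minor + het / 2
    return major, minor

# rs2070672 genotype frequencies from the text: AA 82.70%, AG 13.83%, GG 3.45%.
major, minor = allele_frequencies(82.70, 13.83, 3.45)
print(f"A allele: {major:.2f}%, G allele: {minor:.2f}%")
```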

DISCUSSION
According to its Statistics Bureau, Pakistan, with an estimated population of over 210 million, is the sixth most populous country in the world (Pakistan Bureau of Statistics, 2017). The country has a young, multi-ethnic and multi-cultural society and, despite being home to a huge population, pharmacogenetic studies on how its population responds to various pharmaceutical drugs are rare. The largest ethnic group in Pakistan is Punjabi, which makes up about 38.78% of the population, followed by Pashtuns (18.24%), Sindhis (14.57%), Seraikis (10.53%), Urdu speaking (7.57%) and Baloch (3.57%) (Pakistan Bureau of Statistics, 2017). These ethnic groups represent about 94% of the Pakistani population. Genetic variations in CYP genes affecting the metabolism of xenobiotics and drug response have not been investigated in these ethnic groups. Our study partly addresses this issue by reporting frequencies of two of the most important single nucleotide polymorphisms in the CYP2E1 gene. The frequency of the rs2070672 minor allele (G) in the Pakistani population was similar to the one found in the American population (Table 6). The lowest frequency of the minor allele has previously been reported from Europe (0.027). A literature search shows that in the African population the minor allele is slightly more prevalent than in the Pakistani population, while East and South Asian populations have the highest frequencies of the minor allele (Genomes Project Consortium, 2015). Similar results were found for genotype frequency, where the wild type genotype frequency observed in the Pakistani population was comparable to that of the American population. The highest frequency of the wild type genotype is reported from Europe, where no homozygous GG genotype was found. The highest frequencies of heterozygous and homozygous GG genotypes are reported from East and South Asian populations (Genomes Project Consortium, 2015). The difference in allele and genotype frequencies between earlier reports for South Asian populations and this study may be because our study estimated frequencies in six different ethnicities, while in the 1000 Genomes Project the Pakistani population is represented by one ethnicity only. The difference in sample size may be another reason for the discrepancy.
Among the various ethnicities, Urdu speaking showed the highest frequency of the rs2070672 minor allele. Punjabi ethnicity displayed the second-highest prevalence of the minor allele, while Baloch people exhibited the lowest frequency of this allele. Comparing genotype frequencies, Urdu speaking ethnicity showed the lowest frequency of the wild type genotype (AA), followed by Punjabi ethnicity. The highest frequency of the wild type genotype was exhibited by Baloch ethnicity. Pathan, Baloch and Seraiki ethnicities did not show any homozygous mutant genotype (GG), while Urdu speaking ethnicity displayed the highest frequency of this genotype among the study participants. Studies investigating this polymorphism in regional populations reveal that the Chinese Uygur population has a low prevalence of this genetic variant at 0.25% (Zhu et al., 2018). Other studies carried out at various geographical locations in China showed much higher frequencies. For example, this variant was found at a frequency of 18.8% in Shantou, 14.1% in Shanghai, 18.8% in Shenyang and 21.9% in Xian (Tang et al., 2010). Comparison with other regional populations reveals relatively lower frequencies of the minor allele (rs2070672) in Pakistani ethnicities. Sri Lankan Tamils in the UK (STU) and Bengali from Bangladesh (BEB) have shown a much higher frequency of the minor allele, at 0.328 and 0.315 respectively, whereas in the Pakistani population the highest frequencies of this allele were found in the Punjabi and Urdu ethnicities, at 0.128 and 0.158 respectively. Even Gujarati Indian in Houston (GIH), Punjabi in Lahore (PJL) and Indian Telugu in the UK (ITU) have reported minor allele frequencies that are higher than those of most Pakistani ethnicities. Although largely in agreement with regional ethnicities, this relatively low frequency of the minor allele in Pakistani ethnicities might be due to the broad geographical area from which our samples were obtained. For example, PJL data in the 1000 Genomes Project were obtained from one center in Lahore, while our samples were collected from various centers in Rawalpindi, Islamabad, and Lahore.
The frequency of the rs3813865 minor allele (C) in the Pakistani population was similar to the one found in the African population. The lowest frequency of the minor allele is reported from Europe (0.026) (Table 7). The highest frequencies of this polymorphism are reported from South and East Asia, at 0.287 and 0.267 respectively (Genomes Project Consortium, 2015). The frequency of this genetic variant in the American population is reported at 0.102, which is the second-lowest frequency reported in the 1000 Genomes database for this variant. In terms of genotype frequencies, the wild type genotype frequency observed in the Pakistani population was closest to the one reported for the African population. South Asian and East Asian populations are reported to have the lowest frequencies of the wild type genotype, at 0.513 and 0.530 respectively (Genomes Project Consortium, 2015).
The highest frequency of the wild type genotype (GG) is reported from European populations at 0.948, with a heterozygous genotype (GC) frequency of 0.052. European populations are also the only ones reported in the 1000 Genomes Project to have no homozygous CC genotype. Comparing the rs3813865 polymorphism among the various ethnicities revealed that Urdu and Punjabi ethnicities have the highest prevalence of the minor allele (C). The highest major allele frequency was exhibited by Sindhi ethnicity. Pathan, Seraiki and Baloch ethnicities showed minor allele frequencies in the same range, considerably higher than that of Sindhi ethnicity. Genotype frequencies showed that Sindhi ethnicity has the highest frequency of the wild type genotype (GG), and Punjabi ethnicity the lowest. Pathan, Seraiki, Baloch, and Urdu speaking ethnicities showed an intermediate prevalence of the wild type genotype compared to Sindhi and Punjabi ethnicities. Only Sindhi ethnicity did not show any homozygous mutant genotype (CC). Consistent with our findings for rs2070672, the frequency of the rs3813865 minor allele was slightly lower in our ethnic populations than in other regional ethnicities. Sri Lankan Tamils in the UK (STU) and Bengali from Bangladesh (BEB) have shown a higher frequency of the minor allele, at 0.328 and 0.315 respectively, whereas in the Pakistani population the highest frequencies of this allele were found in the Punjabi and Urdu ethnicities, at 0.185 and 0.188 respectively. Even Gujarati Indian in Houston (GIH), Punjabi in Lahore (PJL) and Indian Telugu in the UK (ITU) have reported minor allele frequencies that are higher than those of most Pakistani ethnicities. A literature search shows that the frequency of this genetic variant at various geographical locations in China was comparable and, in some cases, higher. For example, this variant was found at a frequency of 18.7% in Shantou, 14% in Shanghai, 23.4% in Shenyang and 22.4% in Xian (Tang et al., 2010).
Limitations of our study include determining the prevalence of only two SNPs, while more than 100 SNPs have been found in CYP2E1. However, only a few might alter the enzyme activity or be associated with certain diseases. Our genotyping method also prevented us from finding novel SNPs in our population. Sequencing all samples could have helped find new SNPs, and could also have helped us find copy number variants in the CYP2E1 gene in our population, if any. Functional analysis of the CYP2E1 enzyme containing these SNPs could have helped establish the functional relevance of the observed SNPs.

CONCLUSIONS
To our knowledge, this is the first study to report frequencies of CYP2E1 gene polymorphisms in various ethnicities of the Pakistani population. Genetic information about a patient's CYP2E1 gene is likely to help physicians prescribe the most suitable and safest drug based on the patient's genetic make-up. We propose further studies with individual drugs metabolized by CYP2E1 to shed more light on genotype-phenotype relations. Measuring the enzyme activity of CYP2E1 variants containing these SNPs would help establish the functional relevance and importance of these SNPs in the Pakistani population.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
Funding for this research project was provided by the Shifa Tameer-e-Millat University to Dr. Sagheer Ahmed. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the authors: Shifa Tameer-e-Millat University.