Integrated probability of coronary heart disease subject to the −308 tumor necrosis factor-alpha SNP: a Bayesian meta-analysis
- Academic Editor
- Antonio Palazón-Bru
- Subject Areas
- Bioinformatics, Cardiology, Statistics
- Keywords
- Bayesian meta-analysis, Coronary heart disease, −308 TNF-alpha
- Copyright
- © 2015 Carvalho
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
- Cite this article
- 2015. Integrated probability of coronary heart disease subject to the −308 tumor necrosis factor-alpha SNP: a Bayesian meta-analysis. PeerJ 3:e1236 https://doi.org/10.7717/peerj.1236
Abstract
We present a meta-analysis of independent studies on the potential implication in the occurrence of coronary heart disease (CHD) of the single-nucleotide polymorphism (SNP) at the −308 position of the tumor necrosis factor alpha (TNF-alpha) gene. We use Bayesian analysis to integrate independent data sets and to infer statistically robust measurements of correlation. Bayesian hypothesis testing indicates that there is no preference for the hypothesis that the −308 TNF-alpha SNP is related to the occurrence of CHD, in the Caucasian or in the Asian population, over the null hypothesis. As a measure of correlation, we use the probability of occurrence of CHD conditional on the presence of the SNP, derived as the posterior probability of the Bayesian meta-analysis. The conditional probability indicates that CHD is not more likely to occur when the SNP is present, which suggests that the −308 TNF-alpha SNP is not implicated in the occurrence of CHD.
Introduction
Coronary heart disease (CHD) is now widely accepted to be a chronic inflammatory disease (Hansson, 2005). CHD is a complex disease with a multifactorial etiology, with both genetic and environmental factors contributing to its occurrence and development.
Among the genetic factors potentially implicated in the emergence of CHD, the tumor necrosis factor alpha (TNF-α) has attracted great interest for its involvement in the inflammatory response of the immune system (Vassali, 1992). There is evidence that TNF-α is implicated in an increased susceptibility to the pathogenesis of a variety of diseases. In particular, high serum levels of TNF-α affect endothelial cell hemostatic function and hence may modify the risk for developing CHD (Plutzky, 2001). There is also the suggestion that the TNF-α gene affects the modulation of lipid metabolism, obesity susceptibility and insulin resistance, thus being potentially implicated in the development of CHD (see Vourvouhaki & Dedoussis, 2008 and references therein).
Among the several single-nucleotide polymorphisms (SNPs) that have been identified in the human TNF-α, the best documented one is at the position −308 of the TNF-α gene promoter. This SNP involves the substitution of guanine (G) by adenine (A) and the subsequent creation of two alleles (TNF1(G) and TNF2(A)) and three genotypes (GG, GA and AA) (Wilson et al., 1992). It has been hypothesised that the TNF-α SNP could change the susceptibility to CHD. However, the results on its association with CHD are contradictory, some implying a different influence of the two alleles on the prevalence of CHD, others implying no association (see Zhang et al., 2011 and references therein).
In order to infer the risk of CHD derived from potential risk factors, it is important to develop a formalism that infers correlations among different intervening factors and combines independent data sets for a consistent inference of the correlations. In Vourvouhaki & Carvalho (2011) we introduced a formalism based on Bayesian inference to infer the correlation of the occurrence of CHD with two risk factors and tested a simplistic model for the signal pathway on the three-variable data set from Vendrell et al. (2003). In this manuscript, we extend the formalism to extract information from the combination of data from independent studies and to quantify the combined risk of occurrence of CHD from the −308 TNF-α SNP.
The most exhaustive meta-analysis to date on this correlation is the frequentist analysis in Zhang et al. (2011) covering Caucasian, Asian, Indian and African populations. This meta-analysis found a 1.5-fold increased risk of developing CHD when the SNP is present in the Caucasian population, but found no association in the other ethnicities. A more recent meta-analysis, covering the same data sets, found no association in the Caucasian or in the Asian population (Chu et al., 2013).
In this manuscript we propose a meta-analysis based on Bayesian analysis in an attempt to establish the potential implication of the −308 TNF-α SNP in the occurrence of CHD. This manuscript is organized as follows. In ‘Methods’ we describe the method. In particular, in ‘Data selection’ we describe the data sets selected; in ‘Hypotheses testing’ we propose two hypotheses and test which best and most simply describes the data. In ‘Results’ we perform the Bayesian analysis of the selected data sets, combined by ethnicity and CHD phenotype, and present the results. In particular, in ‘Inference of conditional probabilities’ we infer the conditional probabilities for the occurrence of CHD given the presence of the SNP; in ‘Sensitivity of the results’ we test the sensitivity of this formalism to low-significance data sets, to data sets with extreme results and to extreme data sets. Finally, in ‘Conclusions’ we draw the conclusions. The reasoning of this meta-analysis is summarized in a flow chart (Figs. 1, 2 and 3).
Methods
Data selection
This analysis is based on twenty data sets (indexed i) on two CHD phenotypes (indexed j) selected from the studies compiled in Zhang et al. (2011), following a well-documented study identification, data acquisition and selection strategy, including also statistical tests (Hardy–Weinberg equilibrium, heterogeneity, publication bias). The selected data sets are the studies that report the genotypes of both CHD patients and non-CHD (control) patients for the two CHD phenotypes separately. In particular, we included: fifteen data sets from studies on Caucasians, where six studies are on the CHD phenotype coronary stenosis (CS) (Allen et al., 2001; Elahi et al., 2008; Georges et al., 2003; Sbarsi et al., 2007; Szalai et al., 2002; Vendrell et al., 2003) and nine studies are on the CHD phenotype myocardial infarction (MI) (Antonicelli et al., 2005; Bennet et al., 2006; Dedoussis et al., 2005; Herrmann et al., 1998; Koch et al., 2001; Padovani et al., 2000; Tobin et al., 2004; Tulyakova et al., 2004); and five data sets from studies on Asians on the CHD phenotype coronary stenosis (Hou et al., 2009; Liu et al., 2009; Shun et al., 2009; Li et al., 2003; Chen et al., 2001). The rejected data sets are: three studies on Caucasians (for not reporting data on non-CHD patients); four studies on Asians (three for not reporting data on non-CHD patients and one for not separating the CHD phenotypes); the study on Indians and the study on Africans (both for not separating the CHD phenotypes).
| Study (i) | Phenotype (j) | CHD patients, GG | CHD patients, GA/AA | Controls, GG | Controls, GA/AA | Bayes factor $H_{1}^{i,j}/H_{0}^{i,j}$ | Bayes factor $H_{1}^{j}/H_{0}^{j}$ |
|---|---|---|---|---|---|---|---|
| Allen et al. (NA) | Cauc CS | 127 | 53 | 222 | 107 | 0.14 ± 0.05 | |
| Elahi et al. (A) | Cauc CS | 59 | 38 | 41 | 54 | 3.54 ± 1.12 | |
| Georges et al. (A) | Cauc CS | 613 | 236 | 222 | 92 | 0.08 ± 0.03 | |
| Sbarsi et al. (A) | Cauc CS | 175 | 73 | 185 | 56 | 0.33 ± 0.11 | |
| Szalai et al. (A) | Cauc CS | 229 | 89 | 181 | 87 | 0.19 ± 0.07 | |
| Vendrell et al. (A) | Cauc CS | 231 | 110 | 159 | 48 | 1.33 ± 0.46 | |
| Combined data sets | Cauc CS | | | | | | 0.049 ± 0.014 |
| Combined data sets | Cauc CS | | | | | | 0.041 ± 0.016$^{*}$ |
| Combined data sets | Cauc CS | | | | | | 0.048 ± 0.019$^{**}$ |
| Antonicelli et al. (A) | Cauc MI | 224 | 69 | 246 | 64 | 0.12 ± 0.04 | |
| Bennet et al. (A) | Cauc MI | 799 | 368 | 1,037 | 460 | 0.05 ± 0.02 | |
| Dedoussis et al. (A) | Cauc MI | 206 | 31 | 227 | 10 | 26.14 ± 8.56 | |
| Herrmann et al.$^{a}$ (NA) | Cauc MI | 325 | 120 | 376 | 158 | 0.11 ± 0.04 | |
| Herrmann et al.$^{b}$ (NA) | Cauc MI | 117 | 79 | 97 | 79 | 0.19 ± 0.06 | |
| Koch et al. (NA) | Cauc MI | 565 | 228 | 244 | 96 | 0.07 ± 0.03 | |
| Padovani et al. (A) | Cauc MI | 120 | 28 | 114 | 34 | 0.17 ± 0.06 | |
| Tobin et al. (A) | Cauc MI | 365 | 182 | 337 | 168 | 0.07 ± 0.03 | |
| Tulyakova et al. (NA) | Cauc MI | 242 | 64 | 177 | 69 | 0.60 ± 0.21 | |
| Combined data sets | Cauc MI | | | | | | 0.026 ± 0.011 |
| Combined data sets | Cauc MI | | | | | | 0.035 ± 0.015$^{*}$ |
| Combined data sets | Cauc MI | | | | | | 0.030 ± 0.012$^{**}$ |
| Chen et al. (NA) | Asian CS | 29 | 11 | 21 | 9 | 0.27 ± 0.08 | |
| Hou et al. (NA) | Asian CS | 268 | 32 | 802 | 103 | 0.05 ± 0.02 | |
| Li et al. (NA) | Asian CS | 66 | 8 | 138 | 20 | 0.12 ± 0.04 | |
| Li et al. (A) | Asian CS | 234 | 52 | 142 | 34 | 0.10 ± 0.03 | |
| Shun et al. (A) | Asian CS | 54 | 19 | 118 | 20 | 1.10 ± 0.34 | |
| Combined data sets | Asian CS | | | | | | 0.151 ± 0.057 |
| Combined data sets | Asian CS | | | | | | 0.114 ± 0.043$^{*}$ |
| Combined data sets | Asian CS | | | | | | 0.103 ± 0.037$^{**}$ |
The data consist of frequencies of occurrence of the −308 TNF-α SNP in randomly selected CHD patients and non-CHD patients, respectively $n_{\text{SNP},\text{CHD}}$ and $n_{\text{SNP},\overline{\text{CHD}}}$. The data are summarized in Table 1 (columns 3–6). The errors indicated were computed from error propagation. Assuming that the methods for measuring the presence of the SNP have a success rate of $r_{\text{suc}} = 0.88$ (Sharifian, 2010), and furthermore that the error of a counting result is given by the Poisson approximation $\sqrt{n}$, the error of a counting result $n$ on the presence of the SNP is given by $\left(1-r_{\text{suc}}\right)\sqrt{n}/2$.
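The counting error stated above can be written out as a short function. This is only a sketch of the stated formula; the function name `counting_error` and the choice of the Allen et al. count as an example are illustrative assumptions, not part of the original analysis.

```python
import math

def counting_error(n, r_suc=0.88):
    """Error on a count n of SNP occurrences: (1 - r_suc) * sqrt(n) / 2.

    r_suc is the assumed success rate of the SNP measurement
    (0.88 in the text); sqrt(n) is the Poisson approximation.
    """
    if n < 0:
        raise ValueError("a count must be non-negative")
    return (1.0 - r_suc) * math.sqrt(n) / 2.0

# Illustrative use: the 53 GA/AA CHD patients of Allen et al. (Table 1).
error_allen = counting_error(53)
print(f"53 +/- {error_allen:.2f}")
```

Propagating these per-count errors through the later quantities is what produces the uncertainties quoted in the tables; that propagation is not reproduced here.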
Data heterogeneity
In order to investigate the heterogeneity in the data sets, we compare the size of the effect (defined as a measure of the difference between CHD and non-CHD patients) in each study (Walker, Hernandez & Kattan, 2008). As a measure of the size of the effect, we use the fraction of SNP in the population of CHD patients and in the population of non-CHD patients, respectively $f_{\text{SNPinCHD}} = n_{\text{SNP},\text{CHD}}/n_{\text{CHD}}$ and $f_{\text{SNPin}\overline{\text{CHD}}} = n_{\text{SNP},\overline{\text{CHD}}}/n_{\overline{\text{CHD}}}$, where $n_{\text{CHD}} = n_{\text{SNP},\text{CHD}} + n_{\overline{\text{SNP}},\text{CHD}}$ is the total number of CHD patients and $n_{\overline{\text{CHD}}} = n_{\text{SNP},\overline{\text{CHD}}} + n_{\overline{\text{SNP}},\overline{\text{CHD}}}$ is the total number of non-CHD patients. Moreover, the ratio of these two fractions gives an indication of the correlation sign. If $f_{\text{SNPinCHD}}/f_{\text{SNPin}\overline{\text{CHD}}} > 1$, the SNP is proportionally more frequent in CHD than in non-CHD patients, so the study favours a positive correlation between the presence of the SNP and the occurrence of CHD; if $f_{\text{SNPinCHD}}/f_{\text{SNPin}\overline{\text{CHD}}} < 1$, the SNP is proportionally less frequent in CHD than in non-CHD patients, so the study favours a negative correlation; if $f_{\text{SNPinCHD}}/f_{\text{SNPin}\overline{\text{CHD}}} = 1$, the SNP is equally frequent in CHD and in non-CHD patients, so the study favours no correlation.
We plot this ratio of fractions for each study, grouped by ethnicity and CHD phenotype, in Fig. 4. We also plot the ratio for the combined data sets included in each panel. We observe that the ratios of the data sets are asymmetrically distributed about one, showing a predominance of ratios smaller than one. The ratio of the combined data sets included in each panel is slightly smaller than one for the Caucasian studies (for both CHD phenotypes) and larger than one for the Asian studies. This asymmetry indicates heterogeneity in the studies, as also observed in the meta-analysis of Zhang et al. (2011).
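To make the effect-size measure concrete, the sketch below computes the two fractions and their ratio for one study. It assumes that the GA/AA column of Table 1 counts the carriers of the SNP and the GG column the non-carriers (an assumption that reproduces the fractions reported later for the combined data sets); the function names are hypothetical.

```python
def snp_fractions(n_snp_chd, n_nosnp_chd, n_snp_ctrl, n_nosnp_ctrl):
    """Return (f_SNPinCHD, f_SNPin_nonCHD, ratio) for one data set."""
    f_chd = n_snp_chd / (n_snp_chd + n_nosnp_chd)
    f_ctrl = n_snp_ctrl / (n_snp_ctrl + n_nosnp_ctrl)
    return f_chd, f_ctrl, f_chd / f_ctrl

def correlation_sign(ratio):
    """Read the favoured correlation sign off the ratio of fractions."""
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "negative"
    return "none"

# Elahi et al. (Table 1): CHD patients GG=59, GA/AA=38; controls GG=41, GA/AA=54.
f_chd, f_ctrl, ratio = snp_fractions(38, 59, 54, 41)
print(f"f_SNPinCHD={f_chd:.3f}  f_SNPin_nonCHD={f_ctrl:.3f}  ratio={ratio:.3f}")
print("favoured correlation:", correlation_sign(ratio))
```

For this study the ratio falls below one, i.e. the SNP is proportionally less frequent among the CHD patients than among the controls.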
In Fig. 5A, we plot this ratio of fractions as a function of the sample size. We observe that smaller data sets are distributed across a wide range of values of this ratio, whereas larger data sets are distributed more closely to one. This implies that smaller data sets favour either positive or negative correlation, whereas larger data sets favour no correlation.
Hypotheses testing
First we test the hypothesis $H_{1}$ that the presence of the TNF-α SNP is related to the occurrence of CHD against the null hypothesis $H_{0}$ that the presence of the SNP is unrelated to the occurrence of CHD. By Bayes' theorem, the probability of a hypothesis $H_{i}$ given the data $D_{\text{SNP}}$ is the posterior probability of the corresponding hypothesis

$$P\left(H_{i}\mid D_{\text{SNP}}\right)=\frac{P\left(D_{\text{SNP}}\mid H_{i}\right)P\left(H_{i}\right)}{P\left(D_{\text{SNP}}\right)}, \tag{1}$$

where $P\left(D_{\text{SNP}}\mid H_{i}\right)$ is the evidence, $P\left(H_{i}\right)$ is the prior probability of $H_{i}$ and $P\left(D_{\text{SNP}}\right)=\sum_{i}P\left(D_{\text{SNP}}\mid H_{i}\right)P\left(H_{i}\right)$. The subscript in $D_{\text{SNP}}$ reminds us that the random variable is the occurrence of the SNP. In order to infer which hypothesis is more likely in view of the data, we compare the evidence computed for the two hypotheses. The evidence is the integral of the likelihood over the $j$-dimensional parameter space $p_{i,j}$ of the hypothesis $H_{i}$

$$P\left(D_{\text{SNP}}\mid H_{i}\right)=\int d^{j}p_{i,j}\,P\left(D_{\text{SNP}}\mid p_{i,j},H_{i}\right)P\left(p_{i,j}\mid H_{i}\right). \tag{2}$$

Assuming equal prior probabilities for the two hypotheses, from Eq. (1) it follows that

$$\frac{P\left(H_{1}\mid D_{\text{SNP}}\right)}{P\left(H_{0}\mid D_{\text{SNP}}\right)}=\frac{P\left(D_{\text{SNP}}\mid H_{1}\right)}{P\left(D_{\text{SNP}}\mid H_{0}\right)}. \tag{3}$$

We compute the evidence of the two hypotheses, for each data set separately and for the combined data sets grouped by CHD phenotype. We follow the procedure detailed in Vourvouhaki & Carvalho (2011), which we here summarize for one data set and then generalize for the combined data sets. In all cases, we choose a uniform distribution for the prior of the parameters, which is justified by the absence of an a priori bias on the values of the parameters (MacKay, 2003).
The evidence of $H_{0}$, $P\left(D_{\text{SNP}}\mid H_{0}\right)$, is computed assuming that the presence of the SNP is described by a binomial distribution with one parameter only, namely the probability $p_{0}$ that the SNP occurs in a given population. For $n_{\text{SNP}}$ occurrences of the SNP and $n_{\overline{\text{SNP}}}$ non-occurrences of the SNP in a sample of size $n=n_{\text{SNP}}+n_{\overline{\text{SNP}}}$, the likelihood $P\left(D_{\text{SNP}}\mid p_{0},H_{0}\right)$ is given by

$$P\left(D_{\text{SNP}}\mid p_{0},H_{0}\right)=p_{0}^{n_{\text{SNP}}}\left(1-p_{0}\right)^{n_{\overline{\text{SNP}}}}. \tag{4}$$

Moreover, assuming a uniform prior distribution for $p_{0}$, $P\left(p_{0}\mid H_{0}\right)=1$, we find that

$$P\left(D_{\text{SNP}}\mid H_{0}\right)=\int_{0}^{1}dp_{0}\,P\left(D_{\text{SNP}}\mid p_{0},H_{0}\right)P\left(p_{0}\mid H_{0}\right)=\frac{n_{\text{SNP}}!\;n_{\overline{\text{SNP}}}!}{\left(n_{\text{SNP}}+n_{\overline{\text{SNP}}}+1\right)!}, \tag{5}$$

where $n!$ stands for the factorial of $n$.
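As a brief check of the closed form in Eq. (5), one can recognize the integral as a Beta function. This is a standard identity rather than a step spelled out in the text; the shorthand $a=n_{\text{SNP}}$ and $b=n_{\overline{\text{SNP}}}$ is introduced here only for compactness:

$$\int_{0}^{1}dp_{0}\,p_{0}^{a}\left(1-p_{0}\right)^{b}=B\left(a+1,b+1\right)=\frac{\Gamma\left(a+1\right)\Gamma\left(b+1\right)}{\Gamma\left(a+b+2\right)}=\frac{a!\;b!}{\left(a+b+1\right)!}.$$

For instance, with $a=b=1$ the integral is $\int_{0}^{1}dp_{0}\,p_{0}\left(1-p_{0}\right)=1/2-1/3=1/6$, which agrees with $1!\,1!/3!=1/6$.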
The evidence of $H_{1}$, $P\left(D_{\text{SNP}}\mid H_{1}\right)$, is computed assuming that the presence of the SNP is described by a binomial distribution with two parameters, namely the probability $p_{1,\text{CHD}}$ that the SNP occurs in the subset of CHD patients and the probability $p_{1,\overline{\text{CHD}}}$ that the SNP occurs in the subset of non-CHD patients,

$$P\left(D_{\text{SNP}}\mid H_{1}\right)=\int_{0}^{1}dp_{1,\text{CHD}}\int_{0}^{1}dp_{1,\overline{\text{CHD}}}\,P\left(D_{\text{SNP}}\mid p_{1,\text{CHD}},p_{1,\overline{\text{CHD}}},H_{1}\right)P\left(p_{1,\text{CHD}},p_{1,\overline{\text{CHD}}}\mid H_{1}\right). \tag{6}$$

For $n_{\text{SNP},\text{CHD}}$ occurrences of the SNP and $n_{\overline{\text{SNP}},\text{CHD}}$ non-occurrences of the SNP in a subset of CHD patients $n_{\text{CHD}}=n_{\text{SNP},\text{CHD}}+n_{\overline{\text{SNP}},\text{CHD}}$, and also for $n_{\text{SNP},\overline{\text{CHD}}}$ occurrences of the SNP and $n_{\overline{\text{SNP}},\overline{\text{CHD}}}$ non-occurrences of the SNP in a subset of non-CHD patients $n_{\overline{\text{CHD}}}=n_{\text{SNP},\overline{\text{CHD}}}+n_{\overline{\text{SNP}},\overline{\text{CHD}}}$, the likelihood $P\left(D_{\text{SNP}}\mid p_{1,\text{CHD}},p_{1,\overline{\text{CHD}}},H_{1}\right)$ is separable, i.e., it can be decomposed into the product of the likelihoods $P\left(D_{\text{SNP}}\mid p_{1,\text{CHD}},H_{1}\right)$ and $P\left(D_{\text{SNP}}\mid p_{1,\overline{\text{CHD}}},H_{1}\right)$, as follows

$$P\left(D_{\text{SNP}}\mid p_{1,\text{CHD}},p_{1,\overline{\text{CHD}}},H_{1}\right)=p_{1,\text{CHD}}^{n_{\text{SNP},\text{CHD}}}\left(1-p_{1,\text{CHD}}\right)^{n_{\overline{\text{SNP}},\text{CHD}}}\,p_{1,\overline{\text{CHD}}}^{n_{\text{SNP},\overline{\text{CHD}}}}\left(1-p_{1,\overline{\text{CHD}}}\right)^{n_{\overline{\text{SNP}},\overline{\text{CHD}}}}\equiv P\left(D_{\text{SNP}}\mid p_{1,\text{CHD}},H_{1}\right)P\left(D_{\text{SNP}}\mid p_{1,\overline{\text{CHD}}},H_{1}\right). \tag{7}$$

Assuming a uniform prior probability for $p_{1,\text{CHD}}$ and $p_{1,\overline{\text{CHD}}}$, $P\left(p_{1,\text{CHD}},p_{1,\overline{\text{CHD}}}\mid H_{1}\right)=1$, and moreover that the priors on $p_{1,\text{CHD}}$ and $p_{1,\overline{\text{CHD}}}$ are separable, the evidence will also be separable and given by

$$P\left(D_{\text{SNP}}\mid H_{1}\right)=\int_{0}^{1}dp_{1,\text{CHD}}\,P\left(D_{\text{SNP}}\mid p_{1,\text{CHD}},H_{1}\right)P\left(p_{1,\text{CHD}}\mid H_{1}\right)\int_{0}^{1}dp_{1,\overline{\text{CHD}}}\,P\left(D_{\text{SNP}}\mid p_{1,\overline{\text{CHD}}},H_{1}\right)P\left(p_{1,\overline{\text{CHD}}}\mid H_{1}\right)=\frac{n_{\text{SNP},\text{CHD}}!\;n_{\overline{\text{SNP}},\text{CHD}}!}{\left(n_{\text{SNP},\text{CHD}}+n_{\overline{\text{SNP}},\text{CHD}}+1\right)!}\,\frac{n_{\text{SNP},\overline{\text{CHD}}}!\;n_{\overline{\text{SNP}},\overline{\text{CHD}}}!}{\left(n_{\text{SNP},\overline{\text{CHD}}}+n_{\overline{\text{SNP}},\overline{\text{CHD}}}+1\right)!}. \tag{8}$$

In order to compare the hypotheses, we take the ratio of the corresponding evidences, $B_{10}=P\left(D_{\text{SNP}}\mid H_{1}\right)/P\left(D_{\text{SNP}}\mid H_{0}\right)$, which by Eq. (3) equals the ratio of the posterior probabilities of the hypotheses when their prior probabilities are equal, and which we present in Table 1 (columns 7–8). This quantity is known as the Bayes factor and gives empirical levels of significance for the strength of the evidence of the test hypothesis over that of the null hypothesis. It also encapsulates the Occam factor, which measures the adequacy of a hypothesis to the data over the parameter space of the hypothesis (MacKay, 2003). The levels of significance ascribed to the Bayes factor are calibrated by the Jeffreys scale (Kass & Raftery, 1995). According to this scale, a Bayes factor larger than one indicates that $H_{1}$ is favoured over $H_{0}$. Otherwise, $H_{0}$ is favoured over $H_{1}$. For the data sets taken separately, the results from this hypothesis test mostly agree with the corresponding results presented in the meta-analysis by Chu et al. (2013, see Fig. 1).
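The Bayes factor defined by Eqs. (5) and (8) can be evaluated numerically as sketched below. The sketch works with logarithms of factorials (via `math.lgamma`) to avoid overflow for the sample sizes of Table 1. It assumes that the GA/AA column of Table 1 counts SNP carriers and that, under $H_{0}$, the CHD and control counts are simply pooled; the function names are illustrative, and the error estimates quoted in Table 1 are not reproduced.

```python
from math import lgamma, exp

def log_evidence(n_snp, n_nosnp):
    """log of n_snp! * n_nosnp! / (n_snp + n_nosnp + 1)!  (form of Eq. 5)."""
    return lgamma(n_snp + 1) + lgamma(n_nosnp + 1) - lgamma(n_snp + n_nosnp + 2)

def bayes_factor(n_snp_chd, n_nosnp_chd, n_snp_ctrl, n_nosnp_ctrl):
    """B10 = evidence of H1 (Eq. 8) divided by evidence of H0 (Eq. 5)."""
    # H1: separate SNP probabilities for CHD patients and for controls.
    log_h1 = (log_evidence(n_snp_chd, n_nosnp_chd)
              + log_evidence(n_snp_ctrl, n_nosnp_ctrl))
    # H0: a single SNP probability for the pooled sample.
    log_h0 = log_evidence(n_snp_chd + n_snp_ctrl, n_nosnp_chd + n_nosnp_ctrl)
    return exp(log_h1 - log_h0)

# Elahi et al. (Table 1): CHD GG=59, GA/AA=38; controls GG=41, GA/AA=54.
print(f"B10 (Elahi et al.) = {bayes_factor(38, 59, 54, 41):.2f}")
# Bennet et al. (Table 1): CHD GG=799, GA/AA=368; controls GG=1037, GA/AA=460.
print(f"B10 (Bennet et al.) = {bayes_factor(368, 799, 460, 1037):.3f}")
```

Under these assumptions the first value comes out larger than one and the second well below one, in line with the corresponding entries of Table 1.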
We plot the Bayes factor for each study, grouped by ethnicity and CHD phenotype, in Fig. 6. For the data sets taken separately, we observe that the Bayes factor is asymmetrically distributed about the Bayes factor equal to one, with most Bayes factors being smaller than one. The exceptions are Elahi et al. (2008), Vendrell et al. (2003) and Dedoussis et al. (2005) for the Caucasian population, and Shun et al. (2009) for the Asian population. This asymmetry indicates heterogeneity in the results. For the combined data sets included in each panel, the Bayes factor takes values 0.03–0.05 for the Caucasian population and 0.15 for the Asian population, which indicates that there is no evidence for H_{1} over H_{0}. We also observe that, for the Caucasian population, the Bayes factor of the combined data sets is outside the range of variability of the Bayes factor of the data sets considered separately. This suggests that the combination of the Caucasian data sets causes a new data pattern to emerge. Conversely the combination of the Asian data sets leads to an approximately average data pattern. Hence we conclude that the data favour H_{0} over H_{1}. Since H_{0} yields trivial results, in the subsequent subsections we present the results also for H_{1} to illustrate the application of the formalism to a more general setup. It is also instructive to compare the subsequent results using both hypotheses.
In Fig. 5B, we plot the Bayes factor as a function of the sample size. We observe that smaller data sets are distributed across a wide range of values of the Bayes factor, whereas larger data sets are distributed across values smaller than one. This implies that smaller data sets favour either H_{0} or H_{1}, whereas larger data sets favour H_{0}.
Correlation sign
Comparing Fig. 6 with Fig. 4, we observe that, among the studies with Bayes factor larger than one, Elahi et al. (2008) has a ratio $f_{\text{SNPinCHD}}/f_{\text{SNPin}\overline{\text{CHD}}}<1$, i.e., the SNP is proportionally less frequent in CHD than in non-CHD patients, which indicates a negative correlation between the presence of the SNP and the occurrence of CHD. Another example of a comparatively large Bayes factor and a low ratio $f_{\text{SNPinCHD}}/f_{\text{SNPin}\overline{\text{CHD}}}$ is the study of Tulyakova et al. (2004). This indicates that the hypotheses as formulated do not distinguish the correlation sign.
To further explore how the ratio ${f}_{\text{SNPinCHD}}/{f}_{\text{SNPin}\overline{\text{CHD}}}$ affects the result of the hypothesis testing, we consider several realizations of CHD populations with the same n_{CHD} but with different fractions of SNP. More specifically for each combined data set, we vary n_{SNP,CHD} while varying simultaneously ${n}_{\overline{\text{SNP}},\text{CHD}}$ so as to keep n_{CHD} constant. Throughout the different realizations, the size of the control population is kept equal to the size of the control population of the combined data sets grouped by ethnicity and CHD phenotype. For each realization, we compute both f_{SNPinCHD} (note that ${f}_{\text{SNPin}\overline{\text{CHD}}}$ is by construction kept fixed) and B_{10}, and plot the results in Fig. 7. The realizations with the f_{SNPinCHD} of a real combined data set are marked as red points. In Fig. 7A, we plot B_{10} as a function of ${f}_{\text{SNPinCHD}}/{f}_{\text{SNPin}\overline{\text{CHD}}}$, from which there result three parabolae centred at the same point. In Fig. 7B, for a better visualization of the behaviour of B_{10}, we plot B_{10} as a function of f_{SNPinCHD}, from which there result three parabolae centred at different points. We observe that B_{10} follows a parabola, taking the minimum value when ${f}_{\text{SNPinCHD}}/{f}_{\text{SNPin}\overline{\text{CHD}}}=1$ and increasing in both directions with the increase of $|{f}_{\text{SNPinCHD}}/{f}_{\text{SNPin}\overline{\text{CHD}}}-1|$, i.e., with the increase of the distance from 1. This confirms that the hypotheses as formulated do not distinguish between a positive correlation of the SNP with CHD (${f}_{\text{SNPinCHD}}/{f}_{\text{SNPin}\overline{\text{CHD}}}>1$) and a negative correlation (${f}_{\text{SNPinCHD}}/{f}_{\text{SNPin}\overline{\text{CHD}}}<1$). Hence, the value of ${f}_{\text{SNPinCHD}}/{f}_{\text{SNPin}\overline{\text{CHD}}}$ complements the value of B_{10} in the characterization of the correlation.
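The scan over realizations described above can be sketched as follows. The code varies $n_{\text{SNP},\text{CHD}}$ at fixed $n_{\text{CHD}}$ and fixed control counts and records the logarithm of $B_{10}$ (the logarithm is used because $B_{10}$ itself overflows floating point at the extremes of the scan). The pooled Caucasian CS counts used in the example are sums of the corresponding Table 1 columns, with GA/AA taken as SNP carriers; both this pooling and the function names are assumptions of the sketch.

```python
from math import lgamma

def log_evidence(n_snp, n_nosnp):
    """log of n_snp! * n_nosnp! / (n_snp + n_nosnp + 1)!  (form of Eq. 5)."""
    return lgamma(n_snp + 1) + lgamma(n_nosnp + 1) - lgamma(n_snp + n_nosnp + 2)

def log_bayes_factor(n_snp_chd, n_nosnp_chd, n_snp_ctrl, n_nosnp_ctrl):
    """Natural logarithm of B10 built from Eqs. (5) and (8)."""
    log_h1 = (log_evidence(n_snp_chd, n_nosnp_chd)
              + log_evidence(n_snp_ctrl, n_nosnp_ctrl))
    log_h0 = log_evidence(n_snp_chd + n_snp_ctrl, n_nosnp_chd + n_nosnp_ctrl)
    return log_h1 - log_h0

def scan_realizations(n_chd, n_snp_ctrl, n_nosnp_ctrl, step=1):
    """Vary n_SNP,CHD at fixed n_CHD; return a list of (f_SNPinCHD, log B10)."""
    rows = []
    for n_snp_chd in range(0, n_chd + 1, step):
        f_chd = n_snp_chd / n_chd
        log_b10 = log_bayes_factor(n_snp_chd, n_chd - n_snp_chd,
                                   n_snp_ctrl, n_nosnp_ctrl)
        rows.append((f_chd, log_b10))
    return rows

# Pooled Caucasian CS data: 2033 CHD patients; controls with 444 carriers
# and 1010 non-carriers (column sums of Table 1, an assumption of this sketch).
rows = scan_realizations(2033, 444, 1010)
f_at_min, log_b_min = min(rows, key=lambda row: row[1])
print(f"log B10 is smallest at f_SNPinCHD = {f_at_min:.3f}")
print(f"control fraction f_SNPin_nonCHD   = {444 / 1454:.3f}")
```

The minimum sits close to the control fraction, i.e. close to a ratio of fractions equal to one, and the logarithm of $B_{10}$ grows on either side of it, which is the sign-blind behaviour discussed in the text.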
Results
Inference of conditional probabilities
Posterior probability for the occurrence of CHD
We proceed to compute the probability for the occurrence of CHD, i.e., given the data on the presence of the SNP, we determine the probability that a patient has CHD. This is defined as the posterior probability

$$P\left(\text{CHD}\mid D_{\text{SNP}},H_{i}\right)=\frac{P\left(D_{\text{SNP}}\mid\text{CHD},H_{i}\right)P\left(\text{CHD}\right)}{P\left(D_{\text{SNP}}\mid H_{i}\right)}. \tag{9}$$
The prior probability P(CHD) is based on the available information on the occurrence of CHD. This probability can be computed by combining all the risk factors per age interval per pathology. According to the European guidelines, less than 4 in 1,000 people have CS (Hamm et al., 2011), whereas about 1 in 1,000 people have MI (Steg et al., 2012). We then use P(CHD) = 0.004 for CS and P(CHD) = 0.001 for MI.
The evidence $P\left(D_{\text{SNP}}\mid H_{i}\right)$ can be decomposed as

$$P\left(D_{\text{SNP}}\mid H_{i}\right)=P\left(D_{\text{SNP}}\mid\text{CHD},H_{i}\right)P\left(\text{CHD}\right)+P\left(D_{\text{SNP}}\mid\overline{\text{CHD}},H_{i}\right)P\left(\overline{\text{CHD}}\right). \tag{10}$$

In the case of $H_{0}$,

$$\begin{aligned}P\left(D_{\text{SNP}}\mid\text{CHD},H_{0}\right)&=\binom{n}{n_{\text{SNP}}}p_{0}^{n_{\text{SNP}}}\left(1-p_{0}\right)^{n_{\overline{\text{SNP}}}}\equiv P\left(D_{\text{SNP}}\mid H_{0}\right),\\P\left(D_{\text{SNP}}\mid\overline{\text{CHD}},H_{0}\right)&=P\left(D_{\text{SNP}}\mid H_{0}\right),\end{aligned} \tag{11}$$

whereas in the case of $H_{1}$,

$$\begin{aligned}P\left(D_{\text{SNP}}\mid\text{CHD},H_{1}\right)&=\binom{n_{\text{CHD}}}{n_{\text{SNP},\text{CHD}}}p_{1,\text{CHD}}^{n_{\text{SNP},\text{CHD}}}\left(1-p_{1,\text{CHD}}\right)^{n_{\overline{\text{SNP}},\text{CHD}}},\\P\left(D_{\text{SNP}}\mid\overline{\text{CHD}},H_{1}\right)&=\binom{n_{\overline{\text{CHD}}}}{n_{\text{SNP},\overline{\text{CHD}}}}p_{1,\overline{\text{CHD}}}^{n_{\text{SNP},\overline{\text{CHD}}}}\left(1-p_{1,\overline{\text{CHD}}}\right)^{n_{\overline{\text{SNP}},\overline{\text{CHD}}}}.\end{aligned} \tag{12}$$

In the previous section, we computed the evidence by marginalizing the parameters of each hypothesis. Here, assuming a hypothesis $H_{i}$ and using Bayes' theorem, we compute the posterior probability of each parameter $p_{i,j}$ given the data

$$P\left(p_{i,j}\mid D_{\text{SNP}}\right)=\frac{P\left(D_{\text{SNP}}\mid p_{i,j}\right)P\left(p_{i,j}\mid H_{i}\right)}{P\left(D_{\text{SNP}}\mid H_{i}\right)}, \tag{13}$$

and find for $p_{i,j}$ the value that maximizes the likelihood $P\left(D_{\text{SNP}}\mid p_{i,j}\right)$. Since the prior is uniform, the maximum of the posterior coincides with the maximum of the likelihood. In the case of $H_{0}$, we compute the posterior probability of the single parameter $p_{i,j}=p_{0}$, where $P\left(D_{\text{SNP}}\mid p_{0}\right)$ is given by Eq. (4), $P\left(D_{\text{SNP}}\mid H_{0}\right)$ is given by Eq. (5) and $P\left(p_{0}\mid H_{0}\right)$ is assumed uniform. Taking the derivative with respect to $p_{0}$ and solving $dP\left(p_{0}\mid D_{\text{SNP}}\right)/dp_{0}=0$, we find for the maximum-likelihood value of $p_{0}$

$$p_{0\left(\text{maxL}\right)}=\frac{n_{\text{SNP}}}{n_{\text{SNP}}+n_{\overline{\text{SNP}}}}. \tag{14}$$

Similarly in the case of $H_{1}$, we compute the posterior probability of each of the two parameters $p_{i,j}=\left\{p_{1,\text{CHD}},p_{1,\overline{\text{CHD}}}\right\}$, where $P\left(D_{\text{SNP}}\mid p_{1,\text{CHD}}\right)$ and $P\left(D_{\text{SNP}}\mid p_{1,\overline{\text{CHD}}}\right)$ are given by Eq. (7), $P\left(D_{\text{SNP}}\mid H_{1}\right)$ is given by Eq. (8) and both $P\left(p_{1,\text{CHD}}\mid H_{1}\right)$ and $P\left(p_{1,\overline{\text{CHD}}}\mid H_{1}\right)$ are assumed uniform, finding for the maximum-likelihood values of $p_{1,\text{CHD}}$ and $p_{1,\overline{\text{CHD}}}$ respectively

$$p_{1,\text{CHD}\left(\text{maxL}\right)}=\frac{n_{\text{SNP},\text{CHD}}}{n_{\text{SNP},\text{CHD}}+n_{\overline{\text{SNP}},\text{CHD}}}, \tag{15}$$

$$p_{1,\overline{\text{CHD}}\left(\text{maxL}\right)}=\frac{n_{\text{SNP},\overline{\text{CHD}}}}{n_{\text{SNP},\overline{\text{CHD}}}+n_{\overline{\text{SNP}},\overline{\text{CHD}}}}. \tag{16}$$

Analogously we define the posterior probability

$$P\left(\overline{\text{CHD}}\mid D_{\text{SNP}},H_{i}\right)=\frac{P\left(D_{\text{SNP}}\mid\overline{\text{CHD}},H_{i}\right)P\left(\overline{\text{CHD}}\right)}{P\left(D_{\text{SNP}}\mid H_{i}\right)}. \tag{17}$$
Finally, using the maximum-likelihood value of p_{i,j}, we compute P(CHD|D_{SNP}, H_{i}) for the data sets combined, which we present in Table 2.
In the case of H_{0}, no information is added to the posterior probability, since by Eq. (11) the posterior probabilities equal the prior. Conversely in the case of H_{1}, information is added to the posterior probability, since by Eq. (12) there result posterior probabilities different from the prior albeit compatible with the prior.
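The chain from the counts to the posterior probability of CHD can be sketched numerically as below. The code evaluates the binomial likelihoods of Eq. (12) at the maximum-likelihood values of Eqs. (15) and (16) and inserts them into Eqs. (9) and (10). The pooled Caucasian CS counts are column sums of Table 1 with GA/AA taken as SNP carriers; this pooling, the function names, and the restriction to counts that give probabilities strictly between 0 and 1 are assumptions of the sketch, and the quoted uncertainties are not reproduced.

```python
from math import lgamma, log, exp

def log_binom_pmf(k, n, p):
    """log of C(n, k) * p**k * (1 - p)**(n - k), for 0 < p < 1."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_coeff + k * log(p) + (n - k) * log(1.0 - p)

def posterior_chd(like_chd, like_ctrl, prior_chd):
    """Eq. (9) with the evidence decomposed as in Eq. (10)."""
    numerator = like_chd * prior_chd
    return numerator / (numerator + like_ctrl * (1.0 - prior_chd))

def posterior_chd_h1(n_snp_chd, n_nosnp_chd, n_snp_ctrl, n_nosnp_ctrl, prior_chd):
    """Posterior P(CHD | D_SNP, H1) at the maximum-likelihood parameters."""
    n_chd = n_snp_chd + n_nosnp_chd
    n_ctrl = n_snp_ctrl + n_nosnp_ctrl
    p_chd = n_snp_chd / n_chd       # Eq. (15)
    p_ctrl = n_snp_ctrl / n_ctrl    # Eq. (16)
    like_chd = exp(log_binom_pmf(n_snp_chd, n_chd, p_chd))      # Eq. (12), CHD
    like_ctrl = exp(log_binom_pmf(n_snp_ctrl, n_ctrl, p_ctrl))  # Eq. (12), non-CHD
    return posterior_chd(like_chd, like_ctrl, prior_chd)

# Pooled Caucasian CS counts and the prior P(CHD) = 0.004 used for CS.
post_h1 = posterior_chd_h1(599, 1434, 444, 1010, prior_chd=0.004)
print(f"P(CHD | D_SNP, H1) = {post_h1:.5f}")

# Under H0 both likelihoods coincide (Eq. 11), so the posterior equals the prior.
post_h0 = posterior_chd(0.02, 0.02, prior_chd=0.004)
print(f"P(CHD | D_SNP, H0) = {post_h0:.5f}")
```

With these inputs the $H_{1}$ posterior stays within a fraction of a per mille of the prior, which illustrates why the two hypotheses lead to compatible results.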
| Hypothesis | Probabilities | Phenotype (j): Cauc CS | Phenotype (j): Cauc MI | Phenotype (j): Asian CS |
|---|---|---|---|---|
| $H_{0}$ | $p_{0}$ | 0.299 ± 0.001 | 0.284 ± 0.001 | 0.141 ± 0.001 |
| $H_{0}$ | $P\left(\text{CHD}\mid D_{\text{SNP}},H_{0}\right)$ | (4.00 ± 1.31) ⋅ 10$^{-3}$ | (1.00 ± 0.25) ⋅ 10$^{-3}$ | (4.00 ± 0.91) ⋅ 10$^{-3}$ |
| $H_{0}$ | $P\left(\text{nextSNP},\text{CHD}\mid D_{\text{SNP}},H_{0}\right)$ | (1.19 ± 0.39) ⋅ 10$^{-3}$ | (0.28 ± 0.07) ⋅ 10$^{-3}$ | (0.56 ± 0.13) ⋅ 10$^{-3}$ |
| $H_{0}$ | $P\left(\text{nextSNP},\overline{\text{CHD}}\mid D_{\text{SNP}},H_{0}\right)$ | 0.298 ± 1.093 | 0.284 ± 1.752 | 0.141 ± 0.360 |
| $H_{0}$ | $P\left(\text{nextSNP}\mid D_{\text{SNP}},H_{0}\right)$ | 0.299 ± 1.093 | 0.284 ± 1.572 | 0.141 ± 0.360 |
| $H_{0}$ | $r_{\text{nextSNP},\text{CHD}}$ | (4.00 ± 14.65) ⋅ 10$^{-3}$ | (1.00 ± 5.54) ⋅ 10$^{-3}$ | (4.00 ± 10.22) ⋅ 10$^{-3}$ |
| $H_{1}$ | $p_{1,\text{CHD}}$ | 0.295 ± 0.001 | 0.283 ± 0.001 | 0.158 ± 0.001 |
| $H_{1}$ | $p_{1,\overline{\text{CHD}}}$ | 0.305 ± 0.001 | 0.285 ± 0.001 | 0.132 ± 0.001 |
| $H_{1}$ | $P\left(\text{CHD}\mid D_{\text{SNP}},H_{1}\right)$ | (3.42 ± 7.94) ⋅ 10$^{-3}$ | (0.98 ± 3.26) ⋅ 10$^{-3}$ | (5.00 ± 7.02) ⋅ 10$^{-3}$ |
| $H_{1}$ | $P\left(\text{nextSNP},\text{CHD}\mid D_{\text{SNP}},H_{1}\right)$ | (1.00 ± 2.34) ⋅ 10$^{-3}$ | (0.28 ± 0.92) ⋅ 10$^{-3}$ | (0.79 ± 1.11) ⋅ 10$^{-3}$ |
| $H_{1}$ | $P\left(\text{nextSNP},\overline{\text{CHD}}\mid D_{\text{SNP}},H_{1}\right)$ | 0.304 ± 0.598 | 0.285 ± 0.926 | 0.131 ± 0.244 |
| $H_{1}$ | $P\left(\text{nextSNP}\mid D_{\text{SNP}},H_{1}\right)$ | 0.305 ± 0.598 | 0.285 ± 0.926 | 0.132 ± 0.244 |
| $H_{1}$ | $r_{\text{nextSNP},\text{CHD}}$ | (3.30 ± 10.02) ⋅ 10$^{-3}$ | (0.98 ± 4.54) ⋅ 10$^{-3}$ | (6.00 ± 13.84) ⋅ 10$^{-3}$ |
Prediction of the presence of the SNP
We now proceed to compute the probability for the presence of the SNP, i.e., given the data, we determine the probability that a randomly selected patient (with or without CHD) has the SNP. This probability is defined as

$$P\left(\text{nextSNP}\mid D_{\text{SNP}},H_{i}\right)=P\left(\text{nextSNP}\mid D_{\text{SNP}},\text{CHD}\right)P\left(\text{CHD}\mid D_{\text{SNP}},H_{i}\right)+P\left(\text{nextSNP}\mid D_{\text{SNP}},\overline{\text{CHD}}\right)P\left(\overline{\text{CHD}}\mid D_{\text{SNP}},H_{i}\right)\equiv P\left(\text{nextSNP},\text{CHD}\mid D_{\text{SNP}},H_{i}\right)+P\left(\text{nextSNP},\overline{\text{CHD}}\mid D_{\text{SNP}},H_{i}\right). \tag{18}$$

In the case of $H_{0}$,

$$P\left(\text{nextSNP}\mid D_{\text{SNP}},\text{CHD}\right)=P\left(\text{nextSNP}\mid D_{\text{SNP}},\overline{\text{CHD}}\right)=p_{0}, \tag{19}$$

whereas in the case of $H_{1}$,

$$\begin{aligned}P\left(\text{nextSNP}\mid D_{\text{SNP}},\text{CHD}\right)&=p_{1,\text{CHD}},\\P\left(\text{nextSNP}\mid D_{\text{SNP}},\overline{\text{CHD}}\right)&=p_{1,\overline{\text{CHD}}}.\end{aligned} \tag{20}$$

Using the maximum-likelihood values of $p_{i,j}$ and the posterior probability $P\left(\text{CHD}\mid D_{\text{SNP}},H_{i}\right)$ computed above, we compute $P\left(\text{nextSNP}\mid D_{\text{SNP}},H_{i}\right)$, which we present in Table 2.
For completion, using the Bayes theorem, we invert P(nextSNP|D_{SNP}, CHD) to find the probability that CHD will occur given that the SNP is present in a randomly selected patient (21)$P\left(\text{CHD}|\text{nextSNP},{H}_{i}\right)=\frac{P\left(\text{nextSNP}|{D}_{\text{SNP}},\text{CHD}\right)P\left(\text{CHD}|{D}_{\text{SNP}},{H}_{i}\right)}{P\left(\text{nextSNP}|{D}_{\text{SNP}},{H}_{i}\right)}.$ Similarly, inverting $P\left(\text{nextSNP}|{D}_{\text{SNP}},\overline{\text{CHD}}\right)$, we find the probability that CHD will not occur given that the SNP is present in a randomly selected patient, $P\left(\overline{\text{CHD}}|\text{nextSNP},{H}_{i}\right)$, which can be found simply by replacing CHD by $\overline{\text{CHD}}$ in Eq. (21).
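The prediction and its inversion can be illustrated with a few lines of arithmetic. The sketch below takes the SNP probabilities and the posterior probability of CHD as inputs, forms the two joint terms of Eq. (18), and returns the inverted probability of Eq. (21). The numerical inputs are the rounded central values listed in Table 2 for the Caucasian CS phenotype; the function name is hypothetical and no error propagation is attempted.

```python
def predict_snp(p_snp_given_chd, p_snp_given_ctrl, post_chd):
    """Return the two joint terms and the sum of Eq. (18), plus Eq. (21).

    p_snp_given_chd, p_snp_given_ctrl: P(nextSNP | D_SNP, CHD or non-CHD),
    i.e. p_0 twice under H0 (Eq. 19) or p_1,CHD and p_1,nonCHD under H1 (Eq. 20).
    post_chd: posterior probability P(CHD | D_SNP, H_i).
    """
    joint_chd = p_snp_given_chd * post_chd             # P(nextSNP, CHD | D_SNP, H_i)
    joint_ctrl = p_snp_given_ctrl * (1.0 - post_chd)   # P(nextSNP, non-CHD | D_SNP, H_i)
    p_next_snp = joint_chd + joint_ctrl                # Eq. (18)
    p_chd_given_snp = joint_chd / p_next_snp           # Eq. (21)
    return joint_chd, joint_ctrl, p_next_snp, p_chd_given_snp

# H1, Caucasian CS (central values of Table 2).
h1 = predict_snp(0.295, 0.305, 3.42e-3)
print("H1: P(nextSNP) = %.3f, P(CHD | nextSNP) = %.5f" % (h1[2], h1[3]))

# H0, Caucasian CS: the same p_0 for CHD and non-CHD patients.
h0 = predict_snp(0.299, 0.299, 4.00e-3)
print("H0: P(nextSNP) = %.3f, P(CHD | nextSNP) = %.5f" % (h0[2], h0[3]))
```

Under $H_{0}$ the inverted probability reduces to the posterior (and hence the prior) probability of CHD, because the common factor $p_{0}$ cancels in Eq. (21).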
In order to quantify the influence of CHD in the presence of the SNP, we compute the ratio of P(nextSNP, CHD|D_{SNP}) to P(nextSNP|D_{SNP}, H_{i}), which gives an estimate of how much the occurrence of CHD indicates the presence of the SNP. This is also the probability in Eq. (21). In the case of H_{0}, this ratio equals the posterior probability of occurrence of CHD. Conversely in the case of H_{1}, this ratio is different from the posterior probability of occurrence of CHD albeit compatible with it. The occurrence of CHD indicates the presence of the SNP in of order 0.1% of patients (0.1–0.4% in the case of H_{0}, 0.1–0.6% in the case of H_{1}), which suggests that the occurrence of CHD is not a good marker for the presence of the SNP.
In order to quantify the influence of the SNP on the occurrence of CHD, we compute the ratio of P(CHD|nextSNP, H_{i}) to P(CHD|D_{SNP}, H_{i}), which gives an estimate of how much the presence of the SNP indicates the occurrence of CHD. This is also the probability in Eqs. (19) and (20). The presence of the SNP indicates the occurrence of CHD in roughly 0.1% of patients (0.141–0.299% in the case of H_{0}, 0.158–0.295% in the case of H_{1}), which suggests that the presence of the SNP is not a risk factor for the emergence of CHD.
| Hypothesis | Probabilities | Phenotype (j): Cauc CS | Cauc MI | Asian CS |
|---|---|---|---|---|
| $H_{0}$ | $p_{0}$ | 0.288 ± 0.001 | 0.296 ± 0.001 | 0.136 ± 0.001 |
| | $P\left(\text{CHD} \mid D_{\text{SNP}}, H_{0}\right)$ | (4.00 ± 1.26) ⋅ 10^{−3} | (1.00 ± 0.24) ⋅ 10^{−3} | (4.00 ± 0.89) ⋅ 10^{−3} |
| | $P\left(\text{nextSNP}, \text{CHD} \mid D_{\text{SNP}}, H_{0}\right)$ | (1.15 ± 0.36) ⋅ 10^{−3} | (0.30 ± 0.07) ⋅ 10^{−3} | (0.55 ± 0.12) ⋅ 10^{−3} |
| | $P\left(\text{nextSNP}, \overline{\text{CHD}} \mid D_{\text{SNP}}, H_{0}\right)$ | 0.287 ± 1.018 | 0.296 ± 1.605 | 0.136 ± 0.340 |
| | $P\left(\text{nextSNP} \mid D_{\text{SNP}}, H_{0}\right)$ | 0.289 ± 1.018 | 0.296 ± 1.605 | 0.136 ± 0.340 |
| | $r_{\text{nextSNP},\text{CHD}}$ | (4.00 ± 14.16) ⋅ 10^{−3} | (1.00 ± 5.54) ⋅ 10^{−3} | (4.00 ± 10.00) ⋅ 10^{−3} |
| $H_{1}$ | $p_{1,\text{CHD}}$ | 0.290 ± 0.001 | 0.292 ± 0.001 | 0.151 ± 0.001 |
| | $p_{1,\overline{\text{CHD}}}$ | 0.287 ± 0.001 | 0.300 ± 0.001 | 0.128 ± 0.001 |
| | $P\left(\text{CHD} \mid D_{\text{SNP}}, H_{1}\right)$ | (3.34 ± 7.57) ⋅ 10^{−3} | (0.99 ± 3.18) ⋅ 10^{−3} | (5.11 ± 6.96) ⋅ 10^{−3} |
| | $P\left(\text{nextSNP}, \text{CHD} \mid D_{\text{SNP}}, H_{1}\right)$ | (0.97 ± 2.19) ⋅ 10^{−3} | (0.29 ± 0.93) ⋅ 10^{−3} | (0.77 ± 1.05) ⋅ 10^{−3} |
| | $P\left(\text{nextSNP}, \overline{\text{CHD}} \mid D_{\text{SNP}}, H_{1}\right)$ | 0.286 ± 0.542 | 0.300 ± 0.947 | 0.128 ± 0.234 |
| | $P\left(\text{nextSNP} \mid D_{\text{SNP}}, H_{1}\right)$ | 0.287 ± 0.543 | 0.300 ± 0.947 | 0.129 ± 0.234 |
| | $r_{\text{nextSNP},\text{CHD}}$ | (3.38 ± 9.96) ⋅ 10^{−3} | (0.96 ± 4.34) ⋅ 10^{−3} | (6.02 ± 13.67) ⋅ 10^{−3} |

| Hypothesis | Probabilities | Phenotype (j): Cauc CS | Cauc MI | Asian CS |
|---|---|---|---|---|
| $H_{0}$ | $p_{0}$ | 0.308 ± 0.001 | 0.271 ± 0.001 | 0.177 ± 0.001 |
| | $P\left(\text{CHD} \mid D_{\text{SNP}}, H_{0}\right)$ | (4.00 ± 1.08) ⋅ 10^{−3} | (1.00 ± 0.20) ⋅ 10^{−3} | (4.00 ± 0.63) ⋅ 10^{−3} |
| | $P\left(\text{nextSNP}, \text{CHD} \mid D_{\text{SNP}}, H_{0}\right)$ | (1.12 ± 0.33) ⋅ 10^{−3} | (0.27 ± 0.05) ⋅ 10^{−3} | (0.71 ± 0.11) ⋅ 10^{−3} |
| | $P\left(\text{nextSNP}, \overline{\text{CHD}} \mid D_{\text{SNP}}, H_{0}\right)$ | 0.306 ± 0.923 | 0.270 ± 1.220 | 0.177 ± 0.314 |
| | $P\left(\text{nextSNP} \mid D_{\text{SNP}}, H_{0}\right)$ | 0.308 ± 0.923 | 0.271 ± 1.220 | 0.177 ± 0.314 |
| | $r_{\text{nextSNP},\text{CHD}}$ | (4.00 ± 12.05) ⋅ 10^{−3} | (1.00 ± 4.51) ⋅ 10^{−3} | (4.00 ± 7.11) ⋅ 10^{−3} |
| $H_{1}$ | $p_{1,\text{CHD}}$ | 0.306 ± 0.001 | 0.270 ± 0.001 | 0.190 ± 0.001 |
| | $p_{1,\overline{\text{CHD}}}$ | 0.309 ± 0.001 | 0.271 ± 0.001 | 0.165 ± 0.001 |
| | $P\left(\text{CHD} \mid D_{\text{SNP}}, H_{1}\right)$ | (3.93 ± 6.97) ⋅ 10^{−3} | (0.92 ± 2.58) ⋅ 10^{−3} | (3.90 ± 4.35) ⋅ 10^{−3} |
| | $P\left(\text{nextSNP}, \text{CHD} \mid D_{\text{SNP}}, H_{1}\right)$ | (1.20 ± 2.14) ⋅ 10^{−3} | (0.25 ± 0.70) ⋅ 10^{−3} | (0.74 ± 0.828) ⋅ 10^{−3} |
| | $P\left(\text{nextSNP}, \overline{\text{CHD}} \mid D_{\text{SNP}}, H_{1}\right)$ | 0.308 ± 0.535 | 0.271 ± 0.698 | 0.165 ± 0.187 |
| | $P\left(\text{nextSNP} \mid D_{\text{SNP}}, H_{1}\right)$ | 0.309 ± 0.535 | 0.271 ± 0.698 | 0.165 ± 0.187 |
| | $r_{\text{nextSNP},\text{CHD}}$ | (3.90 ± 9.68) ⋅ 10^{−3} | (0.91 ± 3.48) ⋅ 10^{−3} | (4.48 ± 7.13) ⋅ 10^{−3} |
Sensitivity of the results
To test the robustness of this meta-analysis, we devise two tests of the sensitivity of the results: one to low-significance data sets combined with data sets with extreme results, and one to extreme data sets.
To test the sensitivity of the results to low-significance data sets, we exclude from the combination the data sets with comparatively small sample sizes for the same CHD phenotype, namely the study by Elahi et al. (2008) and the study by Chen et al. (2001). We also exclude the study with the most extreme result (i.e., the study with the largest Bayes factor), namely the study by Dedoussis et al. (2005). We recompute both the Bayes factors (Table 1) and the probabilities of CHD (Table 3). We observe that the Bayes factor in the new combination changes by 18%, −38% and 24% for the CS Caucasian, the MI Caucasian and the CS Asian populations, respectively. The inferred parameters and probabilities vary by −6 to 6%, −5 to 2%, and −1 to 4% for the CS Caucasian, the MI Caucasian and the CS Asian populations, respectively. The largest difference is observed for the CS Caucasian population, due to the exclusion of the study by Elahi et al. (2008). The exclusion of the study by Dedoussis et al. (2005) from the MI Caucasian population causes predominantly negative differences.
To test the sensitivity of the results to extreme data sets, we exclude from the combination the data sets with comparatively large sample sizes for the same CHD phenotype, namely the study by Georges et al. (2003), the study by Bennet et al. (2006) and the study by Hou et al. (2009). These are also the studies with the smallest Bayes factor for each CHD phenotype. We recompute both the Bayes factors (Table 1) and the probabilities of CHD (Table 4). We observe that the Bayes factor in the new combination changes by 3%, −19% and 32% for the CS Caucasian, the MI Caucasian and the CS Asian populations, respectively. The inferred parameters and probabilities vary by −20 to −1%, 5 to 11%, and −26 to 25% for the CS Caucasian, the MI Caucasian and the CS Asian populations, respectively. The largest difference is observed for the CS Asian population, due to the exclusion of the study by Hou et al. (2009). The exclusion of the study by Georges et al. (2003) from the CS Caucasian population causes predominantly negative differences.
In both tests, the differences in the Bayes factor leave the result of the hypothesis tests unchanged, and the differences in the inferred parameters and probabilities likewise leave the conclusions unchanged. We thus infer that this formalism is largely insensitive to (a) low-significance data sets combined with data sets with extreme results, and to (b) extreme data sets, which indicates that the formalism is robust.
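The exclusion tests above can be illustrated with a generic leave-out loop. The Python sketch below is only an illustration under stated assumptions, not the computation used for Table 1: it assumes binomial counts of SNP carriers, uniform Beta(1, 1) priors on the carrier frequencies, combination of studies by pooling counts, and a Bayes factor defined as H_{1} over H_{0}. The study names and counts are hypothetical, as are the function names.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_bayes_factor(k_chd, n_chd, k_ctrl, n_ctrl):
    """ln(B_10) for two binomial samples, assuming uniform Beta(1, 1) priors.

    H_1: separate carrier frequencies for CHD and non-CHD subjects.
    H_0: one common carrier frequency.
    The binomial coefficients are common to both hypotheses and cancel.
    """
    log_m1 = (log_beta(k_chd + 1, n_chd - k_chd + 1)
              + log_beta(k_ctrl + 1, n_ctrl - k_ctrl + 1))
    k, n = k_chd + k_ctrl, n_chd + n_ctrl
    log_m0 = log_beta(k + 1, n - k + 1)
    return log_m1 - log_m0


def pooled(studies):
    """Sum the counts of several studies (pooling is an assumption of this sketch)."""
    return tuple(sum(column) for column in zip(*studies))


def percent_change(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


# Hypothetical counts per study:
# (carriers with CHD, subjects with CHD, carriers without CHD, subjects without CHD)
studies = {
    "study_A": (30, 100, 28, 100),
    "study_B": (150, 500, 160, 500),
    "study_C": (12, 40, 9, 40),
}

full = exp(log_bayes_factor(*pooled(studies.values())))
for name in studies:
    rest = [counts for key, counts in studies.items() if key != name]
    reduced = exp(log_bayes_factor(*pooled(rest)))
    print(f"without {name}: Bayes factor changes by "
          f"{percent_change(reduced, full):+.0f}%")
```

Working with log-Beta functions through `math.lgamma` avoids overflow for large sample sizes; the exponential is taken only at the end to report the relative change.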
Conclusions
In this manuscript we investigated the correlation between the occurrence of CHD and the presence of the −308 TNF-α SNP, using fifteen independent data sets on Caucasians for two CHD phenotypes and five independent data sets on Asians for one CHD phenotype. We showed how to combine independent data sets and to infer correlations using Bayesian analysis.
Hypothesis testing on the combined data sets indicated that there is no evidence for a correlation between the occurrence of CHD and the presence of the SNP, either in Caucasians or in Asians. This result agrees with previous meta-analyses (Zhang et al., 2011; Chu et al., 2013). As a measure of a possible correlation, we computed the conditional probability of CHD given the SNP, normalized to the probability that CHD occurs, finding that the presence of the SNP indicates the occurrence of CHD in roughly 0.1% of patients, i.e., roughly 0.1% of the occurrences of CHD are concomitant with the presence of the SNP. We also tested the sensitivity of the results by excluding selected data sets from the meta-analysis. We found changes of order 10%, which leave the results unchanged and thus indicate that this formalism is robust.
An interesting extension of this work, for the sake of completeness, would be the inclusion of studies referring to Africans and Indians, which are currently too few to yield convincing results.