pcr: an R package for quality assessment, analysis and testing of qPCR data

View article
Bioinformatics and Genomics
Note that a Preprint of this article also exists, first published December 20, 2017.

Introduction

Real-time quantitative PCR (qPCR) is a commonly used technique to analyze the relative gene expression level in the biomedical research. In most cases, small scale experiments are designed to quantify the level of mRNA among experimental conditions. Some advanced machines and optimized protocols have simplified the experiments to a highly efficient one-step process, allowing the effective analysis of a large scale of qPCR data. However, all processes for assessing the quality of the data, performing the analysis and reporting the results are not done in the most uniform way across the literature (Bustin & Nolan, 2004). Different analysis models have been proposed and implemented in different software environments (Pabinger et al., 2014). Furthermore, requirements and guidelines for reporting qPCR data were independently introduced (Bustin et al., 2009).

In this report, we introduce an open source R package for performing quality assessment, modeling and testing for statistical significance of qPCR data in a uniform way. In its current version, the pcr package implement two methods for relative quantification of mRNA expression proposed originally by Livak & Schmittgen (2001), in addition to the necessary steps to check the assumption of these methods. Also, we implement a number of methods to check for statistical significance in qPCR data which were introduced in SAS by Yuan et al. (2006). Finally, the package provides unified interface to make the analysis accessible and the ability to make graphs of the different analysis steps for visual inspection and preparation of publication-level figures. We start by describing the process for generating the data in the original papers, briefly introduce the methods and apply them to the original data using the pcr.

Materials & Methods

Data sources

To illustrate the usage of the pcr package and to apply it to qPCR data, we used real qPCR datasets from two published papers. In addition, we compared the results obtained by the pcr package to that of the original paper to ensure the reliability. In their original article, Livak & Schmittgen (2001) collected total RNA from human tissues; brain and kidney. c-myc and GAPDH primers were then used for cDNA synthesis and used as input in the PCR reaction. Seven different dilutions where used as input to the PCR reaction (three replicates each), this dataset was referred to as ct3 and shown in Table 1. Six replicates for each tissue were run in separate tubes. This dataset was referred to as ct1 through this document and shown along with the difference calculations in Tables 2 and 3. At the second work, Yuan et al. (2006) extracted total RNA from Arabidopsis thaliana plant treated and control samples (24 samples each), and performed qPCR analyses using MT7 and ubiquitin primers. This dataset was referred to as ct4 and shown the results of the different testing methods that applied in the original paper in Table 4. In all datasets, the raw CT (Cycle Threshold), also known as Cq, was recorded (Bustin et al., 2009) and will be used as input to the pcr analysis pipeline.

Table 1:
Average CT value for c-myc and GAPDH at different input amounts.
Input RNA (ng) c-myc average CT GAPDH average CT ΔCT c-myc-GAPDH
1.0 25.59 ± 0.04 22.64 ± 0.03 2.95 ± 0.05
0.5 26.77 ± 0.09 23.73 ± 0.05 3.04 ± 0.10
0.2 28.14 ± 0.05 25.12 ± 0.10 3.02 ± 0.11
0.1 29.18 ± 0.13 26.16 ± 0.02 3.01 ± 0.13
0.05 30.14 ± 0.03 27.17 ± 0.06 2.97 ± 0.07
0.02 31.44 ± 0.16 28.62 ± 0.10 2.82 ± 0.19
0.02 32.42 ± 0.12 29.45 ± 0.08 2.97 ± 0.14
DOI: 10.7717/peerj.4473/table-1
Table 2:
Relative quantification using comparative (ΔΔCT) method (separate tubes).
Tissue c-myc CT GAPDH CT ΔCT c-myc-GAPDH ΔΔCTΔCT − ΔCT,Brain c-mycN Rel. to Brain
Brain 30.72 23.7
30.34 23.56
30.58 23.47
30.34 23.65
30.5 23.69
30.43 23.68
Average 30.49 ± 0.15 23.63 ± 0.09 6.86 ± 0.17 0.00 ± 0.17 1.0 (0.9–1.1)
Kidney 27.06 22.76
27.03 22.61
27.03 22.62
27.1 22.6
26.99 22.61
26.94 24.18
Average 27.03 ± 0.06 22.66 ± 0.08 4.37 ± 0.10 −2.50 ± 0.10 5.6 (5.3–6.0)
DOI: 10.7717/peerj.4473/table-2
Table 3:
Relative quantification using the standard curve method (separate tube).
Tissue c-myc (ng) GAPDH (ng) c-mycN norm. to GAPDH c-mycN Rel. to Brain
Brain 0.033 0.51
0.043 0.56
0.036 0.59
0.043 0.53
0.039 0.51
0.040 0.52
Average 0.039 ± 0.004 0.54 ± 0.034 0.07 ± 0.008 1.0 ± 0.12
Kidney 0.40 0.96
0.41 1.06
0.41 1.05
0.39 1.07
0.42 1.06
0.43 0.96
Average 0.41 ± 0.016 1.02 ± 0.052 0.40 ± 0.025 5.5 ± 0.35
DOI: 10.7717/peerj.4473/table-3
Table 4:
Statistical significance using different testing methods.
Test ΔΔCT (estimate) p-value Confidence interval
Multiple regression −0.6848 <0.0001 (−0.4435, −0.9262)
ANOVA −0.6848 <0.0001 (−0.4435, −0.9262)
t-test −0.6848 <0.0001 (−0.4147, −0.955)
Wilcoxon test −0.6354 <0.0001 (−0.4227, −0.8805)
DOI: 10.7717/peerj.4473/table-4

Statistical methods

In contrast with the absolute quantification of the amount of mRNA in a sample, the relative quantification uses a internal control (reference gene) and/or a control group (reference group) to quantify the mRNA of interest relative to these references. This relative quantification was sufficient to draw conclusions in most of the biomedical applications involving qPCR. A few methods were developed to perform these relative quantification. These methods generally require different assumptions and models for the analysis. The most common two of these methods were described here in the following sections.

The comparative CT methods

The comparative CT methods assume that the cDNA templates of the gene/s of interest as well as the control/reference gene have similar amplification efficiency, and also that this amplification efficiency is near perfect. This means that, at a certain threshold during the linear portion of the PCR reaction, the amount of the gene of the interest and the control double each cycle. Another assumption is that the expression difference between two genes or two samples can be captured by subtracting one (gene or sample of interest) from another (reference). The final assumption is that the reference doesn’t change with the treatment or the course in question. The formal derivation of the double delta CT model is described here Livak & Schmittgen (2001). Briefly, the ΔΔCT is given by: Δ Δ C T = Δ C T , q Δ C T , c b

And the relative expression by: 2 Δ Δ C T where ΔCT,q is the difference in the CT (or their average) of a gene of interest and a reference gene in a group of interest. ΔCT,cb is the the difference in the CT (or their average) of a gene of interest and a reference gene in a reference group. The error term is given by: s = s 1 2 + s 2 2 where s1 is the standard deviation of a gene of interest and s2 is the standard deviation of a reference gene.

Standard curve

In comparison, this model doesn’t assume perfect amplification but rather actively use the amplification in calculating the relative expression. Therefore, when the amplification efficiency of all genes are 100% both methods should give similar results. The standard curve method is applied using two steps. First, serial dilutions of the mRNAs from the samples of interest are used as input to the PCR reaction. The linear trend of the log input amount and the resulting CT values for each gene are used to calculate an intercept and a slope. Secondly, these intercepts and slopes are used to calculate the amounts of mRNA of the genes of interest and the control/reference in the samples of interest and the control sample/reference. These amounts are finally used to calculate the relative expression in a manner similar to the later method, just using division instead of subtraction. The formal derivation of the model is described here (Yuan et al., 2006). Briefly, the amount of RNA in a sample is given by: log amount = C T b m .

And the relative expression is given by: 1 0 log amount where CT is the cycle threshold of a gene. b is the intercept of CT log10 input amount. m is the slope of CT. And the error term is given by: s = c v X ̄ where: c v = c v 1 2 + c v 2 2 where s is the standard deviation. X ̄ is the average. cv is the coefficient of variation or relative standard deviation.

Statistical significance tests

Assuming that the assumptions of the first methods are holding true, the simple t-test can be used to test the significance of the difference between two conditions (ΔCT). t-test assumes, in addition, that the input CT values are normally distributed and the variance between conditions are comparable. Wilcoxon test can be used if sample size is small, and those two last assumptions are hard to achieve.

Two use the linear regression here. A null hypothesis is formulated as following, C T , target,treatment C T , control,treatment = C T , target,control C T , control,control .

This is exactly the ΔΔCT value as explained earlier. So the ΔΔCT is estimated and the null is rejected when ΔΔCT ≠ 0.

Quality assessment

Fortunately, regardless of the method used in the analysis of qPCR data, The quality assessment can be done in a similar way. It requires an experiment similar to that of calculating the standard curve. Serial dilutions of the genes of interest and controls are used as input to the reaction and different calculations are made. The amplification efficiency is approximated be the linear trend between the difference between the CT value of a gene of interest and a control/reference (ΔCT) and the log input amount. This piece of information is required when using the ΔΔCT model. Typically, the slope of the curve should be very small and the R2 value should be very close to one. A value of the amplification efficiency itself is given by 10−1∕slope and should be close to 2. Other analysis methods are recommended when this is not the case. Similar curves are required for each gene using the CT value instead of the difference for applying the standard curve method. In this case, a separate slope and intercept for each gene are required for the calculation of the relative expression.

Results & Discussion

Availability & installation

The pcr packages is available on CRAN, the main repository for R packages and can be installed by invoking install.packages in an R (≥3.4.2) session. The package’s source code is also available on github, https://github.com/MahShaaban/pcr along with the development version.

 
 
# install  the  pcr  package  from  CRAN 
install.packages('pcr')    

The examples shown in this article are explained in greater details in the package vignette that can be accessed through browseVignette(’pcr’). Moreover, the package documentation provides detailed instruction on the input and the output of each function (e.g., ?pcr_analyze).

Functionality & user interface

The pcr package provides different methods for performing quality assessment, modeling and testing real-time qualitative PCR data through the unified interface of three functions: pcr_assess, pcr_analyze and pcr_test, respectively.

Quality assessment

pcr_assess provides two methods for assessing the quality of qPCR data. These are ‘efficiency’ and ‘standard_curve’ to calculate the amplification efficiency and gene standard curves as described in the methods section. The following code block applies both methods to the dataset ct3, shown in Table 1. Using the argument plot as TRUE in the pcr_assess function provides the a graphic presentation of the amplification and the standard curves as shown in Fig. 1.

Amplification efficiency and standard curves of c-myc and GAPDH.
Figure 1: Amplification efficiency and standard curves of c-myc and GAPDH.
Seven different dilutions of RNA were used as an input to synthesize cDNA, then to a real-time quantitative PCR reaction with c-myc and GAPDH primers. (A) ΔCT values were calculated by subtracting the CT values from three independent samples of the control gene(GAPDH) from the target c-myc. Averages and standard deviations are shown as points and error bars. The blue line represents the linear trend between the ΔCT and log10 of the input amount. (B) CT values from three independent samples of c-myc and GAPDH are shown with the corresponding log10 input amounts.

 
 
# load  required  libraries 
library(pcr) 
library(ggplot2) 
library(cowplot) 
library(dplyr) 
library(xtable) 
library(readr)    

 
 
#  pcr_assess 
##  locate  and  read  data 
fl <− system.file('extdata', 'ct3.csv', package = 'pcr') 
ct3 <− read_csv(fl) 
 
## make a vector  of RNA  amounts 
amount <− rep(c(1, .5, .2, .1, .05, .02, .01),  each = 3) 
 
##  calculate  amplification  efficiency 
res1 <− pcr_assess(ct3, 
                      amount = amount, 
                      reference_gene  = 'GAPDH', 
                      method = 'efficiency') 
 
##  calculate  standard  curves 
res2 <− pcr_assess(ct3, 
                      amount = amount, 
                      method = 'standard_curve ') 
 
##  retain  curve  information 
intercept <− res2$intercept 
slope <− res2$slope    

Analysis models

Similarly, pcr_analyze provides two methods to model the CT values and calculates the relative expression of target genes. ’delta_delta_ct’ performs the ΔΔCT method described previously. The average relative expression of the target gene in the condition of interest is given by the Eqs. (1) and (2) and the standard deviation by Eq. (3). The calculations are applied to the dataset ‘ct1’, shown in Table 2 and Fig. 2A. ‘relative_curve’ performs the relative standard curve quantification, average relative expression/amount of the target gene in the condition of interest is given by Eqs. (4) and (5) and the standard deviation by Eqs. (6) and (7). The calculation is applied to the same dataset ‘ct1’ and is shown in Table 3 and Fig. 2B.

Relative expression of c-myc in human brain and kidney tissues.
Figure 2: Relative expression of c-myc in human brain and kidney tissues.
Total RNA from human brain and kidney tissues were used to synthesize cDNA and real-time quantitative PCR reaction with c-myc and GAPDH primers. CT values from six independent replicates were used to calculate the expression of c-myc in the kidney normalized by GAPDH and relative to the brain using The ΔΔCT (A) and the standard curve methods (B). Averages and standard deviations are shown as bars and error bars.

  
 
#  pcr_analyze 
##  locate  and  read  raw ct data 
fl <− system.file('extdata', 'ct1.csv', package = 'pcr') 
ct1 <− read_csv(fl) 
 
## add  grouping  variable 
group_var <− rep(c('brain', 'kidney'), each = 6) 
 
# calculate  all  values  and  errors  in one  step 
## mode == 'separate_tube' default 
res1 <− pcr_analyze(ct1, 
                        group_var  = group_var, 
                        reference_gene  = 'GAPDH', 
                        reference_group  = 'brain') 
 
##  calculate  standard  amounts  and  error 
res2 <− pcr_analyze(ct1, 
                        group_var  = group_var, 
                        reference_gene  = 'GAPDH', 
                        reference_group  = 'brain', 
                        intercept = intercept, 
                        slope = slope, 
                        method = 'relative_curve ')    

Significance testing

Finally, pcr_test can be used to calculate useful statistics, p-values and confidence intervals on the previous models. Two tests are available of the two-group comparisons; ‘t.test’ and ‘wilcox.test’ to test for the difference of the normalized target gene expression (ΔCT) in one condition to another. Linear regression, ‘lm’, can be applied to estimate these differences between multiple conditions and a reference (Eq. (8)). The following code applies different testing methods to the dataset ‘ct4’. The dataset was published original in (Yuan et al., 2006), along with results of different testing method (Table 4). Table 5 shows the results of the three different tests as implemented in pcr_test.

Table 5:
Different testing methods applied to the same dataset.
Gene Estimate p_value Lower Upper Term
t.test target −0.68 0.00 −0.96 −0.41
wilcox.test target −0.64 0.00 −0.88 −0.42
lm target −0.68 0.00 −0.95 −0.41 group_vartreatment
DOI: 10.7717/peerj.4473/table-5

 
 
#  pcr_test 
# locate  and  read  data 
fl <− system.file('extdata', 'ct4.csv', package = 'pcr') 
ct4 <− read_csv(fl) 
 
# make  group  variable 
group <− rep(c('control', 'treatment'), each = 12) 
 
# test  using t −test 
tst1 <− pcr_test(ct4, 
                     group_var  = group, 
                     reference_gene  = 'ref', 
                     reference_group  = 'control', 
                     test = 't.test') 
 
# test  using  wilcox.test 
tst2 <− pcr_test(ct4, 
                     group_var  = group, 
                     reference_gene  = 'ref', 
                     reference_group  = 'control', 
                     test = 'wilcox.test') 
# testing  using  lm 
tst3 <− pcr_test(ct4, 
                     group_var  = group, 
                     reference_gene  = 'ref', 
                     reference_group  = 'control', 
                     test = 'lm')    

The pcr package workflow

In Fig. 3, we suggest a workflow based on three steps; Quality assessment, modeling and testing for statistical significance, these can be applied by using the pcr package functions; pcr_assess, pcr_analyze and pcr_test respectively. The workflow is based on the CT from two kinds of experiments. First, the CT values for the genes of interest and a reference gene in the different experimental groups. Second, the CT values of the same genes from a serial dilution experiment. The amplification efficiency can be calculated by calculating the ΔCT from the first and the slope and the intercept from the second experiment. The ΔΔCT model and the relative expression are then calculated and tested. In the case of the two groups comparison, any of the testing methods can be applied otherwise the linear regression can be used for multiple group comparisons. Alternatively, the standard curve methods can be applied by calculating the amount of cDNA in each sample from the CT values and the individual curves for each gene. These amounts are finally used to obtain the relative expression of the genes of interest.

A conceptual workflow of the analysis of RT-qPCR data.

Figure 3: A conceptual workflow of the analysis of RT-qPCR data.

A graphic description of the input, intermediary values and final output of different steps of the analysis pipeline and their relations. Colors represent distinct steps and the numbers are the order in which they are applied in a typical analysis.

Comparison with other packages

Pabinger et al. (2014) surveyed the tools used to analyze qPCR data across different platforms. They included nine R packages which provide very useful analysis and visualization methods. Table 6 shows a comparison of the features of these nine packages from the original publication, in addition to the pcr package. Some of these packages focuses one certain model and some are designed to handle high-throughput qPCR data (e.g., chipPCR). Most of these packages are hosted in CRAN and a few on the Bioconductor so they adhere to Bioconductor methods and data containers (e.g., qpcR).

Table 6:
Comparison with available R packages for qPCR quantification.
Package Rawa Efficiency Abs quant Rel quant Error prop. Norm. NA handling Graphs Stats. MIQEb
chipPCR + + + + +
ddCT + + + + +
dpcR + + + + +
EasyqpcR + + + + +
HTqPCR + + + + +
FPK-PCR + +
NormqPCR + + + +
qpcR + + + + + + + + +
qpcrNorm + + +
pcr (proposed) + + + + + + +
DOI: 10.7717/peerj.4473/table-6

Notes:

Calculates the CT values from raw florescence data.
Complies with MIQE (Minimum Information for Publication of Quantitative Real-Time PCR Experiments) recommendations for reporting RT-qPCR results.

In comparison, pcr provides a unified interface for different quality assessment, analysis and testing models. The different functions take similar inputs to perform different analysis steps which requires minimal formating at the end of the user. The main input and the output are tidy data.frame, and the package source code follows the tidyverse practices. This package targets the small scale qPCR experimental data and the R users. The interface and documentation choices were made with such users in mind and require no deep knowledge in specific data structures or complex statistical models. Users can go from the raw CT values through different analysis steps to publication graphs in a few simple lines of code.

Limitations & future directions

The current version of the pcr package (1.1.0) provides only methods to estimate the expression of genes in a certain condition relative to another. Other methods were proposed for absolute quantification of the copy number of genes in samples (Whelan, Russell & Whelan, 2003). Also, the comparative CT methods assume that the PCR reaction has a close to perfect amplification rates. Other methods were proposed to model the data when this assumption is not true (Liu & Saint, 2002; Tichopad et al., 2003). We are planning to implement methods for absolute quantification and dealing with less than perfect amplification efficiency cases in a future version of the package.

Conclusion

To sum, the pcr package is an open source R package for quality assessing, modeling and testing real-time quantitative PCR data. The package provide an intuitive and unified interface for its main functions to allow biologist to perform all necessary steps of qPCR analysis and produce graphs in a uniform way.

82 Citations   Views   Downloads