Predicting gene expression using DNA methylation in two human populations
- Published
- Accepted
- Subject Areas
- Bioinformatics, Computational Biology, Genomics, Epidemiology, Statistics
- Keywords
- DNA methylation, LASSO, Methylation Microarray, transcriptome
- Copyright
- © 2018 Zhong et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2018. Predicting gene expression using DNA methylation in two human populations. PeerJ Preprints 6:e27055v1 https://doi.org/10.7287/peerj.preprints.27055v1
Abstract
Background. DNA methylation, an important epigenetic mark, is well known for its regulatory role in gene expression, especially the negative regulation in the promoter region. However, its correlation with gene expression at population level has not been well studied. In particular, it is unclear if genome-wide DNA methylation profile of an individual can predict her/his gene expression profile. Previous studies were mostly limited to association analyses between single CpG site methylation and gene expression. It is not known whether DNA methylation of a gene has enough prediction power to serve as a surrogate for gene expression in existing human study cohorts with DNA samples but not RNA samples.
Results. We studied two human population datasets, Multiple Tissue Human Expression Resource Projects (MuTHER)’s Adipose tissue as well as asthma and normal peoples’ peripheral blood mononuclear cell (PBMC), for predicting gene expression using methylation of all CpG sites from the gene region. Three prediction models were investigated; single linear regression, multiple linear regression, and least absolute shrinkage and selection operator (LASSO) penalized regression. Our results showed that LASSO regression has superior performance among these methods. However, even with LASSO regression, very small prediction R2 was obtained for the majority of genes and only about one thousand genes had prediction R2 greater than 0.1. GO term and pathway analyses of these more predictable genes showed that they are enriched for immune and defense genes.
Conclusion. In human populations, DNA methylation of CpG sites at gene region have weak prediction power for gene expression. The relatively more predictable genes tend to be defense and immune genes.
Author Comment
This is a submission to PeerJ for review.