Predicting gene expression using DNA methylation in two human populations

Department of Biology, Hong Kong Baptist University, Hong Kong, China
School of Biomedical Informatics, University of Texas Health Center at Houston, Houston, Texas, United States
Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States
DOI
10.7287/peerj.preprints.27055v1
Subject Areas
Bioinformatics, Computational Biology, Genomics, Epidemiology, Statistics
Keywords
DNA methylation, LASSO, Methylation Microarray, transcriptome
Copyright
© 2018 Zhong et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Zhong H, Kim S, Zhi D, Cui X. 2018. Predicting gene expression using DNA methylation in two human populations. PeerJ Preprints 6:e27055v1

Abstract

Background. DNA methylation, an important epigenetic mark, is well known for its regulatory role in gene expression, especially negative regulation in the promoter region. However, its correlation with gene expression at the population level has not been well studied. In particular, it is unclear whether the genome-wide DNA methylation profile of an individual can predict her or his gene expression profile. Previous studies were mostly limited to association analyses between methylation at single CpG sites and gene expression. It is not known whether the DNA methylation of a gene has enough predictive power to serve as a surrogate for gene expression in existing human study cohorts that have DNA samples but not RNA samples.

Results. We studied two human population datasets, adipose tissue from the Multiple Tissue Human Expression Resource (MuTHER) project and peripheral blood mononuclear cells (PBMC) from asthmatic and normal individuals, to predict gene expression using methylation of all CpG sites in the gene region. Three prediction models were investigated: single linear regression, multiple linear regression, and least absolute shrinkage and selection operator (LASSO) penalized regression. Our results showed that LASSO regression performed best among these methods. However, even with LASSO regression, the prediction R² was very small for the majority of genes, and only about one thousand genes had a prediction R² greater than 0.1. GO term and pathway analyses of these more predictable genes showed that they are enriched for immune and defense genes.
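
As an illustration of the LASSO approach named above, the following is a minimal sketch of predicting one gene's expression from the methylation of its CpG sites and scoring the prediction with R² on held-out individuals. It is not the authors' pipeline: the abstract does not state which software, penalty value, or validation scheme was used. The data are synthetic, the penalty `lam=0.05` is arbitrary, and all function and variable names are hypothetical.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center predictors and response so the intercept is handled separately.
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant CpG site carries no information
            # Covariance of site j with the residual that excludes site j.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new_bj
    intercept = y_mean - sum(b * m for b, m in zip(beta, col_mean))
    return intercept, beta

def predict(X, intercept, beta):
    return [intercept + sum(b * x for b, x in zip(beta, row)) for row in X]

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, computed on the individuals passed in."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

# Synthetic data (assumption): 120 individuals, 10 CpG sites for one gene,
# methylation values in [0, 1]; only the first site affects expression.
rng = random.Random(0)
n_people, n_cpg = 120, 10
X = [[rng.random() for _ in range(n_cpg)] for _ in range(n_people)]
y = [2.0 - 3.0 * row[0] + rng.gauss(0.0, 0.5) for row in X]

# Fit on the first 80 individuals, evaluate on the remaining 40.
X_train, y_train = X[:80], y[:80]
X_test, y_test = X[80:], y[80:]
intercept, beta = lasso_fit(X_train, y_train, lam=0.05)
r2_test = r_squared(y_test, predict(X_test, intercept, beta))
n_selected = sum(1 for b in beta if b != 0.0)

print("CpG sites kept by LASSO:", n_selected, "of", n_cpg)
print("held-out R^2:", round(r2_test, 3))
```

In this toy setting one CpG site drives expression strongly, so the held-out R² is high by construction. In a genome-wide analysis the same fit would be repeated per gene, and a gene would count as "more predictable" when its R² exceeds a chosen cutoff such as the 0.1 used above. In practice the penalty would normally be tuned, for example by cross-validation, rather than fixed.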

Conclusion. In human populations, DNA methylation of CpG sites in the gene region has weak predictive power for gene expression. The relatively more predictable genes tend to be defense and immune genes.

Author Comment

This is a submission to PeerJ for review.

Supplemental Information

Supplementary figures and tables

DOI: 10.7287/peerj.preprints.27055v1/supp-1