GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data

Li Chen; James Reeve; Lujun Zhang; Shengbin Huang; Jun Chen

doi:10.7287/peerj.preprints.3417v1

Li Chen¹, James Reeve², Lujun Zhang³, Shengbin Huang², Jun Chen²,⁴

1 Department of Health Outcomes Research and Policy, Harrison School of Pharmacy, Auburn University, Auburn, Alabama, United States

2 Bioinformatics and Computational Biology Program, University of Minnesota - Rochester, Rochester, Minnesota, United States

3 College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, Zhejiang, China

4 Division of Biomedical Statistics and Informatics and Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota, United States

Published: 2017-11-17
Accepted: 2017-11-17

Subject Areas: Bioinformatics, Genomics, Statistics
Keywords: Normalization, Metagenomics, Microbiome, Statistics, Zero-inflation, RNA-Seq

Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.

Cite this article: Chen L, Reeve J, Zhang L, Huang S, Chen J. 2017. GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data. PeerJ Preprints 5:e3417v1 https://doi.org/10.7287/peerj.preprints.3417v1

Abstract

Normalization, which accounts for variable library sizes, is the first critical step in microbiome sequencing data analysis. Current normalization methods, developed for RNA-Seq and adapted for microbiome data, fail to consider a unique characteristic of microbiome data: a vast number of zeros arising from the physical absence or under-sampling of microbes. Normalization methods that specifically address this zero inflation remain largely undeveloped. Here we propose GMPR, a simple but effective normalization method for zero-inflated sequencing data such as microbiome data. Simulation studies and analyses of real datasets demonstrate that the proposed method is more robust than competing methods, leading to more powerful detection of differentially abundant taxa and higher reproducibility of the relative abundances of taxa.

Author Comment

This is a submission to PeerJ for review.