NOT PEER-REVIEWED
"PeerJ Preprints" is a venue for early communication or feedback before peer review. Data may be preliminary.

Supplemental Information

Simulated clustering accuracy if rarefying is not penalized for removing the lowest 15th percentile samples

The right axis represents the median library size (NL), while the x-axis ‘effect size’ gives the multinomial mixing proportions of the two classes of samples, ‘Ocean’ and ‘Feces’. See the caption of Fig. 2 for further details.

DOI: 10.7287/peerj.preprints.1157v1/supp-1
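
As a rough illustration of how an ‘effect size’ can act as a mixing proportion between two sample classes, the sketch below blends two multinomial templates and draws one simulated sample. It is not the authors' simulation code: the four-OTU templates, the function names `mix_templates` and `simulate_sample`, and the reading of the effect size as an own-to-other mixing ratio are all assumptions made for this example.

```python
import random


def mix_templates(own, other, effect_size):
    """Blend two multinomial templates.

    ASSUMPTION: effect_size is read as the ratio of the sample's own
    template to the other template, so effect_size = 1 gives an even blend.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if len(own) != len(other):
        raise ValueError("templates must cover the same OTUs")
    weight = effect_size / (effect_size + 1.0)
    return [weight * a + (1.0 - weight) * b for a, b in zip(own, other)]


def simulate_sample(proportions, library_size, rng):
    """Draw library_size reads from a multinomial with the given proportions."""
    counts = [0] * len(proportions)
    otus = range(len(proportions))
    for otu in rng.choices(otus, weights=proportions, k=library_size):
        counts[otu] += 1
    return counts


# Hypothetical four-OTU templates standing in for 'Ocean' and 'Feces'.
ocean = [0.70, 0.20, 0.10, 0.00]
feces = [0.00, 0.10, 0.20, 0.70]

rng = random.Random(0)
ocean_like = mix_templates(ocean, feces, effect_size=3.0)
feces_like = mix_templates(feces, ocean, effect_size=3.0)
sample = simulate_sample(ocean_like, library_size=1000, rng=rng)
```

Under this assumption, a larger effect size pulls the two classes apart and an effect size of 1 makes them indistinguishable, which is why clustering accuracy is plotted against it.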

Low library size samples can diminish result quality, regardless of normalization technique

We show the inflammatory bowel disease (IBD) dataset of Gevers et al. (Gevers et al. 2014), which has an average library size of 375 sequences per sample. (a) With unweighted UniFrac, extremely low-depth samples cluster in the lower right-hand corner of the PCoA plots, both with no normalization and with the alternatives to rarefying. (b) The original library size of samples is a dominant effect, even influencing weighted UniFrac for the alternatives to rarefying when library sizes are low and biological clustering is subtle. This effect diminishes if low library size samples are removed from the analysis.

DOI: 10.7287/peerj.preprints.1157v1/supp-2
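
For readers unfamiliar with rarefying, the following minimal sketch shows the general idea: samples below a chosen depth are dropped, and the remaining samples are subsampled without replacement to that depth. The toy table, the depth of 100, and the function name `rarefy` are assumptions for illustration and do not reproduce the pipeline used for the figure.

```python
import random


def rarefy(table, depth, rng):
    """Subsample every sample to `depth` reads without replacement.

    table maps a sample id to a list of per-OTU counts. Samples with fewer
    than `depth` reads are dropped and reported separately.
    """
    rarefied, dropped = {}, []
    for sample_id, counts in table.items():
        if sum(counts) < depth:
            dropped.append(sample_id)
            continue
        # Expand counts into one entry per read, labelled by OTU index.
        reads = [otu for otu, n in enumerate(counts) for _ in range(n)]
        new_counts = [0] * len(counts)
        for otu in rng.sample(reads, depth):
            new_counts[otu] += 1
        rarefied[sample_id] = new_counts
    return rarefied, dropped


# Hypothetical OTU table with one very shallow sample.
table = {
    "deep": [500, 300, 200],
    "medium": [150, 120, 105],
    "shallow": [20, 5, 0],
}
rarefied, dropped = rarefy(table, depth=100, rng=random.Random(0))
```

Here the shallow sample is discarded rather than compared against much deeper ones, which mirrors the observation that removing low library size samples reduces the library size effect.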

All normalization techniques on key microbiome datasets, Bray Curtis distance

Rows of panels show (from top to bottom) data from 88 soils (Lauber et al. 2009), Body Sites (Costello et al. 2009), and Moving Pictures (Caporaso et al. 2011a). The 88 soils dataset is colored according to a color gradient from low to high pH. The Costello et al. body sites dataset is colored according to body site: feces (blue) and oral cavity (purple); the remaining colors represent external auditory canal, hair, nostril, skin, and urine. The Moving Pictures dataset is colored as follows: left and right palm (red/blue), tongue (green), feces (orange). It is important to note that all the samples in these datasets are approximately the same depth, and there are very strong driving gradients.

DOI: 10.7287/peerj.preprints.1157v1/supp-3

All normalization techniques on key microbiome datasets, unweighted UniFrac distance

See Figure S3 caption for details.

DOI: 10.7287/peerj.preprints.1157v1/supp-4

Simple example of the reasoning behind differential abundance simulations

(a) In actual OTU tables generated from sequencing data, the counts (left column) are already compositional and therefore only relative (left column). (b) Application of the ‘effect size’ to the original ‘Multinomial’ template to create fold-change differences disturbs the distinction between true positive (TP) and true negative (TN) OTUs in the ‘Original’ simulation, but not the ‘Balanced’ simulation. (c) Creation of a ‘Compositional’ OTU table from the ‘Multinomial’ template, where the counts/relative abundances are intentionally blurred for the TN OTUs.

DOI: 10.7287/peerj.preprints.1157v1/supp-5
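
The compositional effect described in this caption can be seen in a small worked example. The sketch below applies a fold change to a single OTU in a hypothetical four-OTU table and compares relative abundances before and after. The numbers and names are invented for illustration; this is not the ‘Balanced’ or ‘Compositional’ simulation from the supplemental R files.

```python
def relative_abundance(counts):
    """Convert raw counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical absolute abundances: four OTUs, equally abundant.
baseline = [100, 100, 100, 100]

# Only the first OTU truly changes (a 4-fold increase); the rest are
# true negatives whose absolute abundance stays the same.
fold_change = 4
treated = [baseline[0] * fold_change] + baseline[1:]

before = relative_abundance(baseline)  # 0.25 for every OTU
after = relative_abundance(treated)    # 4/7 for the first OTU, 1/7 for the others

# Apparent fold change when only relative abundances are observed.
apparent_fold = [a / b for a, b in zip(after, before)]
```

In relative terms the truly changed OTU appears to increase by about 2.3-fold (16/7) rather than 4-fold, while each unchanged OTU appears to fall to about 0.57 (4/7) of its former share, so true negatives can look differentially abundant.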

Differential abundance detection performance where the average library size of one sample group is 3 times that of the other

Labels are the same as in Fig. 4.

DOI: 10.7287/peerj.preprints.1157v1/supp-6

Differential abundance detection performance when the dataset is compositional

25% of OTUs are differentially abundant. Labels are the same as in Fig. 4.

DOI: 10.7287/peerj.preprints.1157v1/supp-7

Supplemental R files

'Balanced' and 'Compositional' differential abundance simulations, as referenced in the Methods and Fig. S5.

DOI: 10.7287/peerj.preprints.1157v1/supp-8

Additional Information

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Sophie J Weiss conceived and designed the experiments, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.

Zhenjiang Xu conceived and designed the experiments, prepared figures and/or tables, reviewed drafts of the paper.

Amnon Amir conceived and designed the experiments, prepared figures and/or tables, reviewed drafts of the paper.

Shyamal Peddada conceived and designed the experiments, prepared figures and/or tables, reviewed drafts of the paper.

Kyle Bittinger conceived and designed the experiments, prepared figures and/or tables, reviewed drafts of the paper.

Antonio Gonzalez conceived and designed the experiments, contributed reagents/materials/analysis tools, reviewed drafts of the paper.

Catherine Lozupone reviewed drafts of the paper.

Jesse R. Zaneveld conceived and designed the experiments, reviewed drafts of the paper.

Yoshiki Vazquez-Baeza contributed reagents/materials/analysis tools, reviewed drafts of the paper.

Amanda Birmingham reviewed drafts of the paper.

Rob Knight conceived and designed the experiments, reviewed drafts of the paper.

Funding

S.J.W. was funded by the National Human Genome Research Institute Grant# 3 R01 HG004872-03S2, and the National Institutes of Health Grant# 5 U01 HG004866-04. Research of S.D.P. was supported by the Intramural Research Program of the National Institute of Environmental Health Sciences, NIH (Z01 ES101744-04). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

