Sequence composition diversity in Alaskan glacier and other metagenomes

Department of Biology, Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
Department of Environmental Science, Alaska Pacific University, Anchorage, AK, USA
Waksman Genomics Core Facility, Rutgers University, Piscataway, NJ, USA
DOI
10.7287/peerj.preprints.734v1
Subject Areas
Bioinformatics, Computational Biology, Environmental Sciences, Genomics, Microbiology
Keywords
Sequence composition, composition diversity, metagenomics
Copyright
© 2014 Choudhari et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
Choudhari S, Dial RJ, Kumar D, Shain DH, Grigoriev A. 2014. Sequence composition diversity in Alaskan glacier and other metagenomes. PeerJ PrePrints 2:e734v1

Abstract

Metagenomics by next-generation sequencing has become an important tool for interrogating complex microbial communities. In this study, we analyzed several pairs of metagenomic samples obtained by different methods and observed biases that produced differences in the nucleotide composition of the sequenced reads. Pairwise sample comparisons were based on principal component analysis (PCA) of dinucleotide word frequencies in sequences obtained from different platforms. We found bias between platforms in sequences of the amplified 16S rRNA hypervariable regions, but not in shotgun metagenome reads aligned to those hypervariable regions. The differences in nucleotide distributions, and their consistency, suggest that these biases likely arise from a combination of PCR amplification and differing sequencing protocols, and that they are related to the GC content of the reads produced. Caution should therefore be exercised when interpreting the results of comparative metagenomics studies, as these results may vary with the sequencing technology used.
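The comparison described in the abstract (dinucleotide word frequencies per sample, projected with PCA, alongside GC content) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline. It makes several assumptions: each sample is a list of read strings, overlapping dinucleotides containing non-ACGT symbols are skipped, and PCA is computed by SVD of the mean-centered frequency matrix using NumPy. The function names and the toy reads in the usage section are hypothetical.

```python
from itertools import product

import numpy as np

# All 16 dinucleotides in a fixed order (AA, AC, ..., TT) so that every
# sample maps onto the same feature-vector layout.
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]
INDEX = {d: i for i, d in enumerate(DINUCS)}


def dinucleotide_frequencies(reads):
    """Relative frequencies of the 16 overlapping dinucleotides in a read set.

    Assumption: dinucleotides containing N or any non-ACGT symbol are skipped.
    """
    counts = np.zeros(len(DINUCS))
    for read in reads:
        read = read.upper()
        for i in range(len(read) - 1):
            j = INDEX.get(read[i:i + 2])
            if j is not None:
                counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts


def gc_content(reads):
    """Fraction of G or C among unambiguous A/C/G/T bases."""
    gc = acgt = 0
    for read in reads:
        for base in read.upper():
            if base in "ACGT":
                acgt += 1
                if base in "GC":
                    gc += 1
    return gc / acgt if acgt else 0.0


def pca(freq_matrix, n_components=2):
    """Project samples (rows) onto their leading principal components via SVD."""
    X = np.asarray(freq_matrix, dtype=float)
    X = X - X.mean(axis=0)  # center each dinucleotide feature across samples
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T


if __name__ == "__main__":
    # Hypothetical toy read sets standing in for the same community
    # sequenced on different platforms.
    samples = {
        "platform_A": ["ACGTGCGCGA", "GGCCGCGTAC"],
        "platform_B": ["ATATTAAGCT", "TTAACGATAT"],
        "platform_C": ["ACGTACGTAC", "GATCGATCAA"],
    }
    freqs = np.array([dinucleotide_frequencies(r) for r in samples.values()])
    scores = pca(freqs)
    for (name, reads), (pc1, pc2) in zip(samples.items(), scores):
        print(f"{name}: GC={gc_content(reads):.2f} PC1={pc1:+.3f} PC2={pc2:+.3f}")
```

In a sketch like this, samples that separate along a principal component while also differing in GC content would be consistent with the kind of composition bias the abstract describes. Real data would additionally need read-level quality filtering, which is omitted here.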

Author Comment

This is version 1 of a submission to PeerJ for review.