Biases in genome reconstruction from metagenomic data

Introduction

Materials and Methods

Data and code availability

Identification of CR and NR regions

Compositional analysis

Repetitiveness analysis

Gene function analysis

Statistical analysis

Results and discussion

Nucleotide composition of NRs frequently differs from the genome average

Repeated sequences segregate aberrantly

Functional assessment of NR genes

Evaluation of a complex metagenomic data set and common automated binning tools

What’s missing from reconstructed genomes?

Conclusions

Supplemental Information

Contig coverage of CRs.

All contigs from the metagenome assembly set from which the MAGs were generated were searched against the CRs of each genome using NUCmer. The number of mapping contigs and the depth of coverage were determined for each CR and plotted against CR length.

DOI: 10.7717/peerj.10119/supp-1
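The per-region summary described for Figures S1 and S2 (number of mapping contigs and depth of coverage) could be computed along the lines of the minimal sketch below. It assumes the NUCmer alignments have already been parsed (for example from show-coords output) into hypothetical (region_id, ref_start, ref_end, contig_id) tuples in 1-based inclusive coordinates, and it defines mean depth as total aligned bases divided by region length. The function name, region IDs and tuple layout are illustrative assumptions, not the authors' scripts.

```python
from collections import defaultdict


def summarize_region_coverage(region_lengths, hits):
    """Count mapping contigs and mean depth of coverage for each region.

    region_lengths: dict of region ID -> region length (bp).
    hits: iterable of (region_id, ref_start, ref_end, contig_id) tuples in
          1-based, inclusive reference coordinates, assumed to be already
          parsed from NUCmer alignments (hypothetical input format).
    Returns a dict of region ID -> (number of distinct contigs, mean depth).
    """
    contigs = defaultdict(set)
    aligned_bases = defaultdict(int)
    for region_id, start, end, contig_id in hits:
        lo, hi = min(start, end), max(start, end)  # tolerate reversed coordinates
        contigs[region_id].add(contig_id)
        aligned_bases[region_id] += hi - lo + 1
    # Mean depth = total aligned bases / region length; regions with no
    # hits are reported as (0, 0.0).
    return {
        region_id: (len(contigs[region_id]), aligned_bases[region_id] / length)
        for region_id, length in region_lengths.items()
    }


# Hypothetical regions and alignments, for illustration only.
regions = {"CR_1": 1000, "CR_2": 500}
hits = [("CR_1", 1, 500, "contig_a"), ("CR_1", 251, 1000, "contig_b"),
        ("CR_2", 100, 1, "contig_c")]
for region_id, (n_contigs, depth) in summarize_region_coverage(regions, hits).items():
    print(region_id, n_contigs, round(depth, 2))
```

Counting distinct contig IDs rather than alignment records keeps a contig that aligns to one region in several pieces from being counted more than once, while its aligned bases still all contribute to depth.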

Contig coverage of NRs.

Analysis was performed as described for Figure S1, but on the NRs of each genome.

DOI: 10.7717/peerj.10119/supp-2

Scaffold length versus tetranucleotide chi-squared statistic.

Comparison of tetranucleotide composition of CR and NR scaffolds as a function of scaffold length. Scaffolds from HL-46 serve as a reference standard because of their assumed random distribution. Logarithmic regressions and R² values are presented.

DOI: 10.7717/peerj.10119/supp-3
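One plausible way to compute a tetranucleotide chi-squared statistic for each scaffold and then fit a logarithmic regression against scaffold length is sketched below. It assumes a Pearson chi-squared of the scaffold's overlapping single-strand tetranucleotide counts against counts expected from genome-wide frequencies, and a least-squares fit of y = a + b·ln(x). The exact statistic used for Figure S3 may differ (for example in reverse-complement handling), and all function names here are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import product

# All 256 unambiguous tetranucleotides; k-mers containing N etc. are ignored.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_counts(seq):
    """Count overlapping tetranucleotides on one strand of seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    return {k: counts.get(k, 0) for k in TETRAMERS}


def tetra_chi_squared(scaffold, genome_counts):
    """Pearson chi-squared of scaffold tetranucleotide counts against the
    counts expected from genome-wide frequencies. Tetramers absent from the
    genome (expected count 0) are skipped."""
    observed = tetra_counts(scaffold)
    n_obs = sum(observed.values())
    n_genome = sum(genome_counts.values())
    chi2 = 0.0
    for k in TETRAMERS:
        expected = n_obs * genome_counts[k] / n_genome
        if expected > 0:
            chi2 += (observed[k] - expected) ** 2 / expected
    return chi2


def log_regression(xs, ys):
    """Least-squares fit of y = a + b*ln(x); returns (a, b, R^2)."""
    lx = [math.log(x) for x in xs]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ys) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((v - (a + b * u)) ** 2 for u, v in zip(lx, ys))
    ss_tot = sum((v - mean_y) ** 2 for v in ys)
    return a, b, 1 - ss_res / ss_tot


# Toy usage on a synthetic, uniformly random "genome" (hypothetical data):
# compare a scaffold drawn from the genome with a compositionally skewed one.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(200_000))
genome_counts = tetra_counts(genome)
print(round(tetra_chi_squared(genome[:5_000], genome_counts)))
print(round(tetra_chi_squared("A" * 5_000, genome_counts)))
print(log_regression([1e3, 1e4, 1e5], [300.0, 900.0, 1500.0]))
```

Because the expected counts scale with scaffold length, a fixed compositional deviation produces a larger statistic on longer scaffolds, which is one reason to examine the statistic as a function of length rather than as a single threshold.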

Composition and repeat analysis of reference genomes.

Data are presented as described in the Figure 1 legend.

DOI: 10.7717/peerj.10119/supp-4

Tara Oceans MAGs used in this study.

DOI: 10.7717/peerj.10119/supp-5

RefSeq genomes used in this study.

DOI: 10.7717/peerj.10119/supp-6

Mann-Whitney U test with Benjamini-Hochberg correction for compositional variance analysis.

DOI: 10.7717/peerj.10119/supp-7
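The Benjamini-Hochberg step applied to a set of Mann-Whitney U p-values can be illustrated with the small stdlib-only sketch below. It assumes the raw p-values have already been computed, for example with scipy.stats.mannwhitneyu, and it returns adjusted p-values (q-values) in the input order. statsmodels' multipletests(..., method="fdr_bh") offers an equivalent library routine. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order.

    For the i-th smallest of m p-values, q = min over j >= i of p_(j) * m / j,
    capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q is the minimum over
    # all ranks at or above it (this enforces monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, e.g. one Mann-Whitney U test per comparison.
raw = [0.01, 0.04, 0.03, 0.005]
print([round(q, 3) for q in benjamini_hochberg(raw)])  # → [0.02, 0.04, 0.04, 0.02]
```

A comparison would then be called significant when its q-value falls below the chosen false discovery rate, such as 0.05.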

Additional Information and Declarations

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

William C. Nelson conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Benjamin J. Tully conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, and approved the final draft.

Jennifer M. Mobberley conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The raw metagenomic data used to construct the MAGs is available in the NCBI Sequence Read Archive: SRX1063989 and SRX1065184.

The Tara Oceans MAG data is available as described in Table S1, and in the original publications: Tully et al. (DOI 10.1038/sdata.2017.203), Parks et al. (DOI 10.1038/s41564-017-0012-7) and Delmont et al. (DOI 10.1038/s41564-018-0176-9).

The NCBI RefSeq genomes used in the analysis are available in Table S2.

All custom analysis scripts are available at https://github.com/wichne/biases_in_genome_reconstruction.

The Hot Lake unicyanobacterial consortia MAG and genome data analyzed are available in GenBank (Table 1).

Funding

William C. Nelson and Jennifer M. Mobberley were supported by the U.S. Department of Energy (DOE), Office of Biological and Environmental Research (BER), as part of BER’s Genomic Science Program (GSP). This contribution originates from the GSP Foundational Scientific Focus Area (FSFA) at the Pacific Northwest National Laboratory (PNNL). The Pacific Northwest National Laboratory is operated for DOE by Battelle Memorial Institute under contract DE-AC05-76RL01830. Sequence data presented was generated at the DOE Joint Genome Institute under contract no. DE-AC02-05CH11231 and Community Science Project 701. Benjamin J. Tully was funded through the Center for Dark Energy Biosphere Investigations (OCE-0939654). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
