Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products

Christopher W. Beitel; Lutz Froenicke; Jenna M. Lang; Ian F. Korf; Richard W. Michelmore; Jonathan A. Eisen; Aaron E. Darling

doi:10.7287/peerj.preprints.260v1

Christopher W. Beitel¹, Lutz Froenicke¹, Jenna M. Lang¹, Ian F. Korf¹,², Richard W. Michelmore¹,²,³,⁴, Jonathan A. Eisen¹,⁵, Aaron E. Darling⁶

1 UC Davis Genome Center, University of California, Davis, Davis, California, United States

2 Department of Molecular and Cellular Biology, University of California, Davis, Davis, California, United States

3 Department of Plant Sciences, University of California, Davis, Davis, California, United States

4 Department of Medical Microbiology and Immunology, University of California, Davis, Davis, California, United States

5 Department of Evolution and Ecology, University of California, Davis, Davis, California, United States

6 ithree institute, University of Technology Sydney, Sydney, NSW, Australia

Published: 2014-02-28
Accepted: 2014-02-28

Subject Areas: Bioengineering, Bioinformatics, Computational Biology, Genomics, Microbiology
Keywords: Hi-C, microbial ecology, metagenomics, plasmids, synthetic microbial communities, Markov Clustering, metagenome assembly, strain differentiation, haplotype phasing, genome scaffolding

Copyright: © 2014 Beitel et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.

Cite this article: Beitel CW, Froenicke L, Lang JM, Korf IF, Michelmore RW, Eisen JA, Darling AE. 2014. Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. PeerJ PrePrints 2:e260v1 https://doi.org/10.7287/peerj.preprints.260v1

Abstract

Metagenomics is a valuable tool for the study of microbial communities but has been limited by the difficulty of “binning” the resulting sequences into groups corresponding to the individual species and strains that constitute the community. Moreover, there are presently no methods to track the flow of mobile DNA elements such as plasmids through communities or to determine which of these are co-localized within the same cell. We address these limitations by applying Hi-C, a technology originally designed for the study of three-dimensional genome structure in eukaryotes, to measure the cellular co-localization of DNA sequences. We leveraged Hi-C data generated from a synthetic metagenome sample to accurately cluster metagenome assembly contigs into groups that contain nearly complete genomes of each species. The Hi-C data also reliably associated plasmids with the chromosomes of their host and with each other. We further demonstrated that Hi-C data provide a long-range signal of strain-specific genotypes, indicating that such data may be useful for high-resolution genotyping of microbial populations. Our work demonstrates that Hi-C sequencing data provide valuable information for metagenome analyses that is not currently obtainable by other methods. This metagenomic Hi-C method could facilitate future studies of the fine-scale population structure of microbes, as well as studies of how antibiotic resistance plasmids (or other genetic elements) mobilize in microbial communities. The method is not limited to microbiology; the genetic architecture of other heterogeneous populations of cells could also be studied with this technique.

Supplemental Information

Figure 1. Hi-C insert distribution.

The distribution of genomic distances between Hi-C read pairs is shown for read pairs mapping to each chromosome. For each read pair the minimum path length on the circular chromosome was calculated and read pairs separated by less than 1000 bp were discarded. The 2.5 Mb range was divided into 100 bins of equal size and the number of read pairs in each bin was recorded for each chromosome. Bin values for each chromosome were normalized to sum to 1 and plotted.
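The binning procedure in this caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names, the `(pos_a, pos_b)` input format, and the handling of distances at or beyond the 2.5 Mb limit are assumptions made for the example.

```python
def circular_distance(pos_a, pos_b, chrom_len):
    """Minimum path length between two positions on a circular chromosome."""
    d = abs(pos_a - pos_b) % chrom_len
    return min(d, chrom_len - d)


def insert_distribution(pairs, chrom_len, min_sep=1000,
                        max_range=2_500_000, n_bins=100):
    """Normalized histogram of read-pair separations for one chromosome.

    pairs: iterable of (pos_a, pos_b) mapping positions (hypothetical format).
    Pairs closer than min_sep are discarded, as described in the caption.
    Assumption: pairs farther apart than max_range are skipped, and a
    distance exactly equal to max_range is placed in the last bin.
    """
    bin_width = max_range / n_bins
    counts = [0] * n_bins
    for pos_a, pos_b in pairs:
        d = circular_distance(pos_a, pos_b, chrom_len)
        if d < min_sep or d > max_range:
            continue
        index = min(int(d // bin_width), n_bins - 1)
        counts[index] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    # Normalize so the bin values for this chromosome sum to 1.
    return [c / total for c in counts]


# Toy example on a hypothetical 4 Mb circular chromosome.
example_pairs = [(100, 600), (0, 3_999_000), (0, 30_000), (1_000_000, 3_000_000)]
example_dist = insert_distribution(example_pairs, chrom_len=4_000_000)
```

Running the function once per chromosome and plotting each returned list against the bin midpoints would give one normalized curve per chromosome, which is the form of comparison the caption describes.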

Figure 2. Metagenomic Hi-C associations.

Figure 3. Contigs associated by Hi-C reads.

Figure 4. Hi-C contact maps for replicons of Lactobacillus brevis.

Figure 5. Relationship of distance to degree of separation in Hi-C and mate-pair variant graphs.

Supplementary Figure 1. Illustration of the signal provided by Hi-C for metagenome binning.

Supplementary Figure 2. Visualization of the impact of parameter choice on the quality of clustering solutions.

Supplementary Figure 3. Hi-C contact frequency within L. brevis genome.

Supplementary Figure 4. Hi-C contact map for Lactobacillus brevis plasmid 1.

Supplementary Figure 5. Hi-C contact map for Lactobacillus brevis plasmid 2.

Supplementary Figure 6. Hi-C contact frequency within P. pentosaceus genome.

Supplementary Figure 7. Variant graph illustration.

Table 1. Species alignment fractions.

Table 2. Markov clustering of metagenome assembly contigs using Hi-C data.

Table 3. Variant graph statistics.

Supplementary Table 1. SOAPdenovo assembly results.

Supplementary Table 2. Species alignment fractions (expanded table).

Supplementary Table 3. Raw metagenomic Hi-C association counts.

Supplementary Table 4. Normalized association counts.