NOT PEER-REVIEWED
"PeerJ Preprints" is a venue for early communication or feedback before peer review. Data may be preliminary.

A peer-reviewed article of this Preprint also exists.

Supplemental Information

Figure 1. Hi-C insert distribution.

The distribution of genomic distances between Hi-C read pairs is shown for read pairs mapping to each chromosome. For each read pair the minimum path length on the circular chromosome was calculated and read pairs separated by less than 1000 bp were discarded. The 2.5 Mb range was divided into 100 bins of equal size and the number of read pairs in each bin was recorded for each chromosome. Bin values for each chromosome were normalized to sum to 1 and plotted.

DOI: 10.7287/peerj.preprints.260v1/supp-1
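
The binning described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and dropping read pairs whose separation reaches or exceeds the 2.5 Mb range is an assumption, since the caption does not say how such pairs were handled.

```python
def circular_distance(pos_a, pos_b, chrom_len):
    """Minimum path length between two positions on a circular chromosome."""
    d = abs(pos_a - pos_b) % chrom_len
    return min(d, chrom_len - d)


def insert_distribution(pairs, chrom_len, min_sep=1000,
                        max_sep=2_500_000, n_bins=100):
    """Normalized histogram of read-pair separations for one chromosome.

    pairs: iterable of (position_of_read_1, position_of_read_2).
    Pairs closer than min_sep are discarded, as in the caption.
    Pairs at or beyond max_sep are also dropped (assumption).
    """
    counts = [0] * n_bins
    bin_width = max_sep / n_bins
    for a, b in pairs:
        d = circular_distance(a, b, chrom_len)
        if d < min_sep or d >= max_sep:
            continue
        counts[int(d // bin_width)] += 1
    total = sum(counts)
    # Normalize so that the bin values sum to 1.
    return [c / total for c in counts] if total else counts


# Toy example on a hypothetical 5 Mb circular chromosome.
pairs = [(0, 30_000), (100, 200), (0, 4_990_000)]
dist = insert_distribution(pairs, chrom_len=5_000_000)
print(dist[:2])
```

In the toy input, the third pair spans the origin, so its separation is 10 kb rather than 4.99 Mb; the second pair is discarded because it is closer than 1000 bp.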

Figure 2. Metagenomic Hi-C associations.

The log-scaled, normalized number of Hi-C read pairs associating each genomic replicon in the synthetic community is shown as a heat map (see color scale, blue to yellow: low to high normalized, log-scaled association rates). Bur1: B. thailandensis chromosome 1; Bur2: B. thailandensis chromosome 2; Lac0: L. brevis chromosome; Lac1: L. brevis plasmid 1; Lac2: L. brevis plasmid 2; Ped: P. pentosaceus; K12: E. coli K12 DH10B; BL21: E. coli BL21.

DOI: 10.7287/peerj.preprints.260v1/supp-2

Figure 3. Contigs associated by Hi-C reads.

A graph is drawn with nodes depicting contigs and edges depicting associations between contigs, as indicated by aligned Hi-C read pairs. Edge weight reflects the number of read pairs supporting each association. Nodes are colored by the species to which they belong (see legend), and node size reflects contig size. Contigs below 5 kb and edges with weights less than 5 were excluded. Contig associations were normalized for variation in contig size.

DOI: 10.7287/peerj.preprints.260v1/supp-3
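
The two exclusion rules in the Figure 3 caption can be sketched as a simple filter. The function and variable names are hypothetical, and applying the edge threshold to raw read-pair counts is an assumption; the caption does not state whether filtering preceded the size normalization, which is omitted here because its formula is not given.

```python
def filter_contig_graph(contig_lengths, edge_counts,
                        min_len=5000, min_weight=5):
    """Drop contigs below min_len and edges lighter than min_weight.

    contig_lengths: dict mapping contig name -> length in bp.
    edge_counts: dict mapping (contig_a, contig_b) -> Hi-C read-pair count.
    Returns the surviving edges as a dict of the same shape.
    """
    kept = {name for name, length in contig_lengths.items()
            if length >= min_len}
    return {
        (a, b): w
        for (a, b), w in edge_counts.items()
        if a in kept and b in kept and w >= min_weight
    }


# Toy data: c2 is shorter than 5 kb and the c3-c4 edge has weight 3.
lengths = {"c1": 12_000, "c2": 4_000, "c3": 7_000, "c4": 6_000}
edges = {("c1", "c2"): 40, ("c1", "c3"): 9, ("c3", "c4"): 3}
print(filter_contig_graph(lengths, edges))
```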

Figure 4. Hi-C contact maps for replicons of Lactobacillus brevis.

Contact maps show the number of Hi-C read pairs associating each region of the L. brevis genome. The L. brevis chromosome (Lac0; a, Spearman rank correlation) and plasmids (Lac1, b; Lac2, c) show enrichment for local associations (bright diagonal band). Interactions between Lac1 and Lac0 (d) and between Lac2 and Lac0 (e) are also shown. All panels except Lac0 (a) are log-scaled. The circularity of Lac0, indicated by the high number of contacts between the ends of the sequence, became apparent after transforming the data with the Spearman rank correlation (computed for each matrix element between the row and column sharing that element) in place of the log transformation (a). In all plots, each pixel represents interactions between blocks sized at 1% of the interacting genomes. The number of HindIII restriction sites in each region of sequence is shown as a histogram on the left and top of each panel.

DOI: 10.7287/peerj.preprints.260v1/supp-4
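
The Spearman transformation mentioned in the Figure 4 caption replaces each matrix element with the rank correlation between the row and the column that share it. A minimal pure-Python sketch follows; the helper names are hypothetical, ties receive average ranks, and returning 0.0 for a constant row or column is an assumption made only to avoid division by zero.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; 0.0 if either vector is constant (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def spearman_transform(matrix):
    """Element (i, j) becomes the Spearman correlation of row i and column j."""
    n = len(matrix)
    row_ranks = [average_ranks(row) for row in matrix]
    col_ranks = [average_ranks([matrix[r][c] for r in range(n)])
                 for c in range(n)]
    return [[pearson(row_ranks[i], col_ranks[j]) for j in range(n)]
            for i in range(n)]


# Toy symmetric contact matrix with a strong diagonal.
contacts = [[9, 4, 4],
            [4, 9, 4],
            [4, 4, 9]]
transformed = spearman_transform(contacts)
```

Because the transform compares whole contact profiles rather than single counts, two regions that share the same neighbours score highly even when few read pairs link them directly, which is consistent with the caption's observation that circularity became visible only after this transformation.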

Figure 5. Relationship of distance to degree of separation in Hi-C and mate-pair variant graphs.

The length of paths between random pairs of SNP sites is shown for a SNP graph constructed from both Hi-C and mate-pair libraries of varying sizes (left; 5 kb, 10 kb, 20 kb, 40 kb), smoothed using locally weighted regression.

DOI: 10.7287/peerj.preprints.260v1/supp-5

Supplementary Figure 1. Illustration of the signal provided by Hi-C for metagenome binning

Two bacterial cells are illustrated, each containing a single circular chromosome. For one genomic region in each of the two species, examples of associations that are likely (green) or not likely (red) to be derived from Hi-C are illustrated.

DOI: 10.7287/peerj.preprints.260v1/supp-6

Supplementary Figure 2. Visualization of the impact of parameter choice on the quality of clustering solutions.

A small-multiples plot shows 5 × 5 combinations of contact minimum (top to bottom; 0, 3, 5, 7, 9) and contig size minimum (left to right; 1,000, 8,000, 15,000, 22,000, 29,000) thresholds. For each parameter combination, line plots show the quality (y-axis) of clustering solutions obtained for inflation values in the interval [1,2]. The quality of clustering solutions is measured in terms of their true-positive rate (red), false-positive rate (green), positive predictive value (blue), and negative predictive value (black).

DOI: 10.7287/peerj.preprints.260v1/supp-7
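
The four quality measures named in the Supplementary Figure 2 caption can be illustrated with a pairwise scoring scheme. This is an assumption, not necessarily the authors' definition: here a pair of contigs counts as "positive" when both fall in the same cluster and as "true" when that agrees with their species of origin. All names are hypothetical.

```python
from itertools import combinations


def pairwise_quality(true_species, cluster_of):
    """Score a clustering over all contig pairs (assumed scoring scheme).

    true_species: dict contig -> species label.
    cluster_of: dict contig -> cluster identifier.
    Returns TPR, FPR, PPV and NPV as a dict.
    """
    tp = fp = tn = fn = 0
    for a, b in combinations(sorted(true_species), 2):
        same_species = true_species[a] == true_species[b]
        same_cluster = cluster_of[a] == cluster_of[b]
        if same_cluster and same_species:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_species:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "tpr": ratio(tp, tp + fn),  # true-positive rate (recall)
        "fpr": ratio(fp, fp + tn),  # false-positive rate
        "ppv": ratio(tp, tp + fp),  # positive predictive value (precision)
        "npv": ratio(tn, tn + fn),  # negative predictive value
    }


# Toy example: contig c4 (species B) is wrongly placed in cluster 0.
species = {"c1": "A", "c2": "A", "c3": "B", "c4": "B"}
clusters = {"c1": 0, "c2": 0, "c3": 1, "c4": 0}
quality = pairwise_quality(species, clusters)
```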

Supplementary Figure 3. Hi-C contact frequency within L. brevis genome.

Contact frequency is visualized as a heat map, after normalization and application of the Spearman rank correlation (matrix elements are the Spearman correlation of the row and column of which they are the intersection). Circularity is apparent in the elevated contact between either end of the reference assembly sequence.

DOI: 10.7287/peerj.preprints.260v1/supp-8

Supplementary Figure 4. Hi-C contact map for Lactobacillus brevis plasmid 1.

Contact maps show the number of Hi-C read pairs associating each region of the L. brevis plasmid 1. Contact values are Spearman rank correlation transformed following normalization. Pixels are sized to represent interactions between blocks sized at 1% of the interacting sequence. A minimal signal of circularity is apparent with enrichment for contact between the minimum and maximum positions within the reference assembly.

DOI: 10.7287/peerj.preprints.260v1/supp-9

Supplementary Figure 5. Hi-C contact map for Lactobacillus brevis plasmid 2.

Contact maps show the number of Hi-C read pairs associating each region of the L. brevis plasmid 2. Contact values are Spearman rank correlation transformed following normalization. Pixels are sized to represent interactions between blocks sized at 1% of the interacting sequence. A signal indicative of circularity is not apparent.

DOI: 10.7287/peerj.preprints.260v1/supp-10

Supplementary Figure 6. Hi-C contact frequency within P. pentosaceus genome.

Contact frequency is visualized as a heat map, after normalization and application of the Spearman rank correlation (matrix elements are the Spearman correlation of the row and column of which they are the intersection). Circularity is apparent in the elevated contact between either end of the reference assembly sequence.

DOI: 10.7287/peerj.preprints.260v1/supp-11

Supplementary Figure 7. Variant graph illustration.

Two examples of variant graphs (non-data illustration). Variant nodes (circles) are linked by edges (light grey lines) derived from read pair data with small and medium (Graph 1) or small, medium, and large (Graph 2) inserts. A path between two nodes (start, end) is illustrated; this path is shorter in the graph representing the dataset that includes larger-insert reads.

DOI: 10.7287/peerj.preprints.260v1/supp-12
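
The effect illustrated in Supplementary Figure 7 can be reproduced on a toy variant graph with a breadth-first search. The graph below is invented for illustration and is not derived from the study data; the function name is hypothetical.

```python
from collections import deque


def path_length(edges, start, end):
    """Number of edges on a shortest path, or None if no path exists."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == end:
            return dist
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Seven variant sites, numbered 0..6 along a hypothetical sequence.
small = [(i, i + 1) for i in range(6)]   # links between adjacent variants
medium = [(i, i + 2) for i in range(5)]  # links that skip one variant
large = [(0, 5)]                         # one long-range link

steps_without_large = path_length(small + medium, 0, 6)
steps_with_large = path_length(small + medium + large, 0, 6)
print(steps_without_large, steps_with_large)
```

Adding the single long-range edge shortens the path between the first and last variant, which is the property the caption attributes to datasets that include larger-insert reads.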

Table 1. Species alignment fractions.

The numbers of reads aligning to each replicon present in the synthetic microbial community are shown before and after filtering, along with the percent of the total constituted by each species. The GC content (“GC”) and restriction site counts (“#R.S.”) of each replicon, species, and strain are shown. Bur1: B. thailandensis chromosome 1; Bur2: B. thailandensis chromosome 2; Lac0: L. brevis chromosome; Lac1: L. brevis plasmid 1; Lac2: L. brevis plasmid 2; Ped: P. pentosaceus; K12: E. coli K12 DH10B; BL21: E. coli BL21.

DOI: 10.7287/peerj.preprints.260v1/supp-13

Table 2. Markov clustering of metagenome assembly contigs using Hi-C data.

A range of inflation parameters was applied, and the precision and recall for the resulting clusters were calculated as described in the text. An inflation parameter of 1.1 produced a near-perfect clustering of contigs by species.

DOI: 10.7287/peerj.preprints.260v1/supp-14
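
To show what the inflation parameter controls, the following is a compact sketch of the Markov clustering iteration (expansion followed by inflation) on a dense matrix. It is a teaching illustration under simplifying assumptions (added self-loops, fixed iteration cap, naive cluster read-out) and is not the software used to produce Table 2; all names are hypothetical.

```python
def normalize_columns(m):
    """Scale each column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] for c in range(n)] for r in range(n)]


def markov_cluster(adjacency, inflation=2.0, iterations=50, tol=1e-9):
    """Cluster nodes of a weighted graph given as a square adjacency matrix."""
    n = len(adjacency)
    # Add self-loops (a common convention) and make columns stochastic.
    m = normalize_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    )
    for _ in range(iterations):
        # Expansion: one step of random-walk flow (matrix squared).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: raise entries to a power, then renormalize columns.
        # Larger values sharpen strong flows and yield finer clusters.
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded]
        )
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Read-out: each row with non-negligible entries names one cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6)
                for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Toy graph: two triangles with no Hi-C links between them.
triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(triangles, inflation=1.1))
```

Inflation values close to 1, such as the 1.1 reported in the caption, sharpen the flow only gently and therefore tend to yield fewer, larger clusters than higher values.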

Table 3. Variant graph statistics.

Connectivity statistics are shown for variant graphs constructed from various simulated mate-pair (# kb, MP) and Hi-C read datasets. Graphs constructed from all Hi-C data are compared to those constructed using only Hi-C read pairs with inserts over 1 kb. The Hi-C variant graphs are highly connected, in contrast to the mate-pair graphs, which have both lower connectedness and lower rates of variants occurring in the same connected components.

DOI: 10.7287/peerj.preprints.260v1/supp-15

Supplementary Table 1. SOAPdenovo assembly results.

Statistics are shown for three assemblies, including the simulated coverage and the number of contigs (and scaffolds) present in the assembly. Assembly quality is reflected in the count of misassembled contigs and scaffolds (“contig error” and “scaffold error”). The percent of the total reference sequence size constituted by each assembly is also shown.

DOI: 10.7287/peerj.preprints.260v1/supp-16

Supplementary Table 2. Species alignment fractions (expanded table).

The numbers of reads aligning to each replicon present in the synthetic microbial community are shown before and after alignment filtering, along with the percent of the total constituted by each species. The GC content and restriction site (R.S.) counts of each replicon, species, and strain are shown. Total and fractional raw alignment counts adjusted by R.S. counts are also shown, constituting our best approximation of the relative abundances of synthetic community members. Bur1: B. thailandensis chromosome 1; Bur2: B. thailandensis chromosome 2; Lac0: L. brevis chromosome; Lac1: L. brevis plasmid 1; Lac2: L. brevis plasmid 2; Ped: P. pentosaceus; K12: E. coli K12 DH10B; BL21: E. coli BL21.

DOI: 10.7287/peerj.preprints.260v1/supp-17

Supplementary Table 3. Raw metagenomic Hi-C association counts.

The number of Hi-C read pairs associating each genomic replicon in the mock community is shown without normalization.

DOI: 10.7287/peerj.preprints.260v1/supp-18

Supplementary Table 4. Normalized association counts.

Shown are the counts of Hi-C read pairs associating each pair of replicons included in the synthetic community, normalized as described in the methods.

DOI: 10.7287/peerj.preprints.260v1/supp-19

Additional Information

Competing Interests

Jonathan Eisen is an Academic Editor for PeerJ. We declare that we have no other competing interests.

Author Contributions

Christopher W. Beitel conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper, conceived the method.

Lutz Froenicke conceived and designed the experiments, performed the experiments, reviewed drafts of the paper, prepared Hi-C libraries.

Jenna M. Lang conceived and designed the experiments, performed the experiments, contributed reagents/materials/analysis tools, reviewed drafts of the paper, prepared the mixture.

Ian F. Korf conceived and designed the experiments, analyzed the data, reviewed drafts of the paper.

Richard W. Michelmore conceived and designed the experiments, contributed reagents/materials/analysis tools, reviewed drafts of the paper.

Jonathan A. Eisen conceived and designed the experiments, contributed reagents/materials/analysis tools, reviewed drafts of the paper.

Aaron E. Darling conceived and designed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper, conceived the method.

Grant Disclosures

The following grant information was disclosed by the authors:

HSHQDC-11-C-00091

DNA Deposition

The following information was supplied regarding the deposition of DNA sequences:

Sequence Read Archives submission SRX377733

Funding

This work was supported by a gift from MARS, Inc. and by Department of Homeland Security contract #HSHQDC-11-C-00091. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

