Optimizing de novo genome assembly from PCR-amplified metagenomes

NOT PEER-REVIEWED
"PeerJ Preprints" is a venue for early communication or feedback before peer review. Data may be preliminary.

Supplemental Information

PCR-amplified metagenomes are quantitative but include a significant amount of duplicated reads

A. Comparison of depth of coverage between unamplified (TruSeq, x-axis) and PCR-amplified (Nextera XT or Accel-NGS 1S Plus, y-axis) libraries. The average depth of coverage was computed for each contig as the average read depth normalized by the total size of the library. The 1:1 equivalence is indicated with a black line, while a linear best fit is shown in blue. For clarity, only 1,000 contigs randomly selected from each sample are plotted. Contigs with no reads mapped in the PCR-amplified library were not included. To allow a direct comparison of the two plots, only samples for which both Nextera XT and 1S Plus libraries were available are included (Table S1). The subpanels show the correlation coefficients (Pearson and Spearman) of a sample-by-sample correlation between depth of coverage in unamplified and PCR-amplified libraries, either for all contigs or only for contigs ≥ 10kb with a depth of coverage ≥ 10x. B. Percentage of duplicated reads (y-axis) as a function of the number of PCR cycles performed during library creation (x-axis). Underlying data are available in Table S1.

DOI: 10.7287/peerj.preprints.27453v1/supp-1
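
The comparison described in panel A of the caption above (depth of coverage normalized by library size, then Pearson and Spearman correlation between the unamplified and PCR-amplified libraries) can be illustrated with a minimal Python sketch. This is not the authors' code (the preprint states that no new code was generated). The function names, the per-gigabase scaling, and the example depths are hypothetical and chosen only for illustration.

```python
from math import sqrt


def normalized_depth(mean_depth, library_size_bp, scale=1e9):
    """Average read depth of a contig divided by library size.

    The per-gigabase scale is an assumption made for readability.
    """
    return mean_depth / library_size_bp * scale


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical normalized depths for four contigs of one sample.
unamplified = [12.0, 30.0, 5.0, 48.0]
amplified = [10.0, 33.0, 0.0, 60.0]

# Contigs with no reads mapped in the PCR-amplified library are excluded.
pairs = [(u, a) for u, a in zip(unamplified, amplified) if a > 0]
kept_unamplified = [u for u, _ in pairs]
kept_amplified = [a for _, a in pairs]

print(pearson(kept_unamplified, kept_amplified))
print(spearman(kept_unamplified, kept_amplified))
```

Spearman is computed here as Pearson on average ranks, which handles tied depths. A statistics library could replace these helpers in practice.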

Insert size and GC content distribution for all vs high-depth regions

A & B. Distribution of insert size for all regions (green) or only regions with high depth of coverage (orange) across PCR-amplified libraries. In panel A, all insert sizes were centered around 500bp to enable a more direct comparison between libraries. Panel B shows the same data without this transformation (i.e. raw insert size). C & D. Distribution of GC % for all regions (green) or only regions with high depth of coverage (orange). For panel C, each library GC% was centered around 50%, while panel D shows the same data without this transformation.

DOI: 10.7287/peerj.preprints.27453v1/supp-2

Assembly size and estimated error rates for different assembly pipelines

Comparison of the output of different assembly pipelines applied to PCR-amplified libraries. Panels A & B show the cumulative length of all contigs (A) or contigs ≥ 10kb (B) across assembly pipelines (x-axis). Panel C displays the cumulative length of contigs ≥ 10kb relative to the largest value for each library, i.e. as a percentage of the “best” assembly for this library (“best” being defined as the largest cumulative length of contigs ≥ 10kb). Panel D displays the distribution of estimated error rates across the different assembly pipelines, for the 25 libraries for which error rates could be estimated (Table S2 & S3). Norm.: Normalization, Dedup.: Deduplication, Meta: metaSPAdes, SC: single-cell SPAdes.

DOI: 10.7287/peerj.preprints.27453v1/supp-3
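
The relative metric described for panel C in the caption above (cumulative length of contigs ≥ 10kb, expressed as a percentage of the largest value obtained for the same library) can be sketched in Python as follows. This is an illustration only, not the authors' code. The pipeline names and contig lengths are hypothetical.

```python
def cumulative_length(contig_lengths, min_length=0):
    """Sum of the lengths of all contigs at least min_length long."""
    return sum(length for length in contig_lengths if length >= min_length)


def relative_to_best(lengths_by_pipeline, min_length=10_000):
    """Cumulative length per pipeline as a percentage of the best pipeline.

    "Best" follows the caption: the largest cumulative length of contigs
    of at least min_length (10kb by default) for this library.
    """
    totals = {
        pipeline: cumulative_length(lengths, min_length)
        for pipeline, lengths in lengths_by_pipeline.items()
    }
    best = max(totals.values())
    if best == 0:
        # No pipeline produced a contig above the threshold.
        return {pipeline: 0.0 for pipeline in totals}
    return {pipeline: 100.0 * total / best for pipeline, total in totals.items()}


# Hypothetical contig lengths (bp) from two pipelines run on one library.
example = {
    "pipeline_a": [15_000, 12_000, 3_000],
    "pipeline_b": [40_000, 9_000],
}
print(relative_to_best(example))
```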

Description of samples and libraries analyzed

The first tab lists information about individual samples including the list of all libraries generated for each sample, and the second tab includes information about each library.

DOI: 10.7287/peerj.preprints.27453v1/supp-4

Samples including both unamplified and PCR-amplified libraries

List of the 25 PCR-amplified libraries for which an unamplified dataset was available, alongside specific metrics that could be calculated using the unamplified dataset as reference, i.e. correlation of average depth of coverage of contigs, and percentage of contigs from the unamplified assembly detected in the PCR-amplified library. A contig was considered detected if ≥ 1 read from the PCR-amplified library mapped to it.

DOI: 10.7287/peerj.preprints.27453v1/supp-5
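
The detection criterion in the description above (a contig counts as detected when at least one read from the PCR-amplified library maps to it) translates into a simple percentage. The Python sketch below is illustrative only and is not the authors' code. The function name, contig names, and read counts are hypothetical.

```python
def percent_detected(mapped_read_counts, min_reads=1):
    """Percentage of reference contigs with at least min_reads mapped reads.

    mapped_read_counts maps each contig of the unamplified assembly to the
    number of reads from the PCR-amplified library that mapped to it.
    """
    if not mapped_read_counts:
        raise ValueError("at least one contig is required")
    detected = sum(1 for count in mapped_read_counts.values() if count >= min_reads)
    return 100.0 * detected / len(mapped_read_counts)


# Hypothetical mapped-read counts for four contigs.
counts = {"contig_1": 120, "contig_2": 0, "contig_3": 1, "contig_4": 7}
print(percent_detected(counts))
```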

Results from the different assembly pipelines tested

The first tab lists the different steps and tools tested. The second tab includes the results of de novo genome assembly with the different pipelines for each PCR-amplified library. For the 25 PCR-amplified libraries for which an unamplified reference was available, this second tab also includes estimates of assembly errors for each assembly pipeline obtained with QUAST.

DOI: 10.7287/peerj.preprints.27453v1/supp-6

Additional Information

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Simon Roux conceived and designed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.

Gareth Trubl conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, authored or reviewed drafts of the paper, approved the final draft.

Danielle Goudeau performed the experiments, contributed reagents/materials/analysis tools, authored or reviewed drafts of the paper, approved the final draft.

Nandita Nath performed the experiments, contributed reagents/materials/analysis tools, authored or reviewed drafts of the paper, approved the final draft.

Estelle Couradeau performed the experiments, contributed reagents/materials/analysis tools, authored or reviewed drafts of the paper, approved the final draft.

Nathan A Ahlgren performed the experiments, contributed reagents/materials/analysis tools, authored or reviewed drafts of the paper, approved the final draft.

Yuanchao Zhan performed the experiments, contributed reagents/materials/analysis tools, authored or reviewed drafts of the paper, approved the final draft.

David Marsan performed the experiments, contributed reagents/materials/analysis tools, authored or reviewed drafts of the paper, approved the final draft.

Feng Chen performed the experiments, contributed reagents/materials/analysis tools, authored or reviewed drafts of the paper, approved the final draft.

Jed A Fuhrman authored or reviewed drafts of the paper, approved the final draft.

Trent R Northen authored or reviewed drafts of the paper, approved the final draft.

Matthew B Sullivan authored or reviewed drafts of the paper, approved the final draft.

Virginia I Rich authored or reviewed drafts of the paper, approved the final draft.

Rex R Malmstrom conceived and designed the experiments, authored or reviewed drafts of the paper, approved the final draft.

Emiley A Eloe-Fadrosh conceived and designed the experiments, authored or reviewed drafts of the paper, approved the final draft.

DNA Deposition

The following information was supplied regarding the deposition of DNA sequences:

Data are available through the JGI genome portal (https://genome.jgi.doe.gov/portal/). Accession numbers are listed in Table S1.

Data Deposition

The following information was supplied regarding data availability:

Data are available through the JGI genome portal (https://genome.jgi.doe.gov/portal/). Accession numbers are listed in Table S1.

No new code was generated; all analyses were conducted using publicly available tools listed in the Methods section.

Funding

Delaware Bay samples (YZ, DM, FC) were collected through a research cruise supported by a National Science Foundation grant (OCE-0825468). Sampling and extraction of thawing permafrost soil samples from Stordalen Mire (VIR, GT, MBS) was funded by the Genomic Science Program of the United States Department of Energy Office of Biological and Environmental Research, grants DE-SC0004632, DE-SC0010580, and DE-SC0016440, which also supported GT. This work was supported by the U.S. Department of Energy, Office of Science, Office of Workforce Development for Teachers and Scientists, Office of Science Graduate Student Research (SCGSR) program. The SCGSR program is administered by the Oak Ridge Institute for Science and Education (ORISE) for the DOE. ORISE is managed by ORAU under contract number DE-SC0014664. This work was also supported by Gordon & Betty Moore Foundation grants 3790 and 5488 to MBS, and the US Department of Energy Office of Science, Office of Biological and Environmental Research Early Career Program under contract number DE-AC02-05CH11231 to TRN. The work conducted by the U.S. Department of Energy Joint Genome Institute is supported by the Office of Science of the U.S. Department of Energy under contract no. DE-AC02-05CH11231. The funders had no role in study design, data collection, analysis, and interpretation, decision to publish, or preparation of the manuscript.

