Thank you very much for addressing all reviewer concerns!
# PeerJ Staff Note - this decision was reviewed and approved by Keith Crandall, a PeerJ Section Editor covering this Section #
The authors have revised the manuscript to my satisfaction.
Dear Simon, Emiley, and others,
Thank you very much for your patience. We now have evaluations of your study from three referees who are experts in molecular biology, environmental and genome-resolved metagenomics, and high-throughput sequencing surveys of low-biomass environments.
First of all, I would like to thank Dr. Baker, Dr. Morrison, and the anonymous Reviewer #3 for graciously taking the time from their own research to share their opinions to improve this study.
As you will see in the reports attached, all reviewers agree that the manuscript is well-written, clear, and easy to follow. Reviewers also agree that the analyses are satisfactory, well-described, and support the claims. I also agree with the reviewers that the study provides essential insights that will help many investigators make informed decisions. The reports also include some minor suggestions by Dr. Morrison (such as the in-house scripts that are not publicly available, incomplete information in Supplementary Table 1, and others) as well as major suggestions from Reviewer #3 (such as the unanswered question "to what extent biases associated with library preparation affect the quality of genomes recovered", gene-level implications of these biases, and others). I hope you will consider all concerns raised by the reviewers carefully.
I would also like to echo the point raised by the last two reviewers regarding the availability of raw sequencing data through a service that supports anonymous bulk downloads, such as NCBI's Sequence Read Archive (SRA). At the time of writing, https://genome.jgi.doe.gov/ was still offline; however, the suggestion to use SRA is unrelated to the status of that particular server.
There was no consensus from the reviewers regarding the decision (the suggestions were "Accept", "Minor Revision", and "Major Revision"). My decision of Major Revision is simply to make sure you will not be pressured by a fast-approaching deadline, but indeed an early resubmission that satisfies reviewers is more than welcome.
Thank you very much for your patience again, and we are looking forward to reading the revised version and the response.
The manuscript is very well written. The figures and tables are easy to follow and all of the data are publicly available on JGI IMG.
The research background and objectives are well defined. The methods have been described in detail. The molecular and computational approaches are state of the art.
Roux et al. used a large number of PCR-amplified metagenomic datasets to optimize genomic assembly pipelines for low biomass samples. I think that this will be really useful for deciding on ways to improve metagenomic assemblies of low biomass samples. It does a great job at examining the effects of amplification on the libraries and benchmarking computational methods to mitigate these effects to improve assembly.
The article is clearly written. Sufficient relevant background information is provided and appropriately cited. The article's organization is standard. I appreciate that the figures included in the article are relatively simple, but complete results are available in the supplemental files. The table and figure legends are complete. Figure quality provided for the review is sufficient and I assume there will be no degradation of quality in the published version.
Data are available through JGI and contigs through an NERSC download site. This seems adequate, although I personally go to the SRA for read datasets.
The question of how PCR amplification of metagenomic libraries influences the quality of assembly is important and worth multiple studies, given the plethora of error correction, filtering, trimming, and assembly tools available. The authors, several of whom are at JGI, make use of the BBTools suite, which is reasonable; the study was not meant to be an exhaustive review of methods. Rather, there were three specific steps tested, in a total of 12 different combinations, on 169 PCR-amplified metagenomes from 127 different samples. The methods are adequately described so that the work is reproducible and, more importantly, can be implemented with other datasets. There is one exception to this: custom Perl scripts are mentioned, but not included in the supplemental materials. Please consider including them, or providing a GitHub link.
The results and their interpretation are sound and meaningful. The authors suggest that one of the 12 pipelines can serve as a default first approach, and this is justified by the results as a whole. They also clearly describe the extent of the improvement under different numbers of amplification cycles. This may lead to stronger efforts by PIs to keep amplification to a minimum.
A very useful, timely analysis that will be of interest to a wide community.
Supplemental Table 1 is unfinished: it still contains the aside "Ask Christa to release on Portal". Please make sure such asides are addressed and removed.
Lines 173-174: Why expect human contamination? Is this a standard JGI step?
Line 182: I believe that Table S3 is referenced before Table S2; reorder numbering if so.
Lines 221-226: I don't follow the reasoning behind ignoring translocation errors. How can both assemblies be correct in light of the diagram in the QUAST manual (http://quast.bioinf.spbau.ru/manual.html#sec3.1.2)? I can see that it might be difficult to tell which of the two is correct...could you clarify?
Line 223: somewhere earlier than line 422, you need to define what you mean by 'best' assembly, e.g., largest combined assembly in contigs over 10 kbp. 'Standard' is defined at line 214 and 'alternative' at 313. Might be helpful to have them in one place, or come up with a label other than 'alternative', since you start with 11 alternatives.
Lines 336-339: Oddly worded: "the amount of additional errors remains lower than the additional number and size of long contigs..."? I think you mean that the relative increase in assembly errors (2x) seems an acceptable trade-off for the improved assembly metrics (24x), or something similar. Since this is one of the few caveats to the alternative method, it should be explained quite clearly.
Various places: 'cohen' or 'Cohen'?
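To illustrate the kind of explicit definition requested in the comment on line 223, here is a minimal Python sketch of one possible reading of "best": the assembly with the most bases in contigs of at least 10 kbp. It is not the authors' method. The function names, the inclusive threshold, and the example contig lengths are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List

# Assumption: "contigs over 10 kbp" is read as length >= 10,000 bp.
MIN_CONTIG_LEN = 10_000


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of each record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def bases_in_long_contigs(lengths: Iterable[int], min_len: int = MIN_CONTIG_LEN) -> int:
    """Sum the lengths of contigs that meet the length threshold."""
    return sum(n for n in lengths if n >= min_len)


def best_assembly(assemblies: Dict[str, List[int]], min_len: int = MIN_CONTIG_LEN) -> str:
    """Name of the assembly with the most bases in long contigs (hypothetical criterion)."""
    return max(assemblies, key=lambda name: bases_in_long_contigs(assemblies[name], min_len))


if __name__ == "__main__":
    # Made-up contig lengths, for illustration only.
    example = {
        "standard": [12_000, 3_000],
        "alternative": [15_000, 11_000],
    }
    print(best_assembly(example))
```

Stating the criterion in this form, in one place, would also make it easy to see how the 'standard' and 'alternative' labels relate to the 'best' assembly for each sample.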
The overall outcome of this paper is to demonstrate that metagenome assemblies from PCR-amplification-dependent library preparation result in highly uneven coverage across assembled contigs as compared to PCR-free library preparation protocols. Further, the authors report that the uneven coverage of de novo assembled contigs for PCR-dependent libraries is primarily a result of selective amplification of short insert size amplicons and is not impacted by other amplicon characteristics (e.g., GC content). The manuscript further tests the effect of read correction (i.e., strict vs relaxed), read selection (i.e., deduplication vs none), and assembly approach (i.e., metaSPAdes vs single-cell SPAdes) to determine the combination that may allow for improved de novo assembly. The authors conclude that relaxed read correction with deduplication followed by the use of single-cell SPAdes resulted in improved assembly, with a significantly higher proportion of bases included in contigs > 10 kbp as compared to strict read correction, no deduplication, and use of metaSPAdes. Further, the authors compare de novo assemblies from a select few samples that were generated with both PCR-free and PCR-amplification-based methods and demonstrate that efforts to improve assembly length can be associated with a moderate increase in misassemblies.
The paper is well written and easy to follow, the experimental plan is well designed and reported, and the overall conclusions presented are supported by the data and associated analyses.
Below are some important points for the authors to consider:
1. The primary assembly metric driving the experiments is increased length of metagenomic assembly, which will presumably aid in the genome binning process. While the authors do also consider the issue of misassembly, they do not report how the choice of PCR cycles, read correction, deduplication, and single-cell SPAdes affects the genome binning process and the quality of recovered metagenome-assembled genomes (MAGs). It should be straightforward to do this for the Lake Mendota samples and to compare how the different de novo assembly pipelines tested affect MAG quantity and quality for PCR-amplified metagenomes as compared to the PCR-free method. This might also involve some experimentation with binning approaches. I feel that this extra effort would add significant value to the manuscript overall and would be of significant benefit to the research community.
2. While assessing the assembly errors, the authors focus on comparing assemblies resulting from various data processing steps for PCR-amplified metagenomes to those of the PCR-free metagenomes. I appreciate this is an important step, but what also matters is the implications of PCR cycles, relaxed read correction, deduplication, and single-cell SPAdes for annotation. For instance, how fragmented are genes in the "best performing" assembly approach for PCR-based metagenomes compared to the PCR-free metagenomes? While it might not be ideally suited to Illumina-only assemblies, Mick Watson described a nice approach to assessing annotation quality in this blog post (http://www.opiniomics.org/on-stuck-records-and-indel-errors-or-stop-publishing-bad-genomes/). Perhaps a similar analysis would be of value to the paper. However, at a minimum, the impact of the proposed approach to enhancing metagenome assembly on gene annotation must be included.
3. It would be extremely helpful if the various scripts and command lines used for evaluating assembly quality and analyzing coverage bias were made available. For instance, there are at least a couple of mentions of “custom perl scripts” in the data analysis portion, and it would be helpful to release these. If it is not too onerous, it would be even more helpful if the authors could deposit the scripts and command lines used for the entire workflow in a GitHub project dedicated to this manuscript.
4. I tried accessing the results from different assembly pipelines at this url: http://portal.nersc.gov/dna/microbial/prokpubs/BenchmarksPCRMetagenomes/ but could not access them (i.e., site not available).
5. URLs for some of the raw data (samples: CGHWP, CGUX) are not provided. All the data is on https://genome.jgi.doe.gov/ but for some reason I was unable to connect to the JGI genome portal. I tried accessing this URL and the data from a couple of different computers (in case of firewall or similar issues) but was not able to do so. Could the authors please look into this?
6. While all the data will be made available via URLs to the JGI portal, will readers be able to download it in bulk? It would be very helpful if the raw data and assemblies were made available via NCBI.
7. Please also provide additional information on the samples, such as the DNA concentration prior to library preparation.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.