To increase transparency, PeerJ operates a system of 'optional signed reviews and history'. This takes two forms: (1) peer reviewers are encouraged, but not required, to provide their names (if they do so, then their profile page records the articles they have reviewed), and (2) authors are given the option of reproducing their entire peer review history alongside their published article (in which case the complete peer review process is provided, including revisions, rebuttal letters and editor decision letters).
Thank you for the change. This clears the way for publication.
Thank you for your resubmission. I have looked at your point-by-point responses and the changes in the manuscript, and I am satisfied with them. In particular, I thought the additional plots you provided in support of your approach to counting the number of genomes were helpful. However, I feel you should add them to the paper, as readers may have the same question as Reviewer #1.
Could you please make this small change? It will clear the way for publication.
The reviewers are generally supportive, but they also make a few pertinent points that need to be addressed before publication. In particular, both mention the need for more details on anvi'o. Given that the paper does not introduce that method itself, however, I don't think a comparison with another pipeline is necessary.
Also, please provide more methodological details on the RNA-seq data (sample prep, etc.).
The authors use a previously published software pipeline for metagenomic analysis, together with RNA-seq data, to quality-check and improve a genome assembly, with the tardigrade as their test case. They conclude that contamination, particularly deriving from the use of certain technologies, underlies most of the previously suggested high prevalence of bacteria-to-eukaryote HGT in this genome. The basic approach of using a metagenomic community analysis method to validate the purity of a genome assembly is elegant and deserves publication. The result that RNA-seq invalidates many of the putative HGTs in the first published tardigrade genome is also valuable, as are the finding of a putative symbiont and the observation that a particular library preparation method may be prone to amplifying contaminants that current foreign-DNA filters have difficulty identifying.
See change requests below in comments to author section.
This is fine, but the use of the software must be made clearer so that this can be easily assessed by the reader.
This is fine.
- I do not see where the RNA-seq data have been deposited, nor where the new corrected genome has been deposited.
- The abstract and other sections claim that anvi'o is routinely used. Please substantiate this - I had never heard of it before.
- Lines 45-50 outline one type of metagenomic workflow but miss the alternative approach used, e.g., in large-scale human gut studies: building gene catalogs/genome collections and then mapping reads directly to them. This should at least be mentioned as a binning-free approach, even though it is not directly relevant to the work at hand.
- Methods generally: even though anvi'o has been published, few readers will know the details of how it functions. The methods therefore need to be clearer about what exactly is done when particular options are chosen in the software, so that the nature of the results can be understood. For the paper to be self-contained, there needs to be more background on what the software does.
-- For example, it is not clear how the number of genomes is estimated. If there are 139 semi-universal single-copy genes, and HMMER is used to find their matches in the samples, how does that yield the number of distinct genomes? It would if, for example, the hits were then clustered (or the queries were first clustered somehow), so that the number of clusters per gene family could be checked, but the section does not say. This needs to be made clear.
-- Likewise for the classification of genome bins: are all reads in each bin uploaded to RAST and assessed, or is there some redundancy removal first?
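The binning-free, gene-catalog workflow mentioned above can be sketched as follows. This is a hypothetical illustration of the general idea (build a non-redundant catalog, map reads to it, tally length-normalized abundances), not code from the paper or from anvi'o; the input format and function name are invented for clarity, and a real pipeline would parse BAM/SAM alignments from a mapper.

```python
from collections import defaultdict

def gene_abundance(alignments, gene_lengths):
    """Length-normalized relative abundance per catalog gene.

    alignments: iterable of (read_id, gene_id) pairs, i.e. each read's
    best hit against a non-redundant gene catalog (hypothetical input
    format for illustration).
    gene_lengths: gene_id -> length in bp.
    """
    raw = defaultdict(int)
    for _read, gene in alignments:
        raw[gene] += 1
    # Normalize by gene length so long genes do not dominate,
    # then scale to relative abundance.
    norm = {g: n / gene_lengths[g] for g, n in raw.items()}
    total = sum(norm.values())
    return {g: v / total for g, v in norm.items()}

# Toy example: geneA is twice as long as geneB and gets twice the reads,
# so their length-normalized abundances come out equal.
aln = [("r1", "geneA"), ("r2", "geneA"), ("r3", "geneB")]
lengths = {"geneA": 2000, "geneB": 1000}
print(gene_abundance(aln, lengths))  # -> {'geneA': 0.5, 'geneB': 0.5}
```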
The paper appears to me to be well written, complete and easily understandable.
The detection of contamination in reference genomes, or more broadly in de novo assemblies, is still a huge problem. Methods such as Kraken have been suggested for performing such searches, but it appears to me that the field is still very young.
The work presented here aims at two things. First, it reports an improved de novo assembly for Hypsibius dujardini by combining previously published data. Second, and perhaps more importantly, it describes an approach for detecting contamination. While the methods are well described, I personally have some difficulty identifying the main message of the paper.
The abstract and the introduction read as if the authors present an improved genome assembly of Hypsibius dujardini. Later on, the authors focus on the methods used to detect and assess contamination. The conclusion, however, focuses on de novo assembly of bacterial genomes from metagenomic data. Thus, I would recommend making this clearer. I think the main contribution to the field might be the summary of the methods the authors used to detect the contamination. Furthermore, it would be nice to see how the methods used in this paper perform in comparison to others. Thus, I would suggest running the obtained and previous assemblies through Kraken or other metagenomic analysis methods. This was actually suggested in an article (Merchant et al., 2014) as a way to detect bacterial contamination.
The findings and the way they are obtained are robust and controlled.
I enjoyed reading your paper. I would encourage you to focus more on the methods that you described and compare them to existing approaches to detect contamination.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.