To increase transparency, PeerJ operates a system of 'optional signed reviews and history'. This takes two forms: (1) peer reviewers are encouraged, but not required, to provide their names (if they do so, then their profile page records the articles they have reviewed), and (2) authors are given the option of reproducing their entire peer review history alongside their published article (in which case the complete peer review process is provided, including revisions, rebuttal letters and editor decision letters).
Thanks for the corrections; it is a great paper!
This is a very well-done study; however, please follow both reviewers' recommendations.
This paper is clearly written and well organized. The introduction and background are reasonable given the premise of the paper. Figures and tables are comprehensive and helpful.
The paper generates the following kinds of data:
1) Bacterial isolates from a marine environment
2) Microbial materials collected on filters from coral and marine environments
3) Sequencing reads from bacterial isolates
4) Sequencing reads from metagenomic samples
5) Assemblies of microbial genomes
6) Custom scripts
As the reviewer, I am unable to verify that the assemblies or read data are going to be publicly available. There is a note indicating that the data are being submitted to GenBank, but it was not clear which data (reads, assemblies) were submitted.
There is no explicit mention of whether the isolates or filters have been saved or would be available for other researchers. Particularly for the isolates this would be important to mention.
In fairness, the scripts are a minor concern, but they do not seem to be available. In keeping with true open access, these scripts, with a short description of their usage context, should also be made available.
Finally, I would suggest explicitly indicating how much data was generated (reads, base pairs). Was any quality trimming done? It wasn't indicated in the text.
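As an illustration of the kind of summary requested here, a minimal Python sketch for counting reads and base pairs might look like the following. It assumes uncompressed FASTQ text with exactly four lines per record; the function name `fastq_stats` is hypothetical and is not taken from the manuscript.

```python
from typing import Iterable, Tuple


def fastq_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (read_count, total_bases) for four-line-per-record FASTQ text."""
    reads = 0
    bases = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record holds the sequence
            reads += 1
            bases += len(line.strip())
    return reads, bases


if __name__ == "__main__":
    # Two made-up records, used only to show the call.
    example = ["@read1", "ACGTAC", "+", "IIIIII",
               "@read2", "GGCA", "+", "IIII"]
    print(fastq_stats(example))
```

The same function can be applied to an open file handle, so the totals could be reported per sample before and after any quality trimming.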
In general, the experimental design was excellent and clearly written. I suggest the following minor changes, additions and modifications:
1) Pg 6 ln 159-160: It is ambiguous to say that something was selected at random based on color or size. This should be clarified.
2) Pg 8 ln 198-204: Inconsistent referencing of languages/libraries. The Perl language is referenced but the scripts are unavailable, while the R and Python languages are not referenced but their libraries are referenced and available.
3) Pg 8 ln 210-214: What operating system/file system was being used? Was this system in a RAID array?
4) Pg 9 ln 255: This reference is incorrect. The paper is in preparation and should not be cited with a publication date unless it has been accepted.
5) Pg 10 ln 274: Python script not available.
6) Pg 11 ln 303: Dinsdale reference not correctly formatted. It is not clear what packages/libraries or tools were used to carry out these statistical analyses. These should be directly referenced, or it should be clear from the text that the given reference is the primary reference for these tools.
One overarching issue with the methods is that it is not always clear whether specific aspects of the work took place on site on the research vessel or at a later time. Since this is the primary goal of the paper (i.e. to demonstrate the ability to do real-time, on-site metagenomics), this is critical to the paper. Additionally, it would be good to indicate how long these steps took to carry out, since time may be a real factor and this would help other researchers plan similar expeditions or scale their effort appropriately.
The results are reasonable given the experiments.
This is an interesting paper that definitely addresses a need of the scientific community. I was surprised not to see more suggestions on how to handle the file corruption issues. Using md5 checksums is fine for validation, but the use of RAID disk arrays or solid-state drives could probably solve most of these problems.
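For readers unfamiliar with checksum validation, a minimal Python sketch of the idea is given below. It only illustrates the general technique and is not the authors' pipeline; the function names `md5_of_file` and `copy_is_intact` are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large sequencing files fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(source, copy):
    """True when both files have the same MD5 digest."""
    return md5_of_file(source) == md5_of_file(copy)
```

A mismatch only detects corruption after the fact; it does not prevent it, which is why redundant or more robust storage is suggested above as a complement.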
The paper intends to show the advantages of in situ Next Generation Sequencing (NGS) for remote locations. The work presents 26 marine microbial genomes and two metagenomes. On-board sequencing could be interesting, though it presented some technical difficulties, and it is not clear what the advantage of real-time processing would be over deep freezing the samples and sending them to a sequencing facility.
The experimental design is the result of an expedition to the Southern Line Islands, with samples collected from either coral or algal surfaces. The nature of this work is exploratory and descriptive, and it is a proof of concept of field NGS, which overrides the lack of in-depth analysis of the (meta)genomic data and of links between the sequencing data and the phenotype testing, shown only for the serine utilization experiments. However, some of the concepts and methods that make this possible should be clarified prior to publication. The observations and concerns are stated in the Comments for the Author section.
The results of NGS field sequencing are promising; the capability to sequence and analyze large datasets without Internet access, servers, HPC clusters or cloud services is interesting. These kinds of techniques could be interesting and useful for research groups with no access to large computing infrastructure. Some of the scripts and methodology used should be disclosed so that the reproducibility of the results can be tested.
P. 7 ~ L 170 What is an appropriate scale of B3 lysis?
P.8 ~ L 207 What is modified from the Ion Torrent pipeline? Is there a chance to document this? I only noticed the MD5 checksum, and the crop into four quadrants. Is there anything else?
P.8 ~ L 226 Could you share your custom Perl script, for example on figshare?
P.9 ~ L 235 Do you think you could share your annotation pipeline scripts? This would be of interest for the whole community trying to annotate on locations with poor internet connections or limited computing resources.
P.9 ~ L 249 Did you perform comparisons of your custom annotation against the standard RAST pipeline? This should be included in a summary of results.
P.10 ~ L 270 The e-value is dependent on the database size; could you please state the effective database size?
P.10 ~ L 279 Would you share your Python script?
P. 17 ~ L 467 The analysis does not demonstrate this as stated; it only suggests it. Replace 'demonstrated' with 'suggests'.
P. 17 ~ L 468 Change 'identified their ability' to 'have the predicted potential'.
P. 18 ~ L 500 Could you state where you remove centrifugation steps from your procedure in methods?
Is there any chance to compare the results from in situ to frozen samples processed with the regular DNA extraction/sequencing protocols?
P.18 ~ L 508 Could you please describe what the steps of the reverse engineering were? You were so lucky to have such a hacker on board!
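Regarding the e-value comment above (P.10 ~ L 270), the dependence on database size can be sketched with the standard Karlin-Altschul relation E = m * n * 2^(-S'), where S' is the bit score and m and n are the effective query and database lengths. The Python snippet below is only an illustration of that relation; the function name `expected_hits` and the numbers used are hypothetical and do not come from the manuscript.

```python
def expected_hits(bit_score, query_length, database_length):
    """E-value from a bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Same alignment score against two database sizes (made-up values):
    small = expected_hits(40, 100, 10**6)
    large = expected_hits(40, 100, 10**9)
    # The ratio of E-values equals the ratio of database sizes.
    print(small, large, large / small)
```

Because E scales linearly with n, a fixed e-value cutoff corresponds to different score thresholds on a reduced field database and on a full reference database, which is why the effective database size should be reported.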
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.