All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
The authors have addressed all of the reviewers' comments. This manuscript is now ready for publication.
[# PeerJ Staff Note - this decision was reviewed and approved by Konstantinos Kormas, a PeerJ Section Editor covering this Section #]
There is one more minor concern that needs to be addressed before this manuscript can be accepted.
All basic requirements for this article have been addressed
The experimental design of this study has been significantly improved
The novelty of the work is now more clearly articulated, and the conclusions are well-stated and supported
I have one comment regarding the resistance mechanisms of the identified resistance genes. Please provide a detailed description of the resistance mechanisms associated with the abundant antibiotic resistance genes (ARGs).
The authors have addressed all my concerns.
The paper now has sufficient details.
The findings of the paper are well supported.
Your manuscript has now been reviewed by three external experts in the field. The reviewers think that your work is of potential interest; however, they also raised several concerns about your study. In particular, multiple reviewers have concerns about (1) the lack of details throughout the methods section, and (2) the over-reliance on referring to previous works, which makes reading difficult. Apart from these, please also (1) highlight the novelty of this work as compared with Han et al. (2022), (2) redo the network analysis taking into account the compositional nature of microbiome data, and (3) release the code used in the bioinformatics analysis.
**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors agree that they are relevant and useful.
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
The article is generally well-written in professional English, but some sections need more clarity and detail. There are areas where technical descriptions are too brief, relying heavily on previous works without sufficient elaboration.
The article includes relevant literature references, but the introduction and background sections could be expanded to provide a broader context. There should be more detailed comparisons with previous studies to situate the findings within the existing body of knowledge.
The submission is self-contained and includes relevant results. However, some results need better contextualization and interpretation.
Line 79: The exact geographical coordinates of the location of Lake CL are missing. This is crucial for the replicability of the study and to geographically situate the study area. Specific coordinates for each sampling point within the western (WCL) and eastern (ECL) sections of the lake should also be included. The characteristics of the lake, such as its size, average depth, water quality, annual precipitation, etc., are not described. Briefly describing the lake's surroundings (e.g., proximity to urban areas, surrounding land use) can help contextualize the results. The depth at which the samples were taken is not specified. The method of transporting the samples from the sampling site to the laboratory is not mentioned. A justification for the selection of sampling points in the lake is missing. Explain the selection criteria, such as areas of higher flow or zones near sources of contamination.
Lines 84 to 90:
• The number of replicates for each measurement (if any were taken, including water samples) is not specified.
• The depths at which the samples were taken for physicochemical analyses are not mentioned.
Lines 91 to 100:
• It is not specified whether shotgun metagenomics or 16S rRNA metagenomics was used.
• Information about the DNA extraction kit used, including the name of the kit and the manufacturer, is missing. In many sections of the article, the descriptions of the methods are too brief, simply referring to previous authors. While this is acceptable, it is essential to clarify how the procedures were done, citing the authors, to provide greater clarity to your work. This also allows readers to understand your work without needing to refer to other articles, making it easier to read.
• It is not mentioned whether tools like Nanodrop or Qubit were used to quantify and evaluate the quality of the extracted DNA. Describing these steps is important to ensure the quality of the DNA before sequencing.
• Information about the sequencing equipment used (e.g., Illumina, Oxford Nanopore) and the sequencing conditions, such as read length and specific platform, as well as any sequencing kits or controls that might have been used, is not provided.
• Details about the conditions of the sequencing library, such as the library preparation protocol and sequence size, are necessary.
• The taxonomic analysis process is not detailed beyond mentioning MetaPhlAn3 and MetaPhlAn4. Including information about the parameters and software versions used for each step of the analysis would be useful (for all software).
• The approach for gene prediction and functional annotation beyond Prodigal is not mentioned. Including details about the databases and functional annotation methods used can provide a more comprehensive view of the analysis performed. Which database did you use for antibiotic resistance genes? From the results obtained, did you conduct a gene-by-gene search, with the results in an Excel file? This part must be well-detailed in your document.
Lines 102-109:
• Information on how the data was prepared and normalized before statistical analysis is missing. What normalization method did you use, DESeq2, ANCOM?
• It is not mentioned how the thresholds for Spearman correlations in the co-occurrence networks were determined. Detailing these thresholds and the criteria for including or excluding nodes and edges in the networks is crucial for interpreting the results.
• Which package was used for the bar plots and other missing charts?
Lines 103-120:
• It would be useful to start with a brief introduction that contextualizes these results a bit.
Lines 126-139:
• It would be useful to start with a brief introduction that contextualizes the results, mentioning the objective of analyzing antibiotic resistance genes (ARGs) and their importance in the study.
• It is not clear to me which databases or how you conducted the search for antibiotic resistance genes. Does that database have any limitations? Does it provide more information about other genes? How did you determine what you found? I suggest you read this article to see the method I used; it might also help you for the discussion part of the results: https://www.sciencedirect.com/science/article/pii/S0048969724023593.
Lines 194-200:
• The methodology section of this part never mentions how the MGEs were identified and analyzed. Include details about the methods used to detect MGEs, such as specific sequencing, software for plasmid identification and other mobile elements, and databases used for annotation in the methodology.
Lines 220-225:
• Include a more comprehensive review of previous studies that have investigated the prevalence of ARGs in similar bodies of water. Compare the results of the present study with these studies to situate the findings in a broader context.
Lines 226-240:
• Critically discuss the results showing that phosphorus is the main physicochemical factor affecting ARG distribution in Lake CL. What is the reason for this?
• You could include a more detailed discussion about how these nutritional factors (nitrogen, phosphorus) interact with other elements of the lake ecosystem, other functionalities of the microorganisms, etc.
Lines 247-249:
• It is important to compare and contrast these results with other works to provide a more complete perspective.
• Did you encounter any limitations in the study? If so, it is important to mention them in the paper, such as potential biases in sample collection, temporal and spatial variability, and methodological limitations. This can be proposed as areas for future research to address these limitations.
Lines 264-270:
• I think you could provide a clearer and more complete summary of the main findings of the study, ensuring that all important points discussed in the article are covered.
• You need to mention the practical and theoretical implications of the results for water quality management and antibiotic use policies in the conclusions. Explain how these findings may influence future research and the implementation of mitigation strategies.
• It is also important to briefly acknowledge the study's limitations and how they might have influenced the results. This can include methodological aspects, sample size, and temporal or spatial variability.
• Suggest directions for future research that could expand knowledge on the distribution and factors influencing ARGs in water bodies. Propose long-term studies, more detailed metagenomic analyses, and the evaluation of specific interventions to reduce antibiotic contamination.
Introduction: Expand to provide a broader context and clearer relevance to the field.
Methodology: Include more specific details about sampling, data collection, and analysis techniques.
Discussion and Conclusion: Offer deeper analysis, comparison with previous work, and practical implications of the findings.
The current manuscript “Distribution status and influencing factors of antibiotic resistance genes in the Chaohu Lake, China” investigates antibiotic resistance genes (ARGs) in Chaohu Lake, and the results of this investigation are of both fundamental and practical interest.
1. The English can be improved in some places: for example, “9 kinds of ARGs” should be phrased more professionally.
2. The literature is well referenced and relevant. However, the background should be expanded, particularly regarding the importance of investigating ARGs in the chosen lake. Also, please include some data about the microbial community of this freshwater ecosystem.
3. Structure conforms to PeerJ standards.
4. Regarding the figures: they are not of high quality, since most of the figure labels are not readable.
Figure 3a: Taxonomy by what? There are listed drug families. What is multidrug? Are there several antibiotics?
Figure 4. Spearom > Spearman.
Figure 5a. The names of the phyla are enough without the kingdom. What about archaea? Were archaea detected?
Figure 5b. Characterizing the composition of the microbiota at the species level is not effective, since many of these species are uncultivated. The authors could show the composition of the microbiota at the genus level, for example.
5. Raw data are not supplied. The raw metagenome sequences of each sample should be deposited in NCBI SRA.
Original primary research within the scope of the journal. The research question is well defined, but the results obtained do not fill an identified knowledge gap, and the methods are not described in sufficient detail.
1. It is not news that ARGs are everywhere, and the resistome of Lake Chaohu is no exception. Consequently, I believe that the main question has not been addressed: where do the ARGs in Lake Chaohu come from? To answer this question, it is also necessary to study the resistome of each river (are there 8 rivers?) flowing into the lake in order to understand what contribution each river makes to the formation of the lake resistome.
2. Water sample collection should be detailed.
3. How do the physicochemical characteristics of the water affect the diversity and formation of the resistome of Chaohu Lake? If the authors had studied the functional role of the microbial community of this lake, this would be understandable. Please remove this paragraph or give a stronger reason why the physicochemical characteristics of the water are important in the formation of the resistome.
4. DNA isolation method should be detailed. Were water samples filtered?
5. The authors performed shotgun metagenomic sequencing. How many reads per sample were obtained?
6. Metagenome assembly should be detailed, from read processing to binning.
7. How many metagenome-assembled genomes (MAGs) were retrieved? This part should be described separately.
8. Which ARG database was used as a reference? The reference database is not mentioned.
9. And which tool or algorithm was used for searching ARGs?
10. Regarding the taxonomic profiling of the microbiota: the authors carried out profiling using MetaPhlAn3. It is known that reference databases for 16S rRNA amplicons are more complete than genome databases. For this reason, please also search for 16S rRNA genes in the contigs and perform classification.
11. Lines 126-129. Except for unclassified ARGs, 9 kinds of ARGs were detected in CL, which contained 45 genes. How can 9 ARGs include another 45 genes? All ARGs are classified. ARGs can be classified by AMR gene family, drug class and resistance mechanism. Hence, I recommend that the authors reanalyze the resistome and classify it properly.
12. Lines 162-193. The authors did not retrieve MAGs from the metagenomes and did not classify each retrieved high-quality genome. This means that they do not have separate bacteria or archaea classified at the species level. However, they wrote that co-occurrence network analysis revealed that some ARGs formed the core network with several bacteria, for example Acinetobacter lwoffii, Mycobacterium paragordonae, Mycolicibacterium mucogenicum, Rhizorhabdus wittichii, Mycobacterium europaeum, etc. There are only two ways to confidently identify microbes at the species level: cultivation/isolation/WGS or assembling MAGs from the metagenome. I therefore consider that this part of the results should be revisited.
13. Line 194. Distribution pattern of ARGs in mobile genetic elements (MGEs). Which plasmids, transposons and insertion sequences (IS) were detected? This part should be expanded and enriched with more scientifically valuable data.
14. Factors affecting the profiles of ARGs. I know that the resistome is associated with the microbial community, but I do not think that the physicochemical factors of the water are needed here. They would be useful if you were studying the microbiome of wastewater treatment plants.
1. During the review of this manuscript I did not notice the impact and novelty of this research. The design and results of this experiment are exactly the same as in this work: Han M, Zhang L, Zhang N, Mao Y, Peng Z, Huang B, Zhang Y, Wang Z. Antibiotic resistome in a large urban-lake drinking water source in middle China: Dissemination mechanisms and risk assessment. J Hazard Mater. 2022 Feb 15;424(Pt D):127745. doi: 10.1016/j.jhazmat.2021.127745. Epub 2021 Nov 10. PMID: 34799156. I would even say that in the work of Han et al. (2022) the results are more accurate, reliable, meaningful and scientifically sound. Why did the authors decide to investigate the resistome of Chaohu Lake a second time? This point should be clarified properly in the manuscript.
2. The conclusions are not well stated and are not linked to the original research question. This section should not just be a summary; it should contain a take-home message. Multidrug, bacitracin, polymyxin, macrolide lincosamide streptogramin (MLS), and aminoglycoside were the five most abundant ARG types in CL. The above-mentioned classes are not actually ARGs; they are antibiotics (drugs). This should be rewritten.
REVIEW: Zhang et al. “Distribution status and influencing factors of antibiotic resistance genes in the Chaohu Lake, China”
This is a very interesting paper in general, clearly explained and justified, and well-constructed. It is an important topic, and this paper could be a very nice contribution to the literature. In principle the paper is publishable. However, it is missing a lot of important details, and I will not recommend acceptance until these issues are addressed.
1) The Methods section is woefully inadequate. Instead of explaining their methods, the authors make the readers look up multiple different papers (Han et al. 2020, Wu et al. 2022) and read them. The methods in this paper must be complete enough to repeat the work without having to go to another paper. Not only is this the correct thing to do but it is practical as well. (1) What if that other paper is behind a paywall and I can’t access it? (2) What if that paper cites an earlier paper and I must read that as well? And what if that paper is not available? It also makes reviewing much more tedious because now I must read and evaluate multiple papers! This is not acceptable. Therefore, the authors need to detail the following (sometimes brief one-sentence descriptions will do unless the method is complicated).
a. How the authors collected and pre-treated the water samples.
b. The methods used in the physicochemical analysis (transparency, chlorophyll-a, and the other nutrients).
c. How DNA was extracted exactly and with what kits. Did the authors do any controls? Negative DNA controls are important to detect kit/material contamination.
d. Details on the metagenomic sequencing are completely lacking. This could be complicated (purification, library construction, adapters, what sequencing machine, paired-end reads?) and needs to be described in some detail.
2) The bioinformatics is inadequate. It is very important to tell the reader what specific databases were used. Did the authors use CARD (the Comprehensive Antibiotic Resistance Database, https://card.mcmaster.ca/)? If not, why not? It is highly curated.
a. Describe the details on Trimmomatic and MetaPhlAn3. Did you produce count tables? Where are these tables and the mapping file? (see point 3).
b. The details on how MEGAHIT was used are missing. I use this all the time, and you receive a lot of contigs. Were the contigs binned (e.g., with MaxBin 2.0)? If not, why not? Or did you just map reads to contigs and count contigs? This might really inflate the species number. And what database was used to do taxonomic identification?
c. For Antibiotics, it is important to use a good well-curated database. Was this done and how?
d. Can you describe more about Prodigal and the databases it uses?
e. Also: It would be great to have a script of all the commands used for the various programs. This would be easy to put together in a file and then added to a zenodo repository (see below).
f. For the statistical analysis, please add the versions of the R programs. But more importantly make available an R notebook with all the analyses and the data that was analyzed (the dataframe). See the zenodo repository below. This will allow for a high level of reproducibility.
3) It has never been easier to allow people to reproduce data and statistical analyses. All one needs to do is to put a Jupyter or a R notebook online along with the raw (count) data tables used to analyze the data. Anyone should be able to rerun the stats or graphs and get the same results. This is very easy to do and free. You can post the notebooks and input files at a repository like zenodo: https://zenodo.org/ Zenodo is free and produces a doi that you can put in your paper.
4) METADATA: I found the raw data on SRA, and it will be great to have, but I didn’t find a mapping file with the metadata. The SRA data has the names of the samples on the map (Figure 1), which is great, but it would be nice to have all the metadata (GPS, chlorophyll-a levels, total phosphorus (TP), PO4-P, biochemical oxygen demand after five days (BOD5), total nitrogen (TN), permanganate index (CODMn), NO3-N, NH4-N, NO2-N, etc.).
Here is an example paper with links to all the scripts (in github) and the uploaded RAW data to ENA. Note the mapping file that has the ENA accession numbers. (It would be good to cite this paper as well in the introduction.)
https://www.mdpi.com/1660-4601/20/1/600
5) Major issues with compositional data: Researchers have recognized for the last few years that sequence data are inherently compositional in nature. This compositionality imposes serious issues for data analysis. For example, the use of relative abundance data is a problem because the values within each sample all add up to 1 or 100%. This means that if one organism (or gene) becomes a larger percentage of the sample, all the others must decline even if their absolute abundances have not changed. The data are therefore not independent, and independence is an assumption of all statistical analyses. This is especially problematic in the network correlation analyses because there will be many spurious negative correlations. The solution to this dilemma is a mathematical transformation, invented in the 1980s for geology and called the centered log-ratio (clr), that converts the values into real number space.
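The closure effect described in this point can be illustrated with a minimal sketch. The numbers below are hypothetical and are not taken from the manuscript; only the first feature changes in absolute abundance between the two samples.

```python
def close(abundances):
    """Convert absolute abundances to relative abundances that sum to 1."""
    total = sum(abundances)
    return [value / total for value in abundances]


# Hypothetical absolute abundances of three features (taxa or genes).
# Only the first feature differs between the two samples.
sample_1 = [100.0, 50.0, 50.0]
sample_2 = [300.0, 50.0, 50.0]

rel_1 = close(sample_1)
rel_2 = close(sample_2)

print("sample 1 relative abundances:", rel_1)
print("sample 2 relative abundances:", rel_2)

# The second and third features have identical absolute abundances in both
# samples, yet their relative abundances are lower in sample 2.
print("feature 2 appears to decline:", rel_2[1] < rel_1[1])
```

In this toy case the unchanged features appear to halve in relative terms, which is the kind of induced negative association that a correlation network built on relative abundances can pick up.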
Here is a paper on describing the problem:
https://www.frontiersin.org/articles/10.3389/fmicb.2017.02224/full
Here are some papers describing how to analyze such data:
https://academic.oup.com/gigascience/article/8/9/giz107/5572529
https://academic.oup.com/nargab/article/2/4/lqaa079/5917299
Given these issues, the network analysis (which is based entirely on correlations) needs to be redone with clr-transformed data. In terms of the paper, the authors should repeat the analyses for Spearman correlations and the networks using the clr-transformed data. The transformed data can be readily used with Spearman rank in R. This should also be included in the R notebook. It could make a big difference in the networks. Ideally, I would use the clr transformation for all the analyses, including those based on Bray-Curtis (use Euclidean distances with the clr-transformed data instead), but I would accept using it only for the network analyses.
The number of data points is very good. 10 metagenomic samples is a good number in two locales. However, there is not enough in the methods to check whether the authors are doing the analysis correctly.
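As a rough illustration of the requested workflow, the sketch below applies a clr transformation to a small count table and then computes Spearman rank correlations between features. The count table, the feature names and the pseudocount of 0.5 (used only to avoid taking the log of zero) are assumptions made for illustration and do not come from the manuscript. Python with the standard library is used here to keep the example self-contained; the same steps can be carried out in R, as suggested above.

```python
import math

PSEUDOCOUNT = 0.5  # assumption: one simple way to handle zero counts


def clr(sample, pseudocount=PSEUDOCOUNT):
    """Centered log-ratio: log of each part minus the mean log of the sample."""
    logs = [math.log(value + pseudocount) for value in sample]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical count table: rows are samples, columns are features.
feature_names = ["ARG_a", "ARG_b", "taxon_x", "taxon_y"]
counts = [
    [120, 30, 0, 50],
    [200, 10, 5, 40],
    [80, 60, 2, 90],
    [150, 25, 0, 30],
    [60, 90, 8, 120],
    [300, 5, 1, 20],
]

# Transform each sample (row), then correlate features (columns).
clr_rows = [clr(row) for row in counts]
clr_columns = list(zip(*clr_rows))

correlations = {}
for i in range(len(feature_names)):
    for j in range(i + 1, len(feature_names)):
        pair = (feature_names[i], feature_names[j])
        correlations[pair] = spearman(clr_columns[i], clr_columns[j])

for pair, rho in correlations.items():
    print(pair, round(rho, 3))
```

The resulting pairwise coefficients would then be thresholded to build the network edges. Euclidean distance computed on the same clr-transformed rows is the Aitchison distance, which is the replacement for Bray-Curtis mentioned above.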
Please see my comments. They need a lot more description of what they did and how they did it. And code needs to be released so everyone can check they get the same results.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.