Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on July 13th, 2021 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on August 31st, 2021.
  • The first revision was submitted on March 1st, 2022 and was reviewed by 1 reviewer and the Academic Editor.
  • A further revision was submitted on March 17th, 2022 and was reviewed by the Academic Editor.
  • A further revision was submitted on March 29th, 2022 and was reviewed by the Academic Editor.
  • The article was Accepted by the Academic Editor on April 7th, 2022.

Version 0.4 (accepted)

· Apr 7, 2022 · Academic Editor

Accept

Thank you for addressing the reviewers' concerns. Thank you also for making your data and code readily available for other researchers to use.

[# PeerJ Staff Note - this decision was reviewed and approved by Paula Soares, a PeerJ Section Editor covering this Section #]

[# PeerJ Staff Note: Although the Academic and Section Editors are happy to accept your article as being scientifically sound, a final check of the manuscript shows that it would benefit from further English editing. Therefore, please identify necessary edits and address these while in proof stage. PeerJ can provide language editing services - please contact us at copyediting@peerj.com for pricing (be sure to provide your manuscript number and title). Alternatively, you may approach your own choice of editing service provider. #]

Version 0.3

· Mar 20, 2022 · Academic Editor

Minor Revisions

Thank you for working to address the reviewer's comments. I read the revised parts of the manuscript. Although it helps, I found the revised sentence toward the end of the Introduction to be long and difficult to understand. There are some grammar problems with it as well. Please rework this part.

In the sentence added at the end of the Conclusions section, I am not sure what is meant by "to reach a better efficiency." This could probably be removed. If not, please make this more clear.

Version 0.2

· Mar 16, 2022 · Academic Editor

Minor Revisions

Thank you for working to address the reviewers' concerns. I received a follow-up response from the second reviewer who was overall fine with your changes but requested additional changes. Please address those and resubmit.

Reviewer 2 ·

Basic reporting

The authors have addressed many of my comments from the previous review, and I thank them for the improvements in the methodology and presentation. Three remaining points, following from my previous review, could easily be addressed as a minor revision to complete the work.

I can accept the scope of the work as described in the response to reviewers. I would still suggest stating that scope clearly and directly in the Intro/R&D so it is properly contextualized for your audience.

It needs to be mentioned in the text that the resources were publicly available (i.e., open-source/open-access articles) so this is clear to, and correctly interpreted by, the reader.

Limitations of this work should be clearly mentioned in the R&D/conclusions.

Experimental design

See above

Validity of the findings

See above

Version 0.1 (original submission)

· Aug 31, 2021 · Academic Editor

Major Revisions

The reviewers' opinions were mixed. One reviewer noted fundamental limitations of the manuscript. Please review their comments and do your best to address them. I will need to send it out for a second round of review and go from there.

[# PeerJ Staff Note: Please ensure that all review and editorial comments are addressed in a response letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.  It is a common mistake to address reviewer questions in the response letter but not in the revised manuscript. If a reviewer raised a question then your readers will probably have the same question so you should ensure that the manuscript can stand alone without the response letter.  Directions on how to prepare a response letter can be found at: https://peerj.com/benefits/academic-rebuttal-letters/ #]

Reviewer 1 ·

Basic reporting

Thank you for a very well-written manuscript. The language is clear and the writing style is concise. A pleasure to read. A few comments:

* Something is wrong with your PubMed IDs in Table 1 - none of the texts in the "text" column matches the PubMed IDs. E.g., your best-scoring article, PMID 7232201, is a case report in French. Apart from this, I don't really find Table 1 particularly informative - perhaps include the entire list as a supplementary file?

* You mention that you are looking for "metadata that enrich the characterization of target sequences", but I am not entirely sure what the criteria for "relevant" are. Could you please elaborate?

In terms of the annotations listed starting on line 70, many of these annotations are already integrated in PubTator Central. Have you tried this database for your search?

The paragraph starting on line 117 seems a bit redundant. Consider removing?

I guess the search term you list on line 139 ("antibiotcs resistance") contains a typo?

Experimental design

Methodologically, the method seems sound and is well described. You are not the first to embark on this endeavor, though.

* Did you test existing methods, e.g. BioReader, MScanner, MedlineRanker, and others? To justify publication of another tool for text classification, you should comment on how your method stands out from existing ones.

* There's a MeSH term for antimicrobial resistance (https://meshb.nlm.nih.gov/record/ui?ui=D004352). May I suggest checking how well this performs as a classifier? Could be a nice baseline for you to compare your method to.
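
To make the suggestion concrete, here is a rough sketch of my own (not the authors' code) using Biopython's Entrez module; the email address is a placeholder, and the PMID shown is simply the one I cite above:

    # Baseline idea: call an article "relevant" if PubMed indexes it under the
    # "Drug Resistance, Microbial" MeSH heading. Sketch only, not the authors' method.
    from Bio import Entrez

    Entrez.email = "you@example.org"  # placeholder; NCBI requires a contact address

    def has_amr_mesh(pmid: str) -> bool:
        """Return True if the PubMed record carries the AMR MeSH term."""
        query = f'{pmid}[PMID] AND "Drug Resistance, Microbial"[MeSH Terms]'
        handle = Entrez.esearch(db="pubmed", term=query)
        record = Entrez.read(handle)
        handle.close()
        return int(record["Count"]) > 0

    print(has_amr_mesh("7232201"))  # example PMID from my comment above

Reporting the precision/recall of this one-line rule against your gold standard would make for an informative baseline comparison.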

* It is a bit unclear how you defined the two classes (relevant and irrelevant). It seems like you calculated the distance to your dictionary and set a more or less arbitrary threshold to define relevant or irrelevant - or did you manually curate the classes in the training set? If you assigned classes based on distance to the dictionary, isn't the whole classification step a bit redundant? Couldn't you just calculate the distance of any new article and classify it as relevant if it exceeds the threshold? It seems like you are trying to predict a metric you just showed can be calculated directly.
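
To illustrate why this looks circular, consider the following hypothetical toy example (the similarity measure and threshold value are mine, not taken from the manuscript): if the labels come from thresholding a dictionary-based score, then the threshold rule is already a complete classifier.

    # Toy illustration of the circularity concern, not the authors' method.
    RELEVANCE_THRESHOLD = 0.5  # placeholder value

    def dictionary_similarity(text: str, dictionary: set) -> float:
        """Toy score: fraction of dictionary terms occurring in the text."""
        text_lower = text.lower()
        hits = sum(1 for term in dictionary if term.lower() in text_lower)
        return hits / max(len(dictionary), 1)

    def classify(text: str, dictionary: set) -> str:
        score = dictionary_similarity(text, dictionary)
        return "relevant" if score >= RELEVANCE_THRESHOLD else "irrelevant"

If the training labels were produced by a rule of this kind, a learned model can at best reproduce that rule, so please clarify how the classes were actually assigned.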

Validity of the findings

Seems like your method performs quite well. As commented above, I need some more information about your relevant/irrelevant classification. Without this, it is hard to evaluate whether your method performs as anticipated.

Perhaps add a discussion of why your method fails for some cases.

Reviewer 2 ·

Basic reporting

There is a significant lack of justification, discussion of limitations, and context for this work in the field. The article does not follow a professional structure, and the R&D section is minimal, poorly structured, and does not connect back to the current literature.

Additionally, AMR is indeed a huge problem across many domains of science and has become a One Health problem. Importantly, it is not just a biological problem but a multi-domain one. The paper focuses almost entirely on the biological perspective, which is problematic because it keeps the domains, language use, and interdisciplinary initiatives siloed. More context and discussion are needed on this concept of One Health and how it can/will affect the model created and its outcomes. Alternatively, while this domain component is important to recognize and discuss, the authors could further justify the biological focus chosen for the scope of this project and state the limitations of this approach in the results and discussion.

Experimental design

There is no discussion of the size of the set of scientific articles extracted from PubMed. How was that collection determined? How were the key search terms chosen? Does this mean that only open-access articles were used for mining? This is not adequately explained or justified.

The first line of the R&D states “Using as search criteria the keywords “antibiotcs resistance”, we had access to the content of 4,517 scientific articles from PubMed”. There are two problems with this sentence: first, I hope "antibiotcs" is just a mistype (which occurs several times in the paper) for "antibiotics"; second, when you use just the search term "antibiotic resistance" on PubMed, there are over 200K papers, of which more than 91K have free full text. How was 4,517 obtained? Were more search terms added? Were the terms from Table 2 utilized as well, or were they pulled as the CARD terms from the set of 4,000? I viewed searchtermarticle.py; the Google Colab is not accessible and pulls from a personal Google Sheet. Perhaps this will be fixed after publication, but it does not clarify the above questions, further suggests that "antibiotic resistance" was the only search term used, and does not make this step reproducible. All of this needs to be justified and explained in the methodology (and a section on limitations based on this search and on open-access vs. non-open-access coverage needs to be added), and the code needs to be made workable so the results in the paper can be reproduced.
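
As an illustration of the kind of reproducible query documentation I have in mind, here is a sketch using the NCBI E-utilities esearch endpoint via the requests library; the query strings and the free-full-text filter below are stand-ins for whatever criteria the authors actually used:

    # Sketch only: report the exact PubMed query and the resulting hit count.
    import requests

    ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

    def pubmed_hit_count(query: str) -> int:
        params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": 0}
        response = requests.get(ESEARCH, params=params, timeout=30)
        response.raise_for_status()
        return int(response.json()["esearchresult"]["count"])

    print(pubmed_hit_count('"antibiotic resistance"'))
    print(pubmed_hit_count('"antibiotic resistance" AND free full text[filter]'))

Stating the query, filters, and retrieval date in the Methods (and in the repository) would let readers see exactly how the 4,517 articles were obtained.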

Regarding the breakdown of relevant versus not relevant: there is some discussion of this in the Methods, but there are many limitations to discuss, and there is also no explanation of a use case for this kind of model on any AMR-related phrase where it could be applied. What is its utility for a biologist? Is relevance mostly just based on the amount and occurrence of CARD terms in the articles? Is that really a robust way to determine relevance? How did the use of this dictionary, as stated in the conclusions, facilitate similarity in binning?

Validity of the findings

It is a huge jump, with no justification, to state that this method could be used for other biological topics. There is no evidence of that in this work, and there is only minimal justification of it working for AMR.

Additional comments

Figure 3 has no axis labels.

All the tables in the table legends are called Table 1.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.