Review History
Machine learning-assisted genomic profiling to identify differences between Bacillus Calmette-Guérin (BCG) vaccine strains and non-BCG wild-type Mycobacterium bovis

All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

Summary

The initial submission of this article was received on April 24th, 2025 and was peer-reviewed by 2 reviewers and the Academic Editor.
The Academic Editor made their initial decision on June 11th, 2025.
The first revision was submitted on July 21st, 2025 and was reviewed by 2 reviewers and the Academic Editor.
A further revision was submitted on August 11th, 2025 and was reviewed by 1 reviewer and the Academic Editor.
The article was Accepted by the Academic Editor on August 21st, 2025.

Version 0.3 (accepted)

Giovanni Angiulli · Aug 21, 2025 · Academic Editor

Accept

Dear Authors,

Your paper has been revised. It has been accepted for publication in PEERJ Computer Science. Thank you for your fine contribution.

[# PeerJ Staff Note - this decision was reviewed and approved by Mehmet Cunkas, a PeerJ Section Editor covering this Section #]

Reviewer 2 · Aug 14, 2025

Basic reporting

no comment

Experimental design

no comment

Validity of the findings

no comment

Additional comments

I have no further comments, and it is my great pleasure to recommend its publication in the present form.

Cite this review as

Anonymous Reviewer (2025) Peer Review #2 of "Machine learning-assisted genomic profiling to identify differences between Bacillus Calmette-Guérin (BCG) vaccine strains and non-BCG wild-type Mycobacterium bovis (v0.3)". PeerJ Computer Science

Download Version 0.3 (PDF) Download author's response letter (v0.3) - submitted Aug 11, 2025

Version 0.2

Giovanni Angiulli · Aug 7, 2025 · Academic Editor

Minor Revisions

Dear Authors,
Your paper has been revised. It needs minor revisions before being accepted for publication in PEERJ Computer Science. More precisely:

1) In the Conclusions section, you have retained outdated terminology that was corrected elsewhere. Specifically, the phrases "codon-level variations" and "long-range genomic interactions" are inconsistent with the model's actual input (bitscore values). This nomenclature contradicts the updated terms in the Discussion section ("gene-level conservation variations" and "dependencies across the gene set"). You must change the above phrases in the Conclusions section to match the Discussion to ensure accuracy throughout the manuscript.

Reviewer 1 · Aug 7, 2025

Basic reporting

The paper proposes a machine-learning genomic tool that distinguishes BCG vaccine strains from pathogenic non-BCG wild-type Mycobacterium bovis.

Experimental design

Using whole-genome sequences of 72 clinical isolates, the authors trained and compared a Random-Forest classifier and a 1D CNN to identify a concise 47-gene signature that can resolve diagnostic ambiguities in newborns who have received the BCG vaccine.

Validity of the findings

The authors perform 5-fold stratified cross-validation for the CNN and out-of-bag evaluation on withheld data for the Random Forest, achieving 95–96% accuracy and an AUC of 0.96–0.99. However, no wet-lab or clinical validation (e.g., qPCR on new patient samples) is presented.

Additional comments

The figures need to be shown in higher resolution. The AUC curves need a clear explanation of what their different regions mean.

Cite this review as

Anonymous Reviewer (2025) Peer Review #1 of "Machine learning-assisted genomic profiling to identify differences between Bacillus Calmette-Guérin (BCG) vaccine strains and non-BCG wild-type Mycobacterium bovis (v0.2)". PeerJ Computer Science

Reviewer 2 · Aug 2, 2025

Basic reporting

no comment

Experimental design

no comment

Validity of the findings

no comment

Additional comments

Most issues in the manuscript have been effectively revised based on the review comments. However, a key terminological inconsistency remains and must be resolved.

In the Conclusions section, the authors have retained outdated terminology that was corrected elsewhere. Specifically, the phrases "codon-level variations" and "long-range genomic interactions" are inconsistent with the model's actual input (bitscore values). This contradicts the updated terms in the Discussion section ("gene-level conservation variations" and "dependencies across the gene set"). These phrases in the Conclusions must be changed to match the Discussion to ensure accuracy throughout the manuscript.

Cite this review as

Anonymous Reviewer (2025) Peer Review #2 of "Machine learning-assisted genomic profiling to identify differences between Bacillus Calmette-Guérin (BCG) vaccine strains and non-BCG wild-type Mycobacterium bovis (v0.2)". PeerJ Computer Science

Download Version 0.2 (PDF) Download author's response letter (v0.2) - submitted Jul 21, 2025

Version 0.1 (original submission)

Giovanni Angiulli · Jun 11, 2025 · Academic Editor

Major Revisions

Dear Authors,
Your paper has been revised. It needs major revisions before being accepted for publication in PEERJ Computer Science. More precisely:

1) The dataset (n=72) may limit the model's generalizability, especially given the reliance on historical lab strains that may not fully represent natural M. bovis diversity. The non-coding genomic regions, which could be critical for strain differentiation, were not explored or discussed in the current version. The CNN's lack of attention mechanisms restricts its interpretability of long-range genomic interactions. The authors must address these issues.

2) The Methods section states that gproNOG.hmm (profile HMMs for Gammaproteobacteria proteins from eggNOG) was used to align protein sequences of Mycobacterium bovis (which belongs to Actinobacteria). Gammaproteobacteria and Actinobacteria are evolutionarily distant. Why was gproNOG.hmm chosen? The authors need to explain this.

3) The Discussion section mentions that CNN's "1D convolutional layers effectively captured codon-level variations" and "absence of attention mechanisms limits the interpretability of long-range genomic interactions." However, the actual input to the CNN is the bitscore values of 47 genes (numerical features), not the raw DNA sequences. Therefore, the discussion regarding "codon-level variations" and "long-range genomic interactions" may not align with the actual input of the model and must be updated.

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

Reviewer 1 · May 29, 2025

Basic reporting

This study leverages machine learning to differentiate BCG vaccine strains from non-BCG wild-type Mycobacterium bovis using genomic profiling.

Experimental design

It analyzes 72 clinical isolates with whole-genome sequencing, employing a random forest classifier and a 1D CNN to identify key biomarkers. The random forest achieved 96% accuracy with 47 attenuation-related genes, while the CNN maintained over 90% accuracy across training sets, highlighting metabolic reprogramming and secondary metabolite biosynthesis pathways as significant biomarkers.

Validity of the findings

The findings need further validation in independent datasets. The generality of these biomarkers is very important.

Additional comments

The dataset (n=72) may limit the model's generalizability, especially given the reliance on historical lab strains that may not fully represent natural M. bovis diversity. The non-coding genomic regions, which could be critical for strain differentiation, were not explored or discussed in the current version. The CNN's lack of attention mechanisms restricts its interpretability of long-range genomic interactions. A minor comment concerns the figures: they need to be shown clearly.

Cite this review as

Anonymous Reviewer (2025) Peer Review #1 of "Machine learning-assisted genomic profiling to identify differences between Bacillus Calmette-Guérin (BCG) vaccine strains and non-BCG wild-type Mycobacterium bovis (v0.1)". PeerJ Computer Science

Reviewer 2 · Jun 1, 2025

Basic reporting

no comment

Experimental design

The study aims to use machine learning methods to distinguish between BCG vaccine strains and non-BCG wild-type Mycobacterium bovis, which is an important topic with clinical diagnostic significance. The authors employed Random Forest and Convolutional Neural Network (CNN) models and conducted functional analysis on key genes differentiating the strains. Overall, this work has a certain degree of application potential. However, the manuscript has some critical issues in the methodology and presentation of results that need to be carefully revised and clarified by the authors. Specific comments are as follows:
1. The Methods section states that gproNOG.hmm (profile HMMs for Gammaproteobacteria proteins from eggNOG) was used to align protein sequences of Mycobacterium bovis (which belongs to Actinobacteria). Gammaproteobacteria and Actinobacteria are evolutionarily distant. Why was gproNOG.hmm chosen? The authors need to provide an explanation.
2. I suggest adding a subsection in the Results to detail the process and rationale for training parameter optimization.
3. The Discussion section mentions that the CNN's "1D convolutional layers effectively captured codon-level variations" and "absence of attention mechanisms limits interpretability of long-range genomic interactions". However, the actual input to the CNN is the bitscore values of 47 genes (numerical features), not the raw DNA sequences. Therefore, the discussion regarding "codon-level variations" and "long-range genomic interactions" may not align with the actual input of the model.

Validity of the findings

1. The study used different training set proportions (50%-80%) to evaluate the CNN model. I am cautious about this approach. Given the small total sample size (n=72), why not use a more robust evaluation method, such as k-fold cross-validation, to assess the model's generalization ability?
2. Is iterating feature selection down to only 47 genes too aggressive? How can it be ensured that overfitting does not occur?
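
For readers unfamiliar with the evaluation scheme raised in comment 1, stratified k-fold splitting can be sketched as below. This is a minimal illustration and not the authors' pipeline: only the total of 72 isolates comes from the review, while the 36/36 class balance, the label names, and the helper `stratified_kfold` are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping class shares similar in every fold."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into k folds
    # so that every fold receives a near-equal share of each class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves as the held-out test set exactly once.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train_idx, test_idx


# Hypothetical labels: 72 isolates with an assumed even class split.
labels = ["BCG"] * 36 + ["non-BCG"] * 36
splits = list(stratified_kfold(labels, k=5))
```

Because every isolate is held out exactly once, the averaged score uses all 72 samples for testing, which is the robustness argument behind preferring this over a single 50%-80% training split on a dataset this small.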

Cite this review as

Anonymous Reviewer (2025) Peer Review #2 of "Machine learning-assisted genomic profiling to identify differences between Bacillus Calmette-Guérin (BCG) vaccine strains and non-BCG wild-type Mycobacterium bovis (v0.1)". PeerJ Computer Science

Download Original Submission (PDF) - submitted Apr 24, 2025

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
