Review History
Unsupervised learning analysis on the proteomes of Zika virus

All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

Summary

The initial submission of this article was received on May 18th, 2024 and was peer-reviewed by 3 reviewers and the Academic Editor.
The Academic Editor made their initial decision on June 12th, 2024.
The first revision was submitted on September 6th, 2024 and was reviewed by 3 reviewers and the Academic Editor.
The article was Accepted by the Academic Editor on October 1st, 2024.

Version 0.2 (accepted)

Shibiao Wan · Oct 1, 2024 · Academic Editor

The reviewers are satisfied with the revisions; I concur and recommend accepting this manuscript.

[# PeerJ Staff Note - this decision was reviewed and approved by Carlos Fernandez-Lozano, a 'PeerJ Computer Science' Section Editor covering this Section #]

Reviewer 1 · Sep 12, 2024

Basic reporting

I am satisfied with the improvements the authors made in the current version of the manuscript.

Experimental design

No.

Validity of the findings

No.

Additional comments

No.

Cite this review as

Anonymous Reviewer (2024) Peer Review #1 of "Unsupervised learning analysis on the proteomes of Zika virus (v0.2)". PeerJ Computer Science

Reviewer 2 · Sep 30, 2024

Basic reporting

The authors have addressed all the requested revisions thoroughly.
1. They have provided examples of unsupervised learning in biological data, improved their method description, added pseudocode and workflow charts, cited R packages correctly, replaced abbreviations with full terms, and performed additional analyses, including a similarity matrix and evaluation of clustering robustness.
2. Additionally, they have toned down the claims of novelty as suggested.

Based on these comprehensive updates, I recommend accepting the manuscript.

Experimental design

Validity of the findings

Additional comments

Cite this review as

Anonymous Reviewer (2024) Peer Review #2 of "Unsupervised learning analysis on the proteomes of Zika virus (v0.2)". PeerJ Computer Science

Reviewer 3 · Sep 14, 2024

Basic reporting

All concerns have been addressed, and the paper is now well-structured and clearly articulated following the revision.

Experimental design

All concerns have been addressed, and the paper provides a clear and detailed description of the four algorithms used in the manuscript. Additionally, the new functions to monitor time consumption in each analytical section are beneficial.

Validity of the findings

All concerns have been addressed.

Additional comments

N/A

Cite this review as

Anonymous Reviewer (2024) Peer Review #3 of "Unsupervised learning analysis on the proteomes of Zika virus (v0.2)". PeerJ Computer Science

Download Version 0.2 (PDF) Download author's response letter (v0.2) - submitted Sep 6, 2024

Version 0.1 (original submission)

Shibiao Wan · Jun 12, 2024 · Academic Editor

Major Revisions

The reviewers have substantial concerns about this manuscript. The authors should provide point-by-point responses addressing all the concerns, together with a revised manuscript in which the revised parts are marked in a different color.

Reviewer 1 · May 29, 2024

Basic reporting

No.

Experimental design

This paper uncovers hidden patterns in polymorphic amino acid sites extracted from ZIKV proteome multiple alignments, without the need for an underlying evolutionary model, using four dimensionality reduction models, and reveals specific host and geographical clustering patterns for Zika virus.
However, it would be better to include a workflow chart (as the first figure) to enhance clarity and guide readers through the process.

Validity of the findings

1. Are there any missing values in the original dataset? If so, how were they handled?
2. How do you ensure the uniqueness of the dimensionality reduction results when using UMAP and t-SNE?
3. What is the best number of groups in the k-means clustering process when evaluated with the silhouette coefficient? Is it exactly 6 or not?
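
To illustrate the third question, here is a minimal sketch of how the number of k-means groups could be selected by the mean silhouette coefficient. It is written in plain Python rather than the R used in the manuscript, runs on hypothetical toy coordinates, and uses a simplified k-means with deterministic farthest-point seeding. The function names and data are assumptions for illustration and do not come from the authors' analysis.

```python
import math

def kmeans(points, k, iters=50):
    """Simplified Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            updated.append(
                tuple(sum(ax) / len(members) for ax in zip(*members)) if members else centers[j]
            )
        if updated == centers:
            break
        centers = updated
    return labels

def mean_silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points; singleton clusters score 0."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical 2-D coordinates forming three well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 10), (5, 11), (6, 10), (6, 11)]

scores = {k: mean_silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

Reporting the full table of scores across candidate values of k, rather than only the maximum, would show whether the chosen number of groups is a clear optimum or one of several near-equal choices.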

Additional comments

No.

Cite this review as

Anonymous Reviewer (2024) Peer Review #1 of "Unsupervised learning analysis on the proteomes of Zika virus (v0.1)". PeerJ Computer Science

Reviewer 2 · Jun 9, 2024

Basic reporting

The paper titled "Unsupervised learning analysis on the proteomes of Zika virus" investigates the use of unsupervised learning (UL) algorithms to analyze the proteomes of the Zika virus. The authors argue that UL algorithms, which do not require labeled training data, can reveal insights beyond those offered by traditional methods. Specifically, the authors applied Unsupervised Random Forest (URF) along with dimensionality reduction algorithms such as Principal Component Analysis (PCA), Uniform Manifold Approximation and Projection (UMAP), t-distributed Stochastic Neighbor Embedding (t-SNE), and AutoEncoders (AE) to analyze polymorphic amino acid sites from ZIKV proteome alignments. The study found that the four UL algorithms could identify specific host and geographical clustering patterns for ZIKV. They were also capable of detecting imported viruses within these geographical clusters. Among the dimensionality reduction techniques, UMAP performed the best. The UL axis coordinates showed significant correlation with phylogenetic tree branch lengths and demonstrated significant phylogenetic dependence in the Abouheif Cmean and Pagel’s Lambda tests (p < 0.01), indicating that these UL methods offer comparable performance to traditional phylogenetic methods.
Comments
I think the paper is generally well-structured, but I believe polishing the writing and providing better explanations would improve it further. I enjoyed reading it and suggest a minor revision.
1. Motivation:
o My understanding is that motivation is still part of the introduction. So, I suggest mentioning more previous research using unsupervised learning for more specific biological data, such as genomes and proteomes.
2. Methods:
o Model Section: Your description of leveraging URF models for data analysis could benefit from a pseudo-code section. The current format is confusing and hard to follow.
o What’s the rationale for choosing URF as the primary method, given there are many other methods available nowadays? How did you decide on the optimal hyperparameters for URF? Are your clustering results still robust when using a different set of hyperparameters?
3. For all the major R packages that you used, please cite them properly.
4. Avoid using abbreviations like AE; instead, use the full term "Autoencoder" for clarity.
5. I understand that Figure 1 is for visualization and Figure 2 supports Figure 1 as metrics for measuring clustering separation. However, an additional analysis measuring the similarity and diversity between different regions would be beneficial. You cannot get that kind of information just from UMAP, etc. I suggest calculating the Euclidean distance between the centroids of individual groups and providing a similarity matrix.
6. The authors also need to provide some evaluation or summary on how well their URF model fits the data.
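
The centroid-based similarity matrix suggested in point 5 could be computed along the following lines. This is a hedged sketch in plain Python, not the authors' R workflow: the coordinates stand in for low-dimensional embedding output (for example, two UMAP axes), and the group labels `region_A`, `region_B`, and `region_C` are hypothetical placeholders.

```python
import math
from collections import defaultdict

def centroid_distance_matrix(coords, groups):
    """Return sorted group names and the pairwise Euclidean distances between group centroids."""
    members = defaultdict(list)
    for point, group in zip(coords, groups):
        members[group].append(point)
    names = sorted(members)
    # Centroid = per-axis mean of the points in each group.
    centroids = {
        g: tuple(sum(axis) / len(members[g]) for axis in zip(*members[g]))
        for g in names
    }
    matrix = [[math.dist(centroids[a], centroids[b]) for b in names] for a in names]
    return names, matrix

# Hypothetical embedding coordinates and group labels (illustration only).
coords = [(0, 0), (2, 0), (1, 3), (1, 5), (4, 4), (4, 4)]
groups = ["region_A", "region_A", "region_B", "region_B", "region_C", "region_C"]

names, matrix = centroid_distance_matrix(coords, groups)
for name, row in zip(names, matrix):
    print(name, [round(d, 2) for d in row])
```

Smaller entries indicate groups whose centroids lie closer together in the embedding. Because UMAP and t-SNE do not preserve global distances faithfully, such a matrix is arguably more interpretable when computed on PCA axes or on the original feature space.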

Experimental design

no comment

Validity of the findings

The authors also need to provide some evaluation or summary of how well their URF model fits the data.

If you want to claim that UL algorithms could be practical evolutionary analytical techniques to track the dispersal of viral pathogens, experiments on an external dataset are encouraged.

Additional comments

The other thing I want to mention is that the authors might want to tone down the claims in the manuscript a little bit. I don't fully agree with the level of novelty that the authors claim.

Cite this review as

Anonymous Reviewer (2024) Peer Review #2 of "Unsupervised learning analysis on the proteomes of Zika virus (v0.1)". PeerJ Computer Science

Reviewer 3 · Jun 12, 2024

Basic reporting

Here are some specific comments for the manuscript:
1. A list of all abbreviations is recommended. More details in the caption are needed, e.g., the explanation of x and y axes in Figure 1.
2. What is the rationale for using the proteome for clustering?
3. Please also discuss the traditional approaches used for clustering as well as the pros and cons to show the necessary requirements for the development of a new clustering tool.
4. Please include the proteome data used in the manuscript.

Experimental design

Overall, the authors clearly described the four algorithms used in the manuscript with sufficient details, and here are some specific comments:
1. Please provide an overall workflow for the clustering.
2. How does the new tool compare to a traditional clustering tool? Are there any additional significant insights that could be obtained using the presented method? What are the pros and cons? How long does the clustering process take?

Validity of the findings

The results and conclusions are well stated and supported by the presented data. However, could the authors comment on future directions and on the application of the presented algorithms in other fields?

Additional comments

The authors applied four unsupervised learning algorithms to the ZIKV proteome obtained from the virus variation database. They successfully identified specific host and geographical clustering patterns, showing results comparable to the phylogenetic tree. The work presents an alternative practical evolutionary analytical technique to trace the dispersal of viral pathogens. Here are some specific comments:

Cite this review as

Anonymous Reviewer (2024) Peer Review #3 of "Unsupervised learning analysis on the proteomes of Zika virus (v0.1)". PeerJ Computer Science

Download Original Submission (PDF) - submitted May 18, 2024

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
