Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

View examples of open peer review.

Summary

  • The initial submission of this article was received on June 2nd, 2023 and was peer-reviewed by 3 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on June 18th, 2023.
  • The first revision was submitted on August 1st, 2023 and was reviewed by 3 reviewers and the Academic Editor.
  • The article was Accepted by the Academic Editor on August 27th, 2023.

Version 0.2 (accepted)

· Aug 27, 2023 · Academic Editor

Accept

All the reviewers have endorsed the publication of the study.

[# PeerJ Staff Note - this decision was reviewed and approved by Brenda Oppert, a PeerJ Section Editor covering this Section #]

Reviewer 1 ·

Basic reporting

no comment

Experimental design

no comment

Validity of the findings

no comment

·

Basic reporting

Basic reporting is appropriate

Experimental design

Methods are well written and appropriate.

Validity of the findings

A thorough analysis of the lung cancer data with performance comparison helps validate how the transformer encoder plus dilated convolution better extracts features with respect to DNA sequences.

Reviewer 3 ·

Basic reporting

no comment

Experimental design

no comment

Validity of the findings

no comment

Additional comments

It is expected that the authors will explore the feature representation of other NLP models in the bioinformatics domain in the future.

Version 0.1 (original submission)

· Jun 18, 2023 · Academic Editor

Major Revisions

Dear authors, please refer to the reviewers' comments and revise your manuscript accordingly. When resubmitting your manuscript, please provide a point-by-point response letter so the reviewers and editor can see your responses to their comments.

Reviewer 1 ·

Basic reporting

This study proposes a deep learning method for predicting DNA methylation. DNA sequence encoding and feature extraction are performed by combining GloVe and a transformer encoder. On the 5mC dataset, a prediction accuracy of 97.9% was achieved, outperforming previous studies, and the feasibility of the method was further verified on the m1A dataset. This research is relatively complete, but the manuscript needs further polishing before it can be considered for publication by the journal.

Lines 79-82 state, "convolutional neural networks (CNNs) can extract features for DNA, but they are not sensitive to 1D sequential data. On the other hand, recurrent neural networks (RNNs) are more suitable for feature extraction of sequential signals, but they do not perform well in learning remote relationships", describing the disadvantages of CNNs and RNNs. However, lines 131-133 state that "CNN and recurrent neural networks RNN have been proven to perform well in predicting DNA methylation sites". This is contradictory; the authors should explain the advantages and disadvantages of these methods.

Experimental design

no comment

Validity of the findings

Lines 251-252 state, "In order to make a fair comparison, all models are trained with the aforementioned dataset". I checked the paper and found that the data provided in Table 1 differ from the total amounts of data mentioned in the cited papers. Did the authors use data quality control or pre-processing methods to screen the data (10.3390/molecules26247414) (10.3389/fcell.2020.00614)?
Lines 251-252: if the amount of data used differs from that in the three cited references, "iPromoter-5mC (Cheng et al., 2021), 5mC-Pred (Tran et al., 2021) and BiLSTM-5mC (Zhang et al., 2020b)", then are the conclusions of Figures 8-12 still reasonable? In addition, I am curious whether the authors trained the three methods listed in Table 2, because I saw the same table and results in the reference (10.3390/molecules26247414); if the results are taken from a reference, this should be clearly stated in the paper to avoid unnecessary misunderstanding.
In the two subsections "Influence of encoding methods" and "Influence of feature extraction methods", the proposed approach is compared only with one-hot encoding and LSTM, respectively. Why did the authors choose these two methods, and are there other methods worth comparing? Comparing against only one method is not sufficient.
The method proposed in this study achieves excellent results, and its performance exceeds that of similar studies; the MCC score in particular is far higher than that of the other three methods. The authors should discuss in detail the reasons for the high performance of this model.

Annotated reviews are not available for download in order to protect the identity of reviewers who chose to remain anonymous.

·

Basic reporting

Line 262: check for proper usage of the word 'trend'; otherwise, basic reporting looks appropriate.

Experimental design

Experimental design is appropriate. Great work comparing the different models on the same dataset as the proposed model.

Validity of the findings

Findings and conclusions look appropriate.

Additional comments

Small cell lung cancer is a great start, and I would be curious to see the method's application to other tumor types as well.

Reviewer 3 ·

Basic reporting

Figures 8 to 12 can be merged and labelled appropriately.

Experimental design

1. It is recommended to add some ablation experiments, for example, setting K in k-mer to 2 or 4, or the GloVe feature length to 50 or 150;
2. It is recommended to discuss whether the imbalance of the dataset has an impact on the model results;
3. Figure 13 discusses the comparison between one-hot encoding and the encoding method proposed in the article, while Figure 8 shows the results obtained using one-hot encoding with deep neural networks. Do these two results together indicate that the transformer method proposed in this article is not as good as deep neural networks? Please explain this.
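[Editor's illustration] The K ablation suggested in point 1 varies how DNA sequences are split into overlapping k-mer tokens before GloVe embedding. A minimal sketch of that tokenization step (the function name `kmers` is illustrative, not from the manuscript):

```python
def kmers(seq: str, k: int = 3) -> list[str]:
    """Split a DNA sequence into overlapping k-mer tokens.

    For a sequence of length n, this yields n - k + 1 tokens,
    each a window of k consecutive bases.
    """
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Changing k (e.g. 2 vs 4) changes both the vocabulary size
# (4**k possible tokens) and the token count per sequence.
print(kmers("ATGCA", 3))  # three 3-mers from a 5-base sequence
```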

Validity of the findings

The conclusion section is insufficient; it is recommended to conduct an in-depth discussion of the article's results.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.