All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
All the reviewers have endorsed the publication of the study.
[# PeerJ Staff Note - this decision was reviewed and approved by Brenda Oppert, a PeerJ Section Editor covering this Section #]
no comment
no comment
no comment
Basic reporting is appropriate.
Methods are well written and appropriate.
A thorough analysis of the lung cancer data with performance comparison helps validate how the transformer encoder plus dilated convolution better extracts features with respect to DNA sequences.
no comment
no comment
no comment
It is expected that the authors will explore the feature representation of other NLP models in the bioinformatics domain in the future.
Dear authors, please refer to the reviewers' comments and revise your manuscript accordingly. When resubmitting your manuscript, please provide a point-by-point response letter so the reviewers and editor can see your responses to their comments.
This study proposes a deep learning method for predicting DNA methylation. DNA sequence encoding and feature extraction are performed by combining GloVe and a transformer encoder. On the 5mC dataset, the method achieved a prediction accuracy of 97.9%, outperforming previous studies, and its feasibility was also verified on the m1A dataset. The research is relatively complete, but the manuscript needs further polishing before it can be considered for publication by the journal.
Lines 79-82, "convolutional neural networks (CNNs) can extract features for DNA, but they are not sensitive to 1D sequential data. On the other hand, recurrent neural networks (RNNs) are more suitable for feature extraction of sequential signals, but they do not perform well in learning remote relationship", describe the disadvantages of CNNs and RNNs, but Lines 131-133 state that "CNN and recurrent neural networks RNN have been proven to perform well in predicting DNA methylation sites". These statements are contradictory; the authors should explain the advantages and disadvantages of these methods consistently.
no comment
Lines 251-252, "In order to make a fair comparison, all models are trained with the aforementioned dataset": I checked the cited papers and found that the data reported in Table 1 differ from the total amount of data reported in those papers. Did the authors apply data quality control or pre-processing methods to screen the data (10.3390/molecules26247414) (10.3389/fcell.2020.00614)?
Lines 251-252: if the amount of data used differs from that in the three cited references, "iPromoter-5mC (Cheng et al., 2021), 5mC-Pred (Tran et al., 2021) and BiLSTM-5mC (Zhang et al., 2020b)", are the conclusions drawn from Figures 8-12 still reasonable? In addition, did the authors themselves train the three methods listed in Table 2? The same table and results appear in the reference (10.3390/molecules26247414). If the results are taken from that reference, this should be clearly stated in the paper to avoid unnecessary misunderstanding.
In the two subsections "Influence of encoding methods" and "Influence of feature extraction methods", the proposed approach is compared only with one-hot encoding and LSTM. Why did the authors choose these two methods, and are there other methods worth comparing? Comparing against only one method in each case is not sufficient.
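For readers unfamiliar with the baseline named in this comment, one-hot encoding maps each nucleotide to a fixed indicator vector. The sketch below is a minimal illustration only; the function name `one_hot`, the `ACGT` ordering, and the all-zero treatment of unknown bases are assumptions and are not taken from the manuscript.

```python
# Minimal sketch of one-hot encoding for a DNA sequence.
# ALPHABET order and the handling of unknown bases are illustrative assumptions.
ALPHABET = "ACGT"


def one_hot(seq: str) -> list[list[int]]:
    """Return one row per base; each row has a single 1 at the base's index."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1  # unknown symbols such as N stay all-zero
        rows.append(row)
    return rows


if __name__ == "__main__":
    for base, row in zip("ACGN", one_hot("ACGN")):
        print(base, row)
```

Unlike a learned embedding such as GloVe, every pair of distinct bases is equally distant under this encoding, which is one reason it is commonly used as a baseline.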
The method proposed in this study has achieved excellent results, and its performance exceeds that of similar studies. The MCC score is far higher than that of the other three methods. The authors should discuss in detail the reasons for the model's high performance.
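For reference, the Matthews correlation coefficient (MCC) mentioned in this comment is computed from the four confusion-matrix counts. The sketch below uses made-up counts, not figures from the manuscript or the cited papers, to show how MCC can be much lower than accuracy when the negative class dominates.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return numerator / denominator if denominator else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    # Hypothetical, imbalanced example: 80 positives versus 920 negatives.
    counts = dict(tp=50, tn=900, fp=20, fn=30)
    print(f"accuracy = {accuracy(**counts):.3f}")
    print(f"MCC      = {mcc(**counts):.3f}")
```

Because MCC uses all four counts, a large gap between methods on this metric usually reflects differences in how the minority class is handled, which is worth examining when explaining the reported improvement.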
Line 262: check for proper usage of the word 'trend'; otherwise basic reporting looks appropriate.
Experimental design is appropriate. Great work comparing the different models on the same dataset as the proposed model.
Findings and conclusions look appropriate.
Small cell lung cancer is a great start, and I would be curious to see the method applied to other tumor types as well.
Figures 8 to 12 can be merged together and labelled appropriately.
1. It is recommended to add some ablation experiments, for example, setting k in the k-mer to 2 or 4, or setting the GloVe feature length to 50 or 150.
2. It is recommended to discuss whether the imbalance of the dataset has an impact on the model results.
3. Figure 13 compares one-hot encoding with the encoding method proposed in the article, while Figure 8 shows the results obtained using one-hot encoding and deep neural networks. Do these two results indicate that the transformer method proposed in this article is not as good as deep neural networks? Please explain this.
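The ablation suggested in point 1 varies the k-mer size. As a rough illustration of what changes with k, the sketch below splits a sequence into overlapping k-mers. The function name `kmer_tokens`, the stride of 1, and the toy sequence are assumptions; the manuscript's actual tokenization may differ.

```python
def kmer_tokens(seq: str, k: int, stride: int = 1) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1 by default)."""
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


if __name__ == "__main__":
    seq = "ACGTAC"  # toy sequence, not from any dataset
    for k in (2, 3, 4):
        tokens = kmer_tokens(seq, k)
        # The vocabulary over {A, C, G, T} has at most 4**k distinct k-mers.
        print(f"k={k}: {len(tokens)} tokens, vocabulary size up to {4 ** k}: {tokens}")
```

A larger k yields fewer tokens per sequence but a vocabulary that grows as 4^k, so the embedding table and the amount of data needed per token both change, which is why k is a natural ablation axis.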
The conclusion section is insufficient; an in-depth discussion of the article's results is recommended.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.