All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
The reviewers consider that you have addressed all their comments and I therefore recommend the paper for publication.
[# PeerJ Staff Note - this decision was reviewed and approved by Mehmet Cunkas, a PeerJ Section Editor covering this Section #]
no comment
no comment
no comment
My concerns have been well-addressed.
I have received and carefully reviewed the revised manuscript along with the authors' response to the reviewers' comments.
The authors have done a thorough job in addressing the concerns raised in the previous review. The revisions and clarifications provided are satisfactory and have significantly improved the manuscript. Consequently, I have no further comments or requested changes.
no comment
no comment
I thank the authors for their comprehensive point-by-point responses and for incorporating the suggested changes into the revised manuscript. I have reviewed the updates and find that they have satisfactorily addressed all my previous concerns. I have no additional comments.
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
**Language Note:** When you prepare your next revision, please either (i) have a colleague who is proficient in English and familiar with the subject matter review your manuscript, or (ii) contact a professional editing service to review your manuscript. PeerJ can provide language editing services - you can contact us at [email protected] for pricing (be sure to provide your manuscript number and title). – PeerJ Staff
-
-
-
This work investigates the efficacy of BiLSTM, BiGRU, and BERT for emotion classification over a large, imbalanced dataset of 422,746 text samples across six emotions. The study compares static, trainable, and contextual embeddings (GloVe, FastText, and BERT). BERT outperforms others with 94.07% accuracy, benefiting from dynamic contextual understanding. The study proposes hybrid embeddings and multimodal strategies as future directions. The following are my concerns:
1. While BERT showed superior performance, it also incurs higher computational costs. Could the authors detail the exact hardware configuration and training times for all models?
2. While the class-weighted loss is acknowledged, there is insufficient exploration of other imbalance mitigation techniques such as oversampling, undersampling, or synthetic data generation (see the sketch after this list).
3. To enrich the manuscript, more related works could be reviewed, such as EEG-based emotion recognition using a hybrid CNN and LSTM classification.
4. The paper lacks a cost-performance trade-off discussion, especially around BERT’s scalability.
5. Overfitting risk is briefly mentioned, but dropout is the only regularization discussed. Other techniques like label smoothing, L2 weight decay, or data augmentation are not evaluated.
6. FastText embeddings (300D) and GloVe embeddings (100D) were used, but this dimensional discrepancy might bias the performance comparison between the models built on them.
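To make points 2 and 5 concrete, the following is a minimal PyTorch sketch (not the authors' implementation; the data, model, and hyperparameter values are illustrative placeholders) contrasting the class-weighted loss used in the paper with random oversampling, label smoothing, and L2 weight decay:

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler

# Dummy stand-ins for the real encoded data; purely illustrative.
num_classes = 6
train_labels = torch.randint(0, num_classes, (1000,))
train_inputs = torch.randn(1000, 128)
train_dataset = TensorDataset(train_inputs, train_labels)
model = nn.Linear(128, num_classes)  # placeholder for BiLSTM/BiGRU/BERT

# Class-weighted cross-entropy (as in the manuscript), with label smoothing added;
# L2 regularization via AdamW weight decay (the regularizers raised in point 5).
class_counts = torch.bincount(train_labels, minlength=num_classes).float()
class_weights = class_counts.sum() / (num_classes * class_counts)
criterion = nn.CrossEntropyLoss(weight=class_weights, label_smoothing=0.1)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)

# Alternative imbalance strategy (point 2): random oversampling of minority classes.
sample_weights = class_weights[train_labels]  # weight per training sample
sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)
loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)
```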
1. The introduction provides a clear context for emotion classification within natural language processing (NLP), citing its importance in applications such as sentiment analysis, mental health, and human-computer interaction.
2. The challenges of emotion classification—such as dataset imbalance, semantic overlap, and model limitations—are thoroughly explained.
3. The evolution of methods from lexicon-based to deep neural networks, and then to transformer-based models like BERT, is logically and chronologically presented, with relevant citations (e.g., Fei et al., 2020; Gu et al., 2022).
4. The objectives are clearly stated (lines 92–98), establishing the study’s scope and relevance.
5. The literature review is comprehensive, covering traditional (lexicon, SVM) and modern (BiLSTM, BERT) approaches.
6. References are timely and numerous, showing awareness of state-of-the-art developments (e.g., Alhuzali & Ananiadou, 2021; Mossad et al., 2023).
7. The review also covers multimodal emotion detection (text, speech, visual), though that is beyond the scope of the experiments in this paper.
Suggestions:
1. The motivation could be enhanced slightly by emphasizing why sentence-level emotion classification specifically matters, compared with word- or document-level approaches.
2. Consider organizing the literature review more structurally — grouping techniques (traditional ML, RNNs, transformers) for clearer comparison.
3. Proofs are not expected here, since this is an empirical study, but the definitions could be improved with a clearer separation between notation and explanatory text; for example, subscripts are hard to parse in some formulae.
The research involves:
A large-scale dataset (422,746 samples).
Multiple neural models tested (BiLSTM, BiGRU, BERT).
Systematic experimentation including hyperparameter tuning, loss analysis, and error patterns.
Ethical considerations are implicitly satisfied (no sensitive/private data; data from public repositories).
There is no human participant involvement, so IRB approval is not required.
The dataset is publicly available: https://doi.org/10.5281/zenodo.15009245 (line 140).
The preprocessing pipeline is well described (text cleaning, tokenization, padding/truncation, label encoding — lines 101–151 and Table 2); a minimal sketch of such a pipeline is included after this list.
Model architectures and configurations are documented (lines 153–164 and Table 3).
Hyperparameters (learning rates, dropout, optimizers, etc.) are shared (lines 285–293 and Table 6).
The software stack is identified (Python, PyTorch, pandas, NumPy, Matplotlib — lines 224–227).
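As a point of reference only, here is a minimal sketch of the kind of pipeline described above (text cleaning, tokenization, padding/truncation, label encoding); the tokenizer, cleaning rules, emotion labels, and maximum length are assumptions, not the authors' exact settings:

```python
import re
from transformers import BertTokenizerFast
from sklearn.preprocessing import LabelEncoder

# Assumed label set and sequence length; the manuscript's values may differ.
EMOTIONS = ["anger", "fear", "joy", "love", "sadness", "surprise"]
MAX_LEN = 128

def clean_text(text: str) -> str:
    """Lowercase, strip URLs and handles, keep letters, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
label_encoder = LabelEncoder().fit(EMOTIONS)

texts = ["I am so happy today!", "This is terrifying..."]   # placeholder samples
labels = ["joy", "fear"]

encoded = tokenizer([clean_text(t) for t in texts],
                    padding="max_length", truncation=True,
                    max_length=MAX_LEN, return_tensors="pt")
y = label_encoder.transform(labels)   # integer class ids
```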
The findings presented in this manuscript are valid and well-supported by the experimental design, analysis, and methodology. The study employs a large-scale dataset (over 422,000 labeled samples) and compares multiple neural architectures (BiLSTM, BiGRU, BERT) under consistent preprocessing, training, and evaluation conditions. The use of robust evaluation metrics (accuracy, precision, recall, F1-score) alongside confusion matrix analysis provides a comprehensive and objective assessment of model performance. The implementation of class-weighted loss functions and detailed hyperparameter tuning further supports the integrity of the results. Qualitative error analysis enriches the interpretability of outcomes, particularly in identifying frequent misclassifications among semantically overlapping emotion categories. The conclusions are consistent with prior literature and appropriately nuanced by acknowledged limitations, including dataset imbalance and domain specificity. Overall, the study demonstrates strong internal validity and offers findings that are replicable and relevant to the field of emotion classification in NLP.
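For readers who wish to reproduce the evaluation protocol summarized above, the standard metrics can be obtained as follows (an illustrative sketch, not the authors' evaluation script; the predictions are dummy values):

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical predictions over the six emotion classes; integers are class ids.
y_true = [0, 1, 2, 2, 3, 4, 5, 1]
y_pred = [0, 1, 2, 1, 3, 4, 5, 2]

# Per-class precision, recall, F1, plus overall accuracy and macro/weighted averages.
print(classification_report(y_true, y_pred, digits=4))

# Rows = true classes, columns = predicted classes; off-diagonal cells expose the
# semantically overlapping emotions that are most often confused.
print(confusion_matrix(y_true, y_pred))
```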
This is a well-structured and methodologically sound study that offers a comprehensive comparison of advanced neural architectures—BiLSTM, BiGRU, and BERT—for sentence-level emotion classification. The manuscript is clearly written and well-aligned with the aims and scope of the journal. It benefits from strong experimental rigor, with detailed descriptions of preprocessing, model design, hyperparameter tuning, and evaluation metrics.
The public availability of both the dataset and the source code further enhances the reproducibility and transparency of the work, making it a valuable reference for the research community.
A few minor suggestions for improvement:
Mathematical Notation: Some of the formulas (especially those related to LSTM/GRU operations and the self-attention mechanism) would benefit from clearer formatting or typesetting for better readability (see the illustrative typesetting after this list).
Literature Organization: The related work section could be more clearly structured by thematically grouping traditional machine learning methods, recurrent neural networks, and transformer-based models, which would improve clarity and flow.
Generalizability: The authors appropriately acknowledge the limitation of using a single dataset. Future extensions of this work could explore domain adaptation or evaluation on multiple corpora to reinforce the generalizability of findings.
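As an illustration of the clearer typesetting suggested under "Mathematical Notation", the LSTM gate equations could be set in their standard textbook form (the symbols here are generic and may not match the manuscript's notation exactly):

```latex
\begin{align}
  i_t &= \sigma\!\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
  f_t &= \sigma\!\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
  o_t &= \sigma\!\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
  \tilde{c}_t &= \tanh\!\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
  c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
  h_t &= o_t \odot \tanh\!\left(c_t\right)
\end{align}
```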
Overall, this manuscript presents a solid contribution to the field of affective computing and natural language processing. I recommend acceptance with minor revisions.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.