Javascript is disabled in your browser. Please enable Javascript to view PeerJ.

Review History
A backpropagation neural network model with adaptive feature extraction for music emotion recognition in online music appreciation

All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

View examples of open peer review.

Summary

The initial submission of this article was received on June 3rd, 2025 and was peer-reviewed by 3 reviewers and the Academic Editor.
The Academic Editor made their initial decision on July 10th, 2025.
The first revision was submitted on August 1st, 2025 and was reviewed by 2 reviewers and the Academic Editor.
The article was Accepted by the Academic Editor on August 14th, 2025.

Version 0.2 (accepted)

Osama Sohaib · Aug 14, 2025 · Academic Editor

Both reviewers have confirmed that the authors have addressed their comments.

[# PeerJ Staff Note - this decision was reviewed and approved by Mehmet Cunkas, a PeerJ Section Editor covering this Section #]

Reviewer 1 · Aug 7, 2025

Basic reporting

The suggested changes have been incorporated by the authors

Experimental design

The suggested changes have been incorporated by the authors

Validity of the findings

The suggested changes have been incorporated by the authors

Cite this review as

Anonymous Reviewer (2025) Peer Review #1 of "A backpropagation neural network model with adaptive feature extraction for music emotion recognition in online music appreciation (v0.2)". PeerJ Computer Science

Reviewer 2 · Aug 8, 2025

Basic reporting

The paper after an update looks good

Experimental design

The comments regarding experiments are well implemented

Validity of the findings

The results and overall paper looks to be accepted

Cite this review as

Anonymous Reviewer (2025) Peer Review #2 of "A backpropagation neural network model with adaptive feature extraction for music emotion recognition in online music appreciation (v0.2)". PeerJ Computer Science

Download Version 0.2 (PDF) Download author's response letter - submitted Aug 1, 2025

Version 0.1 (original submission)

Osama Sohaib · Jul 10, 2025 · Academic Editor

Major Revisions

Please see detailed reviews of all three reviewers. All three reviewers highlight major concerns regarding insufficient theoretical grounding, lack of methodological clarity, weak referencing, and poor academic rigor. The paper lacks explanation of key models, offers vague methodology, and omits crucial details such as feature selection, training configurations, or evaluation metrics. Figures are low-quality, and results lack statistical significance or interpretability. Reviewers also stress the absence of comparisons with modern deep learning models and call for stronger justification, clearer structure, and improved reproducibility. Overall, substantial revisions are required before consideration for publication.

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

Reviewer 1 · Jul 1, 2025

Basic reporting

The manuscript presents a BP neural network-based framework for music emotion recognition in online music appreciation, integrating psychological models with feature engineering techniques. The paper is generally well-structured and addresses a timely and meaningful problem in the domain of affective computing and music education. However, there are several areas requiring significant clarification, enhancement of experimental rigor, and improvement in academic expression to meet the standards of a high-quality journal publication.
- Although the manuscript compares BP-NN with SVM, DT, and KNN, the absence of contemporary deep learning models such as CNN, LSTM, or Transformer-based architectures significantly weakens the credibility of the proposed method's superiority. It is strongly recommended to include comparisons with at least one neural network-based baseline such as CNN-LSTM or CLDNN.
- The Introduction section lacks a clear theoretical motivation for using BP-NN in the context of music education

Experimental design

- Details such as number of hidden layers, number of neurons per layer, learning rate, epoch count, batch size, and weight initialization strategies are missing. These should be explicitly presented and justified, preferably in a table format.
- There is no mention of regularization techniques (e.g., dropout, L2 regularization) or training-validation loss curves. How is overfitting controlled during training? Were any early stopping criteria applied?
- Terms such as "BP neural network", "Backpropagation network", and "BP-NN" are used interchangeably. Please standardize terminology throughout the manuscript.
- The segmentation technique using music energy is briefly mentioned but lacks algorithmic detail or mathematical formalization. It should be clarified whether energy-based segmentation uses fixed thresholds, dynamic thresholding, or temporal smoothing. A pseudocode or schematic figure would help.

Validity of the findings

- The classification assumes discrete emotions from Hevner's model, yet no inter-rater reliability analysis or subjective labeling validation is provided. Were human annotations used? If yes, how many annotators were involved, and what was the agreement rate (e.g., using Cohen's Kappa)?

Cite this review as

Reviewer 2 · Jul 3, 2025

Basic reporting

The manuscript introduces a conceptually interesting model for music emotion recognition with educational implications. However, the current version falls short in terms of technical depth, experimental robustness, and academic polish
- The manuscript employs a wide array of audio features (e.g., spectral centroid, zero-crossing rate, pitch strength), but there is no indication of dimensionality reduction or feature selection techniques. Feeding high-dimensional inputs directly into a shallow BP neural network without applying methods like PCA, LDA, or autoencoders may lead to overfitting or information redundancy.
- The authors are encouraged to explore and report on feature selection or regularization mechanisms, and to analyze feature importance using interpretable methods such as SHAP or LIME.
- The manuscript does not present any plots or discussion regarding the training loss curve, convergence behavior, or overfitting control mechanisms. It is unclear whether techniques such as early stopping, learning rate decay, or validation-based checkpointing were employed. Including training/validation loss curves and detailing training configurations (e.g., number of epochs, optimizer settings, loss function) would improve the transparency and reproducibility of the method.

Experimental design

- Results are reported based on accuracy and confusion matrices, but no confidence intervals or statistical significance tests (e.g., t-tests, ANOVA) are provided. Are the observed performance differences between BP-NN and baseline methods statistically significant?
- The manuscript contains several instances of redundant phrasing and unclear sentence structure. For example: "This paper suggests a neural network-based intelligent recognition and appreciation model for music to address this issue, which is the realization of algorithms generated by brain research." → needs rephrasing for clarity and grammatical accuracy.

Validity of the findings

- Figures (e.g., Fig. 8–10) lack resolution, axis labels, and significance indicators. It is unclear whether performance fluctuations are due to randomness or systemic model behavior. All figures should be vector-based and conform to journal plotting standards.
- While the hardware platform and software stack are described, exact code repositories or implementation frameworks (e.g., GitHub, HuggingFace) are not referenced. For reproducibility, code or at least pseudo-code of the core modules should be included.
- The conclusion primarily summarizes the findings without a critical synthesis. Please articulate clearer theoretical implications and potential for deployment.

Cite this review as

Reviewer 3 · Jul 10, 2025

Basic reporting

The content of the paper seems to be very promising with an appealing title, and an interesting start of the article. The large number of subjects is also promising with respect to the generalizing power of the findings. After continued reading, however, this positive feeling fades away with a feeling of very low academic standards. There is a lot technical stuff, which is merely explained; the methodology is not sufficiently explained, and there are many claims which are quite gratuitous due to a lack of strong referencing style. The paper needs much more theoretical and empirical grounding but above all more clarity and readability before it can accepted for publication.

General remarks

 The English language use is OK.
 Please be completely clear about the aim of the paper: is it about emotion or cognition? There is some confusion about this while reading the paper.
 Explain all abbreviations at first appearance in the text, e.g. BP algorithm.
 The descriptions of the methodology are not always sufficiently clear.
 The theoretical background is rather limited.
 Who proposed the music emotion recognition model? Is this a model introduced by the authors, or was it inspired by other scholars? Please be clearer about the ownership of the model
 Explain the Hevner and Thayer model in more substantial terms.
 The understandability of the text can be improved.
 The referencing style is rather weak. This holds in particular for Hevner and Thayer which are merely mentioned without any further critical elaboration.
 There are many claims and statements without sufficient grounding and lacking references. There is also a total lack of critical discussion for the theoretical background.
 The structure of the paper is not very coherent.
 There is an impression of a lot of computation that starts from poorly described raw data. The algorithms are also not clearly explained.
 The methodological part seems to be quite sophisticated but is not sufficiently explained to be understandable for common readers. Please explain all technical terms and symbols also in a short but intuitive way. There is actually a lack of understandability due to lack of intuitive descriptions of what is measured and what is found. In sum, there is need of more methodological rigor and clearness.

Detailed comments

 Line 30: it has become... What does the “it” refer to. Please be as clear as possible.
 Line 42: what is meant with “internal” emotion?
 Line 43: this paragraph seems to refer to cognition rather than to emotion. Please be clear about the aim of the paper: is it about cognition or about emotion?
 Line 50 ff: this paragraph is not clear. What is meant with the information put in the file, and what is the feature vector? The classification seems to refer to a cognitive model. Please explain more clearly.
 Line 5°: typo: “this information” instead of “these information”?
 Line 52: explain the abbreviation BP (backpropagation?) at first appearance.
Lines 55 ff: this is all very gratuitous; no references; no reference existing background knowledge.
 Line 65: explain more in detail which knowledge discovery is meant here? There is also the tension between knowledge and emotion which should at least be discussed to some extent.
 Line 66: awkward sentence construction
 Line 93: identifying the emotional type of music according to the lyric information of music is a very limited and restrictive approach
 Line 102: this is very broad generalization. Not all music is combined via computer technology. Many gratuitous claims without any references.
 Line 113: who is the owner of the analysis model of music recognition: the author or other scholars? This should be mentioned much more explicitly.
 Line 121: the model can be broken down into “two” layers, but the examples refer to “four” of them.
 Line 124: references to Hevner and Thayer are totally lacking. Their models are also not sufficiently explained.
 Line 125 ff: not clear, please explain more in detail
 Lines 134 ff: explain somewhat more in intuitive terms what I meant with fuzzy logic and vector-based representations and how these go together.
 Lines 136: some more critical discussion of the Hevner model is in place here. Why this model? What are the shortcomings? What about adaptations of the model by Farnsworth and more recently by Schubert? Same remark for Thayer. Also, the distinction between discrete, categorical and dimensional, continuous models should at least be addressed here. References are needed, which are now totally lacking.
 Line 144: here there is some answer to the previous comment. The quantitative approach of the emotion vector must be explained more in detail. The 8 Hevner categories have not yet been defined and explained. This is all very sloppy and not at the needed academic standards.
 Lines 154 ff: please provide more background information and references for the cybernetic approach.
 Line 165: provide some motivation for the claim that style is the most profound level of musical features. Is this true? Is there some evidence for this claim? References?
 Line 194: the description of the Hevner categories does not meet the minimal academic standards. The categories are not single words but clusters of words, and there are also no references to the original and seminal contribution by Hevner and later criticisms on her approach.
 Lie 203: references? Quite gratuitous claim.
 Line 206: same remark with respect to the Thayer model. Reducing the model to energy and pressure does not fully represent the theory. It is more about energetic and tense arousal. Also here, all references are lacking.
 Line 214: what is meant with “internal” psychological activity?
 Line 224 ff: who designed the language model? What is the motivation behind? How to understand and interpret the model. Please explain step by step. Explain the used symbols are refer to the place where this is done. This must be done also in intuitive terms so that common readers can understand.
 Line 236: is there a distinction between LA (line 228) and LAN? Please explain.
 Line 240: here there is a claim that there is a linear relation between a melody or segment and a distinct emotion. This is very gratuitous and needs at least some motivation. Melody is only one parameter of the music, and often there are more than one emotion at the same time. All this should be discussed more thoroughly. It is one thing to design computational models, but the insights and claims that function as the starting data must be validated and motivated. This is currently not the case.
 Line 268: here, two other aspects of music beyond melody are mentioned as well. See previous remark.
 Line 276: it is very important to clearly define in intuitive terms what exactly is meant with the term backpropagation. What does ReLU stand for?
 Lines 293-296: the distinction between accuracy and precision is not sufficiently clear.
 Lines 349: explain the used symbols and explain the steps more in detail. The whole procedure is difficult to follow.
 Line 377: incomplete sentence.
 Figure 1: Music works. Using the term works is reductionistic. What about improvised music or music that not has the status of a musical work? The distinction between musical emotional space and music feature space must also be explained more clearly.
 Figure 2: quality of image (resolution) is too low, barely readable. The classification seems also to be somewhat gratuitous. More motivation is needed? References?
 Figure 3: does not do justice to Hevner’s original model. No motivation for the reduction of the eight clusters to only 8 words. Also no references.
 Figure 4. The figure caption must provide the minimal needed information. There are better visualizations of this model than the one presented here.
 Figure 5: problem of reduction of the music space to the “melody” feature vector.
 Figure 6: the network must be described more in detail in the main text.
 Figure 7: provide a figure caption that gives the minimal needed information to interpret the figure.

Experimental design

The methodology is not sufficiently explained. There is a lot of technical stuff that is presented without any clear description. Also many abbreviations that are not explained. The computational part seems to be quite sophisticated, but the raw data from which it starts are not sufficiently motivated.

Validity of the findings

Given the very poor theoretical and empirical background and poor referencing style, the validity of the findings may be questioned. There are too much conclusions from too less data.

Additional comments

The paper as a whole does not meet the needed academic standards. I therefore suggest to reject the paper at this stage.

Cite this review as

Anonymous Reviewer (2025) Peer Review #3 of "A backpropagation neural network model with adaptive feature extraction for music emotion recognition in online music appreciation (v0.1)". PeerJ Computer Science

Download Original Submission (PDF) - submitted Jun 3, 2025

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Review History A backpropagation neural network model with adaptive feature extraction for music emotion recognition in online music appreciation

Summary

Version 0.2 (accepted)

Osama Sohaib · Aug 14, 2025 · Academic Editor

Reviewer 1 · Aug 7, 2025

Basic reporting

Experimental design

Validity of the findings

Reviewer 2 · Aug 8, 2025

Basic reporting

Experimental design

Validity of the findings

Version 0.1 (original submission)

Osama Sohaib · Jul 10, 2025 · Academic Editor

Reviewer 1 · Jul 1, 2025

Basic reporting

Experimental design

Validity of the findings

Reviewer 2 · Jul 3, 2025

Basic reporting

Experimental design

Validity of the findings

Reviewer 3 · Jul 10, 2025

Basic reporting

Experimental design

Validity of the findings

Additional comments

Review History
A backpropagation neural network model with adaptive feature extraction for music emotion recognition in online music appreciation