Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on May 19th, 2025 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on August 4th, 2025.
  • The first revision was submitted on August 22nd, 2025 and was reviewed by 2 reviewers and the Academic Editor.
  • A further revision was submitted on October 14th, 2025 and was reviewed by the Academic Editor.
  • The article was Accepted by the Academic Editor on October 20th, 2025.

Version 0.3 (accepted)

Academic Editor

Accept

Dear Authors, thank you for addressing all my concerns.

[# PeerJ Staff Note - this decision was reviewed and approved by Claudio Ardagna, a PeerJ Section Editor covering this Section #]

Version 0.2

Academic Editor

Minor Revisions

Dear Authors,

I have decided on a minor revision focused on clarification and light analysis.

Please clearly indicate that predictors such as core temperature, thrombin time, and D-dimer were available at admission, and discuss their overlap with the diagnostic criteria to avoid label leakage.

Also, please document the preprocessing steps more thoroughly.

Confirm that KNN imputation and SVM-RFE were fitted only on the training years and then applied to the 2023 data without refitting; stating this explicitly in the Methods section would remove any ambiguity.
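The leakage-free workflow requested above can be sketched with a scikit-learn pipeline. This is an illustrative example on synthetic data, not the authors' code: the variable names, feature counts, and final classifier are assumptions, but the pattern — fit the imputer and SVM-RFE selector on the training years only, then apply them frozen to the hold-out year — is the point being made.

```python
# Sketch: preprocessing fitted on training years only, applied frozen to 2023.
# Synthetic stand-in data; not the authors' dataset or exact model.
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.feature_selection import RFE
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))                    # stand-in: 2021-2022 admissions
X_train[rng.random(X_train.shape) < 0.1] = np.nan      # simulate missing labs
y_train = rng.integers(0, 2, size=200)
X_2023 = rng.normal(size=(50, 10))                      # stand-in: 2023 hold-out year

pipe = Pipeline([
    ("impute", KNNImputer(n_neighbors=5)),
    ("select", RFE(SVC(kernel="linear"), n_features_to_select=5)),  # SVM-RFE
    ("clf", LogisticRegression(max_iter=1000)),         # assumed final classifier
])
pipe.fit(X_train, y_train)                  # imputer and RFE are fitted here only
p_2023 = pipe.predict_proba(X_2023)[:, 1]   # applied to 2023 without refitting
```

Because every step lives inside one `Pipeline`, calling `fit` on the training years and only `predict_proba` on 2023 guarantees that no statistics from the hold-out year leak into imputation or feature selection.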

Report the calibration intercept and slope with confidence intervals, and perform a simple post-hoc calibration on the training data before evaluating the 2023 set.

Update the decision-curve analysis accordingly, since clinical net benefit depends on calibrated probabilities.

Clarify the target population: the cohort includes patients already diagnosed with heat-related illness and is predominantly male. Mention this limitation and temper the generalizability claims in the Discussion.

With these targeted updates, the manuscript will present a concise, clinically useful model with transparent methods and a realistic scope.

Reviewer 1

Basic reporting

The revised manuscript complies with all specified requirements.

Experimental design

The research design of the manuscript is methodologically sound.

Validity of the findings

The research findings presented in the manuscript also satisfy the stipulated requirements.

Reviewer 2

Basic reporting

.

Experimental design

.

Validity of the findings

.

Additional comments

I have carefully reviewed the revised manuscript. The authors have addressed all my previous concerns, and I am satisfied with the changes. I recommend the article for publication.

Version 0.1 (original submission)

Academic Editor

Major Revisions

Dear Author, please carefully consider all the concerns raised in both reviews. In particular, please address reviewer 1’s “clinical concerns” in your rebuttal.

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

Reviewer 1

Basic reporting

-

Experimental design

-

Validity of the findings

-

Additional comments

I have carefully reviewed the author's definition of heat stroke, which is based on guidelines established in China. The research appears to focus on predicting severe heat stroke among all heat-related illnesses. In this context, my primary concern is whether clinicians diagnose severe heat stroke at the time of its onset. In military training and among athletes, a body temperature above 40°C or the presence of neurological symptoms already satisfies the diagnostic criteria for severe heat stroke, so additional blood parameters are not required for prediction. For other patients, if body temperature does not exceed 40°C and no neurological symptoms are observed on admission, blood parameters may indeed be needed to predict severe heat stroke.

However, according to the author's data extraction process, blood parameters were obtained after patient admission, and heat stroke was diagnosed on the basis of these parameters. Has the author considered the following question: once the blood parameter results become available, is the identification of heat stroke patients already complete? In that scenario, the blood parameters identified in the manuscript are not used to predict the risk of heat stroke but rather to confirm the diagnosis among all patients. A more detailed explanation of this issue in the manuscript would help readers understand the clinical significance of the research.

The machine learning methods employed in the study are appropriate and accurately applied. However, the author has cited references inaccurately when defining heat stroke outcomes: the cited reference does not correspond to the definition of heat stroke established in China. Additionally, the reference used in the section on the diagnosis of heat-related illnesses is also inappropriate.

Reviewer 2

Basic reporting

This paper presents a relevant and timely contribution, as heatstroke is a common and serious medical condition, especially in the context of climate change and the increasing frequency of extreme heat events. The topic is clearly important for public health, as heatstroke can lead to life-threatening situations and even death. For this reason, developing predictive tools to help anticipate such cases is of clear interest, not only for clinicians but also for healthcare systems more broadly.

I find the sample used in this study to be quite interesting and well-selected. The authors collected data from 24 different hospitals over a recent three-year period (2021–2023), which adds robustness to the analysis. Using both clinical and laboratory data, they applied and compared several machine learning models to predict the risk of heatstroke. This kind of multi-model approach allows for a more comprehensive understanding of the problem and gives value to the conclusions drawn.

Although there are already many papers applying machine learning to predict health-related events, I believe this work is still relevant. As the authors point out, these types of predictive models have not been widely implemented in real clinical settings, mainly due to the lack of comprehensive data and the absence of sufficiently advanced algorithms.

This study contributes to addressing that gap by combining a solid dataset with recent machine learning techniques, while also putting emphasis on model interpretability, which is crucial in healthcare applications.

Overall, I think this is a solid and well-structured study. I have only found four aspects that, in my opinion, would benefit from further clarification and are important to fully understand the work: (1) how the data matrix is constructed and what each row represents, (2) the definition and presentation of the outcome variable, (3) the distinction between training, test, and validation sets, and (4) the lack of information about the hyperparameters used in the different machine learning models. Apart from these points, the study is well conducted and the results are clearly presented.

Experimental design

One aspect that, in my opinion, needs further clarification is how the final data matrix used for modelling is constructed. The authors provide a detailed list of variables collected — including demographic, clinical, and laboratory data — and explain the inclusion and exclusion criteria. However, it is not clearly described what each row in the final dataset actually represents. From the sentence “This study’s main outcome was the number of all heatstroke cases upon admission”, it seems that each row corresponds to a single patient, presumably using data collected at the time of hospital admission. Still, this could be made more explicit. For example, it would help to confirm whether the input variables are all taken from the moment of admission, or if some are measured before or after. Also, the role of time in the dataset is not entirely clear: since data were collected over three years (2021–2023), it would be useful to understand how this temporal dimension is reflected in the structure of the data. Are patients from different years simply appended together as independent rows? Is the year or date itself included as a variable in the model?

Additionally, although it becomes clear from the use of performance metrics (such as AUROC) that the outcome is a binary variable — most likely indicating the presence or absence of heatstroke — the sentence referring to the outcome as “the number of all heatstroke cases upon admission” could be misleading. It may give the impression that the outcome is a count variable, rather than a classification label. Clarifying that the target variable is a binary indicator (e.g., heatstroke: yes/no) assigned per patient at admission would help readers better understand the modelling framework and how the predictions are generated.

Although from a full reading of the manuscript one can eventually deduce the intended structure, I believe it should be explained more clearly and explicitly, ideally in the sections on Variable extraction and Outcome definition, where readers naturally expect to find this kind of information.

Validity of the findings

-

Additional comments

Another point that I believe should be addressed is the terminology used to describe the data splits. Throughout the manuscript, the authors refer to a training set and a validation set, and it is stated that data from 2021 and 2022 were used for training, while data from 2023 were reserved for validation. However, in Table 1, the column is labelled as testing set, which creates some confusion. Based on the description provided, it seems clear that the data from 2023 were held out and only used to evaluate the final model, which is, in fact, the standard definition of a test set. In contrast, a validation set is typically used during the training process to tune hyperparameters and select the final model before testing.

Therefore, if the 2023 data were not used during model development but only to assess its final performance, then it would be more accurate and consistent to rename this split as the test set throughout the paper. Clarifying this distinction would improve the clarity and methodological transparency of the study.

In addition, I think it would be useful to include a table with the hyperparameters used for each of the machine learning algorithms tested. If hyperparameter tuning was performed — for example, using cross-validation or grid/random search — it should be briefly described in the methods section. And if no tuning was carried out, the authors should at least report the specific values or default settings used in each case. Since many of the models involved (e.g., gradient boosting, random forest, etc.) are sensitive to these parameters, providing this information is important for reproducibility and for properly interpreting the results.
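A tuning setup of the kind described above can be sketched in a few lines of scikit-learn. This is a hypothetical illustration, not the authors' configuration: the grid values, model, and scoring metric are assumptions, but the pattern (cross-validated search on the training data, then a report of the selected values) is what the requested hyperparameter table would document.

```python
# Sketch: cross-validated hyperparameter search on synthetic training data.
# Hypothetical grid and model; not the authors' actual settings.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 8))          # stand-in training features
y = rng.integers(0, 2, size=200)       # stand-in binary outcome

grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid,
                      cv=5, scoring="roc_auc")
search.fit(X, y)
print(search.best_params_)   # these selected values belong in the reported table
```

Reporting `best_params_` (or, when no tuning is done, the library defaults actually used) for each algorithm is what makes the results reproducible.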

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.