All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
I am proceeding with this work to the next stage. Congratulations.
Dear authors,
Thank you for your revisions.
There are still a number of aspects that should be clarified in the manuscript itself:
- The methodology is somewhat incomplete and confusing. E.g., ln 140-142: "Data was collected from January 2023 to June 2024 based on criteria such as TPE diagnosis and laboratory results (ADA, TB-Ab). The data was collected by trained medical staff to ensure accuracy and consistency, following standardized procedures to ensure reliability." From 31 January to 30 June, or from 1 January to 1 June? What standardized procedures? Standardized by whom? Where?
Below, under data sources (ln 149), you state that "Data collection spanned from January 2021 to September 2024"; somewhat confusing...
The listing of laboratory parameters included should be completed (ln 154).
Altogether, the methodology should be proofread to ensure reproducibility in case others wish to reproduce your analysis; i.e., completeness is needed.
The figure legends should include a short statement of the "take-home" message of each figure...
424 with TPE / 539 non-TPE controls... then in ln 225-230 you mention 763 patients and an independent validation cohort of 200 patients... make sure the text makes it clear where all the numbers come from. Fig. 2 (at least) has very low legibility - check the PeerJ instructions for figures. I cannot read the legend or the graphs, to be honest. No R² for the AUCs, and no actual AUC values clearly depicted?
Fig. 6 quality/resolution is also very low.
So, before I can accept your manuscript, this proofreading and these revisions need to be conducted. Also, be mindful that some of the explanations given to the reviewers should also be incorporated into the manuscript.
Many thanks in advance for your work.
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
All comments are included in detail in the last section.
All comments are included in detail in the last section.
All comments are included in detail in the last section.
Thanks for the revision. Both the responses to comments and the relevant changes to the paper are appropriate. Best regards.
Dear authors,
Thank you for your submission. Presently, it requires significant revisions before proceeding. Please refer to the reviewers' comments for further details.
Strengths:
- The manuscript is written in clear, professional English with good overall structure
- The introduction provides adequate background context and rationale for the study
- Figures and tables are generally well-designed and informative
- Statistical methods and results are presented comprehensively
Areas for improvement:
- Some sections of the methods could benefit from additional detail, particularly regarding the patient selection criteria and data collection procedures
- The abbreviations list should be moved earlier in the manuscript, so that readers can easily follow the first part of the paper
- A few figures (especially Figs. 2, 6, and 7) would benefit from improved resolution and clearer labeling
- Consider introducing LGBM and Shapley-based explainability (SHAP) in plain language for readers without a data science/AI background
- The novelty of this work compared to previous ML approaches in TPE diagnosis should be more explicitly stated
- Include more recent references, particularly from 2022-2024 on ML applications in medical diagnostics
- Discuss potential biases in the training data
Strengths:
- Clear research objective to develop an interpretable ML model for TPE diagnosis
- Good sample size (963 patients) with an appropriate training/validation split
- Comprehensive comparison of multiple ML algorithms
- Inclusion of both internal and external validation
Areas for improvement:
- The rationale for selecting the specific 18 laboratory parameters could be better explained
- More details are needed on how missing data was handled beyond mean imputation (given that mean imputation may introduce additional bias)
- The exclusion criteria could be more explicitly defined
- The time period difference between the training and validation cohorts (2021-2024 vs. 2024) needs more justification
- Clearer justification is needed for why LGBM was chosen over other popular algorithms such as XGBoost. This statement about model selection should be detailed: "Model Development: Ten machine learning models, including LGBM, Gradient Boosting Decision Trees (GBDT), and Support Vector Machines (SVM), were developed using data from the training and testing cohorts. The LGBM model achieved the highest predictive accuracy. SHAP analysis enhanced interpretability, identifying 11 key features (e.g., ADA, TB-Ab, protein) crucial to the model's performance." "Accuracy" is not the right metric for selecting the best model in classification problems with imbalanced datasets. I know that later, in section "2. Model Development and Performance Comparison", you used AUC to compare the tested models, but you did not discuss hyperparameter tuning for all these models; by tuning the hyperparameters of the other models, their performance might exceed LGBM's (see the sketch after this list).
- Add correlation matrices for selected features
- Add partial dependence plots for key features
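A minimal sketch of how these three points could be addressed, assuming placeholders `X` (a pandas DataFrame of candidate laboratory features) and `y` (TPE labels); the parameter grids and the feature name "ADA" are illustrative assumptions, not the study's settings:

```python
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.inspection import PartialDependenceDisplay

candidates = {
    "LGBM": (LGBMClassifier(), {"num_leaves": [15, 31, 63],
                                "learning_rate": [0.01, 0.05, 0.1],
                                "n_estimators": [100, 300, 500]}),
    "XGBoost": (XGBClassifier(eval_metric="logloss"),
                {"max_depth": [3, 5, 7],
                 "learning_rate": [0.01, 0.05, 0.1],
                 "n_estimators": [100, 300, 500]}),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
best = {}
for name, (model, grid) in candidates.items():
    # Score every candidate by ROC AUC, not accuracy, so that class
    # imbalance does not distort model selection.
    search = RandomizedSearchCV(model, grid, n_iter=10, scoring="roc_auc",
                                cv=cv, random_state=42)
    search.fit(X, y)
    best[name] = search
    print(f"{name}: best CV AUC = {search.best_score_:.3f}")

corr = X.corr()  # correlation matrix of the selected features

# Partial dependence of the tuned LGBM on one key feature.
PartialDependenceDisplay.from_estimator(best["LGBM"].best_estimator_, X, ["ADA"])
```

Reporting the tuned cross-validated AUC for every model, rather than a single accuracy figure, would directly answer the model-selection concern above.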
Strengths:
- Robust statistical analysis with appropriate metrics
- Good model performance metrics (AUC > 0.92 in validation)
- Effective use of SHAP analysis for interpretability
- Clear presentation of results with appropriate statistical tests
Areas for improvement:
- The external validation cohort (n=200) is relatively small compared to the training cohort
- Limited discussion of potential confounding factors
- More discussion needed on the model's limitations and potential biases
- Comparison with existing diagnostic methods could be expanded
- Provide a detailed error analysis of misclassified cases
- Provide confusion matrices for all validation sets (both are sketched below)
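A hedged sketch of those two outputs, assuming placeholders `model`, `X_val`, and `y_val` for a fitted classifier and a validation set (not the study's actual objects):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

y_pred = model.predict(X_val)
# Confusion matrix: rows are true classes, columns are predicted classes.
print(confusion_matrix(y_val, y_pred))
print(classification_report(y_val, y_pred, target_names=["non-TPE", "TPE"]))

# Keep the misclassified cases so their laboratory profiles can be examined.
mis = X_val[np.asarray(y_val) != y_pred]
print(f"{len(mis)} misclassified cases retained for error analysis")
```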
The study represents a valuable contribution to the field by developing an interpretable ML model for TPE diagnosis. The use of routine laboratory parameters makes it particularly practical for clinical implementation.
The methods are generally sound, though some aspects require additional detail:
- Clarify the exact process of feature selection
- Provide more information about the data preprocessing steps
- Explain how the optimal number of features was determined (one possible approach is sketched below)
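For the last point, one reportable approach is recursive feature elimination with cross-validation; this is a sketch under the assumption of placeholders `X` (a DataFrame) and `y`, not necessarily the authors' procedure:

```python
from lightgbm import LGBMClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# RFECV removes features one at a time and keeps the subset with the best
# cross-validated AUC, yielding an explicit "optimal number of features".
selector = RFECV(LGBMClassifier(), step=1, scoring="roc_auc",
                 cv=StratifiedKFold(5, shuffle=True, random_state=42))
selector.fit(X, y)
print("Optimal number of features:", selector.n_features_)
print("Selected:", list(X.columns[selector.support_]))
```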
The results are well-presented, but some aspects could be strengthened:
- Include confidence intervals for key performance metrics (e.g., a bootstrap CI for the AUC, sketched below)
- Provide a more detailed analysis of cases where the model performed poorly
- Compare performance with traditional diagnostic methods
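A common nonparametric way to produce such an interval is a bootstrap over the validation set; `y_val` (true labels) and `scores` (predicted probabilities) are assumed placeholders:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y_arr, s_arr = np.asarray(y_val), np.asarray(scores)
aucs = []
for _ in range(2000):
    idx = rng.integers(0, len(y_arr), len(y_arr))  # resample with replacement
    if len(np.unique(y_arr[idx])) < 2:             # AUC needs both classes
        continue
    aucs.append(roc_auc_score(y_arr[idx], s_arr[idx]))
lo, hi = np.percentile(aucs, [2.5, 97.5])
print(f"AUC 95% CI: [{lo:.3f}, {hi:.3f}]")
```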
Recommendations for improvement:
- Add a flow diagram showing patient selection and exclusion
- Include a more detailed discussion of clinical applicability
- Strengthen the limitations section
- Consider adding a cost-benefit analysis of implementing the model
- Add a detailed protocol for data collection
All comments are included in detail in the last section.
All comments are included in detail in the last section.
All comments are included in detail in the last section.
Review Report for PeerJ
(Interpretable noninvasive diagnosis of tuberculous pleural effusion using LGBM and SHAP: Development and clinical application of a machine learning model)
1. Within the scope of the study, the diagnosis of tuberculous pleural effusion was investigated using various machine learning methods.
2. In the introduction, tuberculosis and the importance of the subject are sufficiently covered. However, the literature section is very limited. It is suggested to add a literature table with columns such as the dataset used, method, originality, strengths, weaknesses, and results.
3. In the study, instead of a previously used open-source dataset, data obtained from Zhejiang Provincial People's Hospital and Jiashan County First People's Hospital were used. The fact that the dataset is specific to this study raises the quality of the work.
4. It is positive that the dataset was not used raw but was passed through preprocessing steps, making it better suited for use in machine learning models.
5. A total of 10 machine learning models, including Decision Trees and Adaptive Boosting, were used in the study. Although the number of models is suitable, it should be made clearer, with reference to the literature, why these particular models were preferred.
6. The types and number of evaluation metrics used are acceptable and sufficient for the analysis of the results.
7. When the results are examined in detail, they prove sufficient for the solution of the problem.
In conclusion, the study is important for its use of machine learning models in the diagnosis of tuberculous pleural effusion. However, the points above should be addressed.
1. Clear and Professional English
The article uses professional and precise English throughout. However, some sections (e.g., the introduction and discussion) could benefit from slight rephrasing to enhance readability for a broader audience, as the complexity may hinder clarity for non-specialist readers.
2. Literature References and Background
The literature review is well-referenced and provides sufficient background to establish the context of the study. Key citations (e.g., references to LGBM, SHAP, and prior diagnostic methods) are appropriate and relevant. However, there are areas where the discussion could include broader comparisons to alternative machine-learning techniques.
3. Article Structure
The structure follows professional norms, including:
Abstract: Clear and concise.
Introduction: Well-contextualized with references.
Methods: Comprehensive and detailed, allowing replication.
Results: Data is well-presented with clear figures and tables.
Discussion and Conclusion: Linked to the research objectives.
Figures and Tables: High quality, well-labeled, and relevant. Figures such as ROC curves, SHAP visualizations, and PCA plots enhance the interpretability of the findings.
Raw Data: Provided and robust, supporting the reproducibility of results.
4. Hypotheses and Results
The article is self-contained and provides relevant results that directly address the hypotheses. Key findings (e.g., LGBM's superiority in performance and interpretability) are supported by statistical metrics like AUC, sensitivity, and specificity.
5. Areas for Improvement
Language and Clarity: While professional, some technical terms could be simplified or briefly explained for broader accessibility.
Generalizability: The discussion highlights limitations, such as sample size and geographic scope, but could further elaborate on how these limitations might be mitigated in future studies.
Ethical Considerations: Human ethics approval is appropriately mentioned, but a clearer elaboration of how ethical principles were upheld in data handling would strengthen the article.
Applications Section: The practical clinical application section could include additional details on integrating the LGBM model into real-world systems.
1. Original Research and Scope
The study aligns with the journal's scope by addressing the need for an interpretable, noninvasive diagnostic tool for tuberculous pleural effusion (TPE) using LGBM and SHAP, filling a critical knowledge gap in clinical diagnostics.
2. Research Question
The research question is well-defined, relevant, and meaningful. It addresses diagnostic challenges in resource-limited settings and provides a practical solution for improving TPE diagnosis.
3. Technical and Ethical Standards
Rigorous Investigation: A multicenter prospective study, robust validation (internal, external, cross-validation), and advanced metrics (AUC, sensitivity, specificity) ensure reliability.
Ethical Compliance: Ethical approvals and informed consent waivers are explicitly stated, though details on patient confidentiality could be elaborated.
4. Methodology
The methodology is thorough and replicable, with clear descriptions of study design, data preprocessing, feature selection, and machine learning techniques. Statistical methods and SHAP integration are well-documented but could benefit from more detail on raw data access and SHAP's clinical implementation.
5. Recommendations
Expand sample size and geographic diversity for broader validation.
Ensure raw data availability for replication.
Provide practical examples of SHAP's use in clinical settings.
Clarify patient confidentiality measures.
1. Impact and Novelty
While the article does not explicitly assess its broader impact or novelty, it contributes meaningfully to the literature by addressing the lack of noninvasive and interpretable diagnostic tools for TPE. The rationale for the study and its potential benefit to clinical practice are clearly stated, encouraging meaningful replication.
2. Data Quality
The study provides robust underlying data with clear documentation of data sources, preprocessing, and validation methods. Statistical analyses are sound, controlled, and presented in detail. However, explicit information on data sharing or public access could strengthen the study’s transparency and replicability.
3. Conclusions
The conclusions are well-stated and directly linked to the research question, supported by the results. The discussion remains focused on the study's findings, avoiding overgeneralization. It also highlights limitations, such as sample size and geographic scope, and outlines future research directions clearly.
Recommendations
Emphasize the study's broader implications for clinical and resource-limited settings.
Provide explicit details on data accessibility for replication purposes.
Include a brief assessment of how this work advances existing diagnostic methods beyond the provided metrics.
1. Handling Class Imbalance
Although LGBM inherently addresses class imbalance through Gradient-based One-Sided Sampling (GOSS), the authors could explicitly state whether they tried additional techniques, such as:
Oversampling (e.g., SMOTE or ADASYN) to balance the minority class.
Class-weight adjustments in the loss function for further robustness (both options are sketched below).
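A minimal sketch of the two options, assuming placeholders `X_train` and `y_train`; neither technique is claimed to be in the authors' pipeline:

```python
from lightgbm import LGBMClassifier
from imblearn.over_sampling import SMOTE

# (a) Class-weight adjustment in the loss function:
clf = LGBMClassifier(class_weight="balanced")  # or scale_pos_weight=n_neg/n_pos
clf.fit(X_train, y_train)

# (b) SMOTE oversampling of the minority class (imbalanced-learn package):
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
LGBMClassifier().fit(X_res, y_res)
```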
2. Feature Engineering
The authors excluded some clinically significant features (e.g., lymphocytes, macrophages) to avoid overfitting. While justified, techniques like regularization (e.g., L1/L2 penalties) or dimensionality reduction methods (e.g., PCA) could have been explored to retain these features while mitigating overfitting (a brief sketch follows).
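A brief sketch of those alternatives; the parameter values and the two column names are illustrative assumptions, and `X_train` is a placeholder DataFrame:

```python
from lightgbm import LGBMClassifier
from sklearn.decomposition import PCA

# L1/L2 penalties are native LGBM parameters, so the clinically relevant
# features could be kept while shrinking their influence on the model:
clf = LGBMClassifier(reg_alpha=0.1, reg_lambda=1.0)  # L1 / L2 strengths

# Alternatively, project the correlated cytology features onto a single
# principal component instead of excluding them outright:
pca = PCA(n_components=1)
cyto = pca.fit_transform(X_train[["lymphocytes", "macrophages"]])
```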
3. Data Preprocessing
Missing Values: The study mentions mean imputation for missing values. Exploring more advanced imputation techniques (e.g., multiple imputation, model-based imputation) might provide better handling of missing data, especially if the missingness is not random (a sketch using scikit-learn's IterativeImputer follows).
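A sketch of one such model-based option, scikit-learn's MICE-style IterativeImputer; `X` is a placeholder feature matrix with missing laboratory values:

```python
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Each feature with missing values is modeled as a function of the other
# features, iterating until the imputations stabilize.
imputer = IterativeImputer(max_iter=10, random_state=42)
X_imputed = imputer.fit_transform(X)
```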
4. SHAP Integration
While SHAP is well-applied, the authors could have included:
Interaction effects: Analyzing interactions between features to uncover additional insights (a minimal sketch follows this list).
Explainability at scale: Demonstrating how SHAP can guide clinicians in interpreting predictions for complex cases in real time.
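A minimal sketch of the interaction-effects suggestion; `model` is a fitted LGBM classifier and `X` the feature matrix (both placeholders), with "ADA" and "protein" used purely as illustrative feature names taken from the paper:

```python
import shap

explainer = shap.TreeExplainer(model)
# Pairwise SHAP interaction values, shaped
# (n_samples, n_features, n_features) for a single-output model.
inter = explainer.shap_interaction_values(X)
# Visualize one interaction, e.g., how protein modulates the effect of ADA.
shap.dependence_plot(("ADA", "protein"), inter, X)
```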
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.