Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

Summary

  • The initial submission of this article was received on July 30th, 2024 and was peer-reviewed by 3 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on October 3rd, 2024.
  • The first revision was submitted on October 24th, 2024 and was reviewed by 2 reviewers and the Academic Editor.
  • The article was Accepted by the Academic Editor on November 17th, 2024.

Version 0.2 (accepted)

· Nov 17, 2024 · Academic Editor

Accept

Dear Authors,

I am pleased to inform you that your manuscript titled "Identification of crucial extracellular genes as potential biomarkers in newly Diagnosed Type 1 Diabetes via Integrated Bioinformatics Analysis" has been accepted for publication.

Your thorough revisions and responses to the previous reviewers' comments have significantly improved the manuscript. The integrated bioinformatics approach you've employed to identify potential biomarkers for Type 1 Diabetes Mellitus is both innovative and relevant to the field.

Congratulations on your valuable contribution to the understanding of T1DM pathogenesis and the identification of potential biomarkers. We look forward to seeing your work in print.

[# PeerJ Staff Note - this decision was reviewed and approved by Celine Gallagher, a PeerJ Section Editor covering this Section #]

Reviewer 2 ·

Basic reporting

In this version, Ming Gao, Qing Liu, Lingyu Zhang, Fatema Tabak, Yifei Hua, Wei Shao, Yangyang Li, Li Qian, and Yu Liu present their corrections to "Identification of Crucial Extracellular Genes as Potential Biomarkers in Newly Diagnosed Type 1 Diabetes via Integrated Bioinformatics Analysis".
The manuscript has notably improved, and most of the previous comments have been addressed. However, some previous comments still need to be addressed before moving forward.

Experimental design

Lines 143-144: The statement "The tissue source used for the microarrays was derived from human peripheral blood" needs to be clarified. Specify which component of peripheral blood was used (plasma, red blood cells, etc.).

Figure 2A: The icon sizes are still not explained. Even though this was mentioned in the rebuttal letter, it should also be addressed in the manuscript.

Validity of the findings

No edits needed

Additional comments

Lines 320-321: The title might be confusing now; I suggest reverting it but retaining the term "verification" instead of "validation."

·

Basic reporting

The revised manuscript is clear, well-structured, and professionally written. The authors have addressed previous concerns related to background, clarity, and figure labeling, making the manuscript accessible for a broad scientific audience.

• English and Grammar: The authors have improved the language and readability, which enhances the clarity of their findings. There are minimal grammatical errors, and the text flows smoothly.
• Background and Literature: The authors have expanded the background to provide context for the role of extracellular proteins in NT1D, and they now emphasize the novelty and importance of their work in identifying biomarkers for early diagnosis.
• Figures and Data Presentation: The figures are now clearer, with additional labeling and more informative legends. Improved color schemes in the heatmap and network diagrams have enhanced data visualization, making it easier to interpret the differential expression of genes.

Experimental design

The study is original and fits well within the journal’s aims and scope, as it addresses a significant gap in NT1D diagnostics through bioinformatics.

• Research Question: The question of whether extracellular proteins can serve as biomarkers for NT1D is clearly defined, relevant, and addresses a crucial need for non-invasive diagnostic tools.
• Methodology: The authors have improved the methodological clarity, especially around data preprocessing steps, batch effect correction, and the rationale behind statistical filtering criteria. This enhances the reproducibility of the study.
• Sample Size and Validation: The study design includes validation through an independent dataset and clinical samples, which adds rigor to the findings. However, the authors appropriately acknowledge the limitation posed by the small sample size and recommend further validation in larger cohorts.

The methodology is well-documented, allowing replication of the analysis with the provided code and datasets. The authors have also discussed potential biases and the importance of more comprehensive data for future studies.

Validity of the findings

The findings are robust, statistically sound, and controlled, with all underlying data and code available for reproducibility. The authors have provided reasonable interpretations, limiting their conclusions to the supporting data.

• Data Interpretation: The authors expanded the discussion to explain the functional roles of key genes such as LCN2, IFNG, TNF, and MMP9, which strengthens the manuscript. Additionally, they referenced relevant experimental studies that support the potential role of these genes in NT1D pathogenesis.
• Limitations: The authors have acknowledged all primary limitations, including the sample size and relatively lenient filtering criteria. They discussed how these factors could lead to false positives, providing a balanced perspective on their findings.

The study contributes valuable insights into NT1D biomarkers and has laid a foundation for future experimental validation. The authors’ careful limitation of their conclusions enhances the study’s credibility.

Additional comments

This revised manuscript is substantially improved and meets the criteria for publication. The authors have addressed previous feedback with a constructive approach, enhancing the study’s impact and readability.

Version 0.1 (original submission)

· Oct 3, 2024 · Academic Editor

Major Revisions

Thank you for submitting your manuscript "Identification of Crucial Extracellular Genes as Potential Biomarkers in Newly Diagnosed Type 1 Diabetes via Integrated Bioinformatics Analysis" to our journal. We have received feedback from three reviewers, and based on their comments, I am recommending minor to moderate revisions before the manuscript can be considered for publication. Please address the following points in your revision:

Introduction and Rationale
1. Clearly state the rationale and hypothesis for the study, particularly explaining why extracellular proteins were targeted
2. Expand the literature review to include more recent studies on biomarker discovery in diabetes and discuss the limitations of current diagnostic tools for NT1D

Methods
1. Specify the tissue source used for the microarrays in the main text
2. Explain how the list of extracellular proteins was derived from the protein databases
3. Clarify methodologies, including the meaning of "de-duplication" and provide citations where appropriate
4. Describe the method used to adjust p-values
5. Include details on quality control procedures for the datasets
6. Implement and describe a multiple testing correction method in the differential expression analysis

Results and Figures
1. Improve figure clarity:
• Add color scale meanings to Figures 2c and 3d
• Explain varying symbol sizes in the PCA plot (Figure 2a)
• Enhance resolution of heatmaps and volcano plots
2. Address the potential outlier (Sample T1D number 1):
• Perform analyses such as skewness, kurtosis, and Mahalanobis distance
• Consider excluding it if confirmed as an outlier
3. Explain the rationale for including positive -log10(P-values) in Figure 5a

Discussion and Conclusions
1. Integrate results more deeply with the discussion, explaining the importance of identified genes in NT1D context
2. Discuss how the findings could translate into clinical practice, including potential hurdles and future research steps
3. Emphasize the preliminary nature of the findings and the need for further validation
4. Address study limitations more explicitly, including sample size and analytical threshold choices

Terminology and Language
1. Replace the term "validation" when referring to the study, as no clinical or experimental validations were performed
2. Improve clarity and polish sentence structure throughout the manuscript, especially in technical explanations

Code and Data
1. Enhance code documentation and annotations to improve reproducibility
2. Include batch effect correction in the provided code
3. Provide a brief explanation in the manuscript about the raw data files in the supplementary materials

Please submit a revised version of your manuscript addressing these points. Include a point-by-point response to the reviewers' comments along with your revised manuscript. We look forward to receiving your revised submission.

·

Basic reporting

While the manuscript demonstrates technical correctness and relevance, improvements in clarity, structure, and depth of analysis would elevate its quality. Strengthening the discussion of existing biomarkers, addressing limitations, and improving the presentation of figures and tables will further improve the overall impact of the study.

A. Some sections could benefit from improved clarity, especially where more nuanced technical explanations are provided. For example, the methods section contains several dense sentences that might be challenging for readers unfamiliar with bioinformatics.

B. There are a few instances where sentence structure could be more polished. For example, in the Results section, sentences explaining the relevance of particular genes could be shortened or clarified for ease of understanding.

C. The literature review is somewhat limited, especially regarding the discussion of existing biomarkers. Expanding on the limitations of current diagnostic tools for NT1D would provide a clearer rationale for the research.

D. The citation of more recent studies on biomarker discovery in diabetes could help ground the study within the latest scientific advancements.

E. The resolution of some figures, such as the heatmaps and volcano plots, could be improved. In their current form, they are somewhat difficult to interpret.

F. The raw data, while available, is not thoroughly described in the manuscript itself. It could benefit from more detailed explanations in the paper, so readers understand exactly what data is available and how to interpret it. Include a brief explanation in the manuscript about the raw data files provided in the supplementary materials, guiding readers on their relevance and use.

G. The results section could benefit from deeper integration with the Discussion. Some findings, such as the identification of specific genes, are presented without much contextual explanation about why these genes are particularly important in NT1D.

H. Address the study's limitations more explicitly, including potential biases introduced by sample size and the choice of analytical thresholds.

Experimental design

Overall, the study addresses a meaningful research question and is conducted with appropriate technical and ethical standards. However, there is room for improvement in articulating the novelty of the research, describing quality control measures, and ensuring reproducibility through better documentation of the methods and code.

A. While the study’s research question is meaningful, the knowledge gap could be articulated more explicitly. For instance, as noted above, the authors point to the lack of current biomarkers but do not deeply explore the limitations of existing diagnostic methods or how their approach significantly improves upon them. The paper would benefit from a more detailed discussion of the novelty of the extracellular protein focus and how it diverges from previous biomarker studies in NT1D.

B. The validation using clinical samples involves a relatively small sample size (6 NT1D and 6 healthy controls). This limits the robustness of the validation results and could introduce sampling bias. This weakens the strength of the conclusions drawn from this aspect of the study.

C. The study does not mention any efforts to ensure the quality or authenticity of the publicly available datasets used, such as discussing potential biases or limitations in the datasets. Include a brief discussion of how data quality was ensured for the GEO datasets and address potential biases or limitations of these public datasets.

D. Some steps in the bioinformatics pipeline could be better detailed. For instance, the process of correcting for batch effects is mentioned in the manuscript but not explicitly covered in the methods section of the provided code, leaving ambiguity about how this correction was implemented. Ensure that every step described in the methods is reflected in the code, particularly the batch effect correction, as this step is critical to avoid technical bias.

E. The paper does not explicitly mention whether the datasets were checked for outliers or inconsistencies before analysis. Quality control steps should be mentioned to ensure that the results are reliable. Clarify any quality control procedures used to ensure the accuracy and reliability of the dataset before and after processing.

F. While the code for the analysis is provided, it would benefit from clearer annotations and additional documentation, especially for users unfamiliar with the specific tools or packages used. Improve the documentation of the code to make it more accessible to other researchers, ensuring that each step is thoroughly explained and reproducible.
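
To illustrate what an explicit, documented batch-adjustment step (item D) might look like, here is a minimal sketch. It is written in Python rather than the R of the submitted script, and it is not the ComBat procedure named in the manuscript: it only re-centers each gene within each batch on that gene's overall mean, a much simpler location-only adjustment. The function name, data layout, and gene label are hypothetical.

```python
def center_batches(expr, batches):
    """Re-center each gene within each batch on the gene's overall mean.

    expr    : dict mapping gene name -> list of expression values (one per sample)
    batches : list of batch labels, one per sample, in the same sample order
    """
    corrected = {}
    for gene, values in expr.items():
        if len(values) != len(batches):
            raise ValueError(f"{gene}: expected {len(batches)} values, got {len(values)}")
        grand_mean = sum(values) / len(values)
        # Accumulate per-batch sums and counts to obtain per-batch means.
        sums, counts = {}, {}
        for value, batch in zip(values, batches):
            sums[batch] = sums.get(batch, 0.0) + value
            counts[batch] = counts.get(batch, 0) + 1
        # Subtract the batch mean, then add back the grand mean.
        corrected[gene] = [
            value - sums[batch] / counts[batch] + grand_mean
            for value, batch in zip(values, batches)
        ]
    return corrected


# Hypothetical example: one gene measured in two batches with a large offset.
example = center_batches({"GENE_X": [1.0, 3.0, 11.0, 13.0]}, ["a", "a", "b", "b"])
print(example["GENE_X"])
```

A step like this is only safe when disease status is balanced across batches. If batch and condition are confounded, centering by batch would also remove the biological signal, which is one reason the exact procedure should be visible in the code.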

Validity of the findings

The study is methodologically sound and offers a meaningful contribution to biomarker discovery in NT1D, with publicly available datasets and transparent reporting of results. While the data is statistically sound, improvements could be made in controlling for multiple testing and expanding the clinical validation cohort. The conclusions are appropriately tied to the original research question, though a more cautious tone regarding the clinical application of these biomarkers would strengthen the paper.

A. While the authors provide statistical cutoffs (e.g., |log2FC| > 0.5 and p < 0.05), there is no mention of controlling for multiple comparisons. In studies involving large-scale gene expression data, controlling for multiple testing is crucial to avoid false positives. Implement a multiple testing correction method in the differential expression analysis to increase the robustness of the results and reduce the likelihood of false positives, or clarify in the Methods section whether a multiple testing correction was applied and how.

B. The authors suggest that the identified genes “may serve as potential diagnostic biomarkers for NT1D,” which, while reasonable, is slightly premature given the small size of the clinical validation cohort. Emphasize the preliminary nature of the findings and the need for further validation studies before these biomarkers can be adopted in clinical practice. This would provide a more cautious and balanced conclusion.

C. The discussion lacks a detailed exploration of how these findings could translate into clinical practice, which weakens the overall impact of the conclusions. While the identified biomarkers are promising, more discussion on how these could be used in diagnostics or therapy is needed. Include a more detailed discussion of how the identified biomarkers could be used in clinical diagnostics, including potential hurdles to translation into practice and future steps for research.
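
For reference, the multiple testing correction requested in item A can be sketched as follows. The Benjamini-Hochberg procedure is used here as one common choice; it is not necessarily the method the authors applied. The sketch is in Python rather than the R of the submitted script. The function names and gene labels are hypothetical, and the cutoffs simply mirror those quoted above.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log2_fc, p_values, fc_cutoff=0.5, fdr_cutoff=0.05):
    """Keep genes passing both the fold-change cutoff and the FDR cutoff."""
    adjusted = benjamini_hochberg(p_values)
    return [
        gene
        for gene, fc, q in zip(genes, log2_fc, adjusted)
        if abs(fc) > fc_cutoff and q < fdr_cutoff
    ]


# Hypothetical input: four genes with made-up statistics.
print(select_degs(["A", "B", "C", "D"], [1.2, 0.3, -0.8, 0.6], [0.01, 0.04, 0.03, 0.005]))
```

Reporting the adjusted values alongside the raw p-values would let readers see how many candidate genes survive the correction.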

Additional comments

The peerj-103851-code.R script appears to be well-structured and follows a typical workflow for processing microarray data obtained from the GEO database. Several potential issues or problems are listed below.

A. The script performs a log2 transformation of the expression data. However, it assumes that the data is not already log-transformed, which might not always be the case. It would be prudent to include a check to see if the data needs transformation.

B. The ComBat function for batch effect correction is mentioned in the paper, but the script provided does not include this step. This could be a significant omission if batch effects are present in the data and affect the downstream analysis.

C. The script could benefit from more explicit documentation regarding the annotation process, especially for users unfamiliar with the specific microarray platform.

D. The script does not include any code for multiple testing correction, which is critical in genomic studies to control the false discovery rate (FDR). This omission could lead to a higher rate of false positives in the reported DEGs.

E. The filtering criteria (e.g., |log2FC| > 0.5 and p < 0.05) are somewhat arbitrary. A more sophisticated approach, like using an FDR threshold, would improve the robustness of the findings.

F. Some comments are too brief and do not fully explain the purpose of more complex steps. For instance, the rationale behind the specific parameters chosen for differential expression analysis (e.g., the log2FC cutoff) could be better explained.
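
The check suggested in item A could look roughly like the sketch below. It is written in Python rather than the R of the submitted script. The thresholds are assumptions modeled on a quantile heuristic often seen in GEO analysis scripts, and the function names are hypothetical. The idea is that log2-scale microarray values rarely exceed a few tens, so very large upper quantiles or a very wide range suggest raw intensities.

```python
import math


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an ascending-sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("no values to summarize")
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def needs_log2(values):
    """Guess whether expression values are still on the raw intensity scale."""
    clean = sorted(v for v in values if v is not None and not math.isnan(v))
    q0, q25, q99, q100 = (quantile(clean, q) for q in (0.0, 0.25, 0.99, 1.0))
    # Assumed thresholds: a 99th percentile above 100, or a range above 50
    # with a positive lower quartile, indicates untransformed data.
    return q99 > 100 or (q100 - q0 > 50 and q25 > 0)


def log2_if_needed(values):
    """Apply log2 only when the heuristic says the data are untransformed."""
    if not needs_log2(values):
        return list(values)
    # Non-positive or missing values cannot be logged; mark them as NaN.
    return [math.log2(v) if v is not None and v > 0 else float("nan") for v in values]


print(log2_if_needed([8.0, 1024.0, 4096.0]))
print(log2_if_needed([5.6, 7.6, 10.5, 13.0, 13.5]))
```

The heuristic is only a guard against double transformation. The platform documentation or the series record remains the authoritative source on how the deposited values were processed.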

Reviewer 2 ·

Basic reporting

Overall, the manuscript would benefit from scientific and grammatical improvements.
Introduction
1. The introduction lacks both a clear rationale and a hypothesis. For example, the reasoning behind targeting extracellular proteins in this study needs to be explicitly stated, as this forms the basis of the study.

Experimental design

Methods
2. The tissue source used for the microarrays must be clearly specified in the main text.
3. While the authors have declared the sources of the protein databases, they have not explained how the list of extracellular proteins was derived from these databases.
4. Some sentences are unclear and do not adequately explain the methodologies used. For instance, lines 96-98 need to clarify what "de-duplication" means, and if possible, provide citations to support their methodological procedures.
5. The method used to adjust the p-value needs to be clarified within the Methods section.
6. The term "validation" is used multiple times to refer to the study, which is misleading as no clinical or experimental validations were performed. This needs to be corrected.

Validity of the findings

7. There are errors in the figures, such as the absence of a color scale meaning in Figures 2c and 3d. Additionally, the PCA plot in Figure 2a shows symbols of varying sizes, which should be explained.
8. Sample T1D number 1 appears to be a potential outlier based on the heatmaps. The manuscript could benefit from demonstrating whether this sample is indeed an outlier through analyses such as skewness, kurtosis, and Mahalanobis distance, or from excluding it if confirmed as an outlier.
9. The rationale for including positive -log10(P-values) in Figure 5a should be explained.
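
As an illustration of the Mahalanobis distance check mentioned in comment 8, a minimal sketch follows. It assumes each sample has already been reduced to two principal component scores, which is an assumption made here for simplicity and not something stated in the manuscript. The data points, function names, and the 0.975 cutoff are hypothetical. The chi-square cutoff is only approximate for small groups.

```python
import math


def mahalanobis_2d(points):
    """Return the Mahalanobis distance of each (x, y) point from the sample mean."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points to estimate a 2x2 covariance")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Unbiased sample covariance entries.
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, with the 2x2 inverse written out.
        d_sq = (dx * dx * syy - 2.0 * dx * dy * sxy + dy * dy * sxx) / det
        distances.append(math.sqrt(max(d_sq, 0.0)))
    return distances


def flag_outliers(points, confidence=0.975):
    """Return indices of points whose distance exceeds the chi-square cutoff."""
    # With 2 degrees of freedom the chi-square quantile is -2 * ln(1 - p).
    cutoff = math.sqrt(-2.0 * math.log(1.0 - confidence))
    return [i for i, d in enumerate(mahalanobis_2d(points)) if d > cutoff]


# Hypothetical PC1/PC2 scores: eleven clustered samples and one distant sample.
scores = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 0), (-1, 0), (0, 1), (0, -1),
          (1, 1), (-1, -1), (0, 0), (20, 20)]
print(flag_outliers(scores))
```

Because the suspect sample also contributes to the mean and covariance, this version can understate how extreme it is. A robust covariance estimate, or recomputing the statistics with the sample left out, would be a stricter variant.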

Additional comments

The manuscript by Ming Gao, Qing Liu, Lingyu Zhang, Fatema Tabak, Yifei Hua, Wei Shao, Yangyang Li, Li Qian, and Yu Liu, "Identification of Crucial Extracellular Genes as Potential Biomarkers in Newly Diagnosed Type 1 Diabetes via Integrated Bioinformatics Analysis", presents valuable insights, particularly in identifying potential genes related to early T1D detection. While the reported genes appear promising, there are significant concerns that need to be addressed, both in terms of scientific content and the structure and clarity of the paper. Overall, the manuscript would benefit from scientific and grammatical improvements.

·

Basic reporting

**Summary**:
The authors present a study aimed at identifying extracellular proteins as potential biomarkers for newly diagnosed Type 1 diabetes (NT1D) using bioinformatics approaches. The study utilized publicly available datasets (GSE55098, GSE33440) and tools such as GO and KEGG for functional enrichment analyses. The authors identified nine key extracellular-related genes, with particular emphasis on LCN2, IFNG, TNF, and MMP9 as potential biomarkers for NT1D.

**Strengths of the Study**:
1. **Relevance and Novelty**: The study addresses a critical aspect of early diagnosis in NT1D by proposing extracellular biomarkers, which could offer non-invasive diagnostic tools.
2. **Methodological Rigor**: The use of multiple databases (GEO, STRING, Uniprot, CTD) and bioinformatic tools such as Cytoscape and R enriches the analysis and strengthens the findings.
3. **Data Validation**: The study includes validation through another independent dataset (GSE33440) and clinical samples, which adds reliability to the results.
4. **Potential Clinical Impact**: Identification of extracellular proteins as biomarkers can lead to improved NT1D diagnostics and potentially novel therapeutic targets, enhancing early intervention strategies.

Experimental design

**Limitations and Areas for Improvement**:
1. **Sample Size**: The dataset used in this study includes only 12 NT1D and 10 healthy individuals (GSE55098), and 16 NT1D with 6 healthy controls (GSE33440). While bioinformatics analyses are valuable for hypothesis generation, the small sample size limits the generalizability of the results. The authors should mention this limitation explicitly and recommend further validation in larger cohorts.

2. **Statistical Analysis and Filtering Criteria**: The study relies on relatively loose statistical filtering criteria (adjusted p-value of 0.05 and |log2FC| > 0.5). This could result in false positives. More stringent criteria should be considered, or at the very least, the authors should acknowledge this limitation and discuss potential implications.

3. **Biological Validation**: Although the authors conducted validation with an additional dataset and clinical samples, the study would benefit greatly from functional validation of the identified biomarkers. In vitro or in vivo experiments to confirm the biological roles of LCN2, IFNG, TNF, and MMP9 in NT1D pathogenesis would add substantial weight to the conclusions.

4. **Functional Implications of Key Genes**: While the authors identify LCN2, IFNG, TNF, and MMP9 as key biomarkers, their functional role in NT1D remains speculative based on bioinformatics predictions. A more detailed discussion on how these genes mechanistically contribute to NT1D could strengthen the manuscript. Additionally, referencing more experimental studies on these genes’ roles in diabetes would improve the background and discussion sections.

5. **Figures and Data Presentation**: While the figures are informative, some require clearer labeling, especially in the network and heatmap plots. The color scheme in Figure 2 could be adjusted for better clarity, as the current representation makes it hard to distinguish up- and down-regulated genes. Moreover, the authors should ensure that all figures are self-explanatory, with detailed legends describing what is represented.

6. **Methodology Clarification**: The methods section, while comprehensive, lacks clarity in some areas. For instance, the explanation of the “combat” function used for batch effect correction could be elaborated to ensure clarity for non-specialists. Additionally, the rationale behind using the selected thresholds for differentially expressed genes (DEGs) should be provided.

**Minor Issues**:
1. **English Language and Grammar**: While the manuscript is generally well-written, there are some minor grammatical errors and awkward phrasings. A thorough proofreading or professional editing would improve readability and ensure the manuscript is clear for an international audience.

2. **Ethical Considerations**: The authors mention ethical approval for clinical sample collection, but further details on the study population, such as inclusion/exclusion criteria, should be added for transparency.

Validity of the findings

No comment

Additional comments

**Overall Recommendation**:
The manuscript provides an interesting bioinformatic analysis with potential implications for NT1D diagnostics. However, to strengthen the impact of the study, the authors should:
- Acknowledge the limitations posed by sample size and filtering criteria,
- Incorporate functional validation of key findings,
- Improve figure clarity, and
- Elaborate on the biological implications of the identified biomarkers.

I recommend **minor to moderate revisions** before this manuscript can be considered for publication.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.