All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Dear authors,
Thank you for addressing the reviewers' comments. Your manuscript is now acceptable for publication.
As you will see, Reviewer 3 left a couple of non-mandatory suggestions that you can incorporate in your final version, if you wish.
Thank you for submitting your work to PeerJ.
The authors have provided an excellent and thoroughly considered response to the reviewers’ comments. This is, without doubt, one of the most honest, detailed, and comprehensive replies I have had the opportunity to evaluate. They have addressed nearly all—if not all—comments raised by the three reviewers, including the optional suggestions I had proposed in the first round.
Moreover, the clarity and structure of their responses made the review process exceptionally smooth. The revisions were easy to follow, which rendered this second review both efficient and constructive.
No comment
No comment
I have two additional remarks. These are not mandatory for inclusion in the final version of the manuscript but are rather open questions and suggestions for further reflection. I would be particularly interested in the authors’ perspectives or feedback, either as part of a future discussion or in the context of ongoing developments:
• Clarification on Causality in Gaussian Bayesian Networks
> The authors rightly state that “Gaussian Bayesian networks are powerful in modelling probabilistic dependencies (associations), but edges in the DAG do not automatically imply causation.”
While this assertion is generally valid, I would like to nuance it slightly. Consider three nodes A, B, and C. The structures A→B→C, A←B←C, and A←B→C are Markov equivalent: all three imply that A and C are dependent but become independent given B, so they cannot be distinguished based solely on observational data. However, the fourth possible configuration, known as a v-structure (A→B←C), implies the opposite pattern (A and C are marginally independent but dependent given B). It is therefore distinguishable probabilistically and allows for partial edge orientation, even in the absence of interventional data, under the assumption of no latent confounders.
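To make the v-structure argument concrete, the Python sketch below simulates two hypothetical linear Gaussian structures with unit path coefficients and standard normal noise: a chain A→B→C and a collider A→B←C. It is an illustration under these stated assumptions (no latent confounders, arbitrary coefficients), not part of ShinyDegSEM; the helpers `pearson`, `partial_corr` and `simulate` are introduced only for this example.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(structure, n=20000, seed=0):
    """Draw n samples of (A, B, C) from a hypothetical unit-coefficient model."""
    rng = random.Random(seed)
    a, b, c = [], [], []
    for _ in range(n):
        if structure == "chain":        # A -> B -> C
            ai = rng.gauss(0, 1)
            bi = ai + rng.gauss(0, 1)
            ci = bi + rng.gauss(0, 1)
        elif structure == "collider":   # A -> B <- C (v-structure)
            ai = rng.gauss(0, 1)
            ci = rng.gauss(0, 1)
            bi = ai + ci + rng.gauss(0, 1)
        else:
            raise ValueError(f"unknown structure: {structure}")
        a.append(ai)
        b.append(bi)
        c.append(ci)
    return a, b, c


for s in ("chain", "collider"):
    a, b, c = simulate(s)
    print(s, "corr(A,C) =", round(pearson(a, c), 2),
          "partial corr(A,C | B) =", round(partial_corr(a, c, b), 2))
```

In population terms, the chain gives corr(A, C) = 1/√3 ≈ 0.58 and a partial correlation of 0 given B, whereas the collider gives 0 and −0.5, so the printed sample values should lie close to these. This flip in the dependence pattern is what allows a v-structure to be oriented from observational data alone.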
More broadly, I am highly interested in the potential links between structural equation models (SEMs), (Gaussian) Bayesian networks, and causal graphical models. While these approaches stem from distinct theoretical backgrounds, I believe they are complementary and could benefit from deeper integration. One potential entry point is:
“Integration of Structural Equation Modeling and Bayesian Networks in the Context of Causal Inference: A Case Study on Personal Positive Youth Development.”
• Edge Orientation and Interventional Design
> The discussion on the current limitations in orienting edges and distinguishing causal directionality is highly relevant, and I agree that this remains an underdeveloped research area.
In particular:
- With access to interventional data, such as gene knock-out experiments (e.g., CRISPR-mediated), edge orientation becomes feasible in principle. A hybrid strategy could be envisioned: first, orient edges approximately using inference-based approaches drawn from the Gaussian Bayesian network (GBN) literature; then, refine directionality using structural modeling or SEM-based techniques.
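A toy Python sketch of the interventional step of such a hybrid strategy (not the full GBN/SEM pipeline): two hypothetical orientations of a gene pair, X→Y and Y→X, are parameterised so that their observational distributions coincide (both genes standard normal with correlation 0.8), and a perfect knock-down of X, clamped to −2 in standardised units as an arbitrary choice, separates them because only a downstream gene responds. The function names and the fixed mean-shift threshold are assumptions for illustration; a real analysis would use a proper statistical test and model imperfect interventions.

```python
import random
import statistics


def sample_y(direction, n, rng, do_x=None):
    """Draw n values of gene Y from a toy two-gene linear Gaussian model.

    direction: "X->Y" or "Y->X" (the hypothetical true orientation).
    do_x: if given, X is clamped to this value, mimicking a perfect knock-down.
    """
    ys = []
    for _ in range(n):
        if direction == "X->Y":
            x = rng.gauss(0, 1) if do_x is None else do_x
            ys.append(0.8 * x + rng.gauss(0, 0.6))  # Var(Y) = 0.64 + 0.36 = 1
        else:
            # Y is upstream of X (X = 0.8*Y + noise is implied but not needed),
            # so clamping X leaves the distribution of Y unchanged.
            ys.append(rng.gauss(0, 1))
    return ys


def orient_by_intervention(y_obs, y_int, threshold=0.2):
    """Call X->Y if intervening on X shifts the mean of Y, otherwise Y->X.

    The fixed threshold is a stand-in for a proper two-sample test.
    """
    shift = abs(statistics.fmean(y_int) - statistics.fmean(y_obs))
    return "X->Y" if shift > threshold else "Y->X"


rng = random.Random(1)
for truth in ("X->Y", "Y->X"):
    y_obs = sample_y(truth, 5000, rng)
    y_ko = sample_y(truth, 5000, rng, do_x=-2.0)
    print(truth, "->", orient_by_intervention(y_obs, y_ko))
```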
This raises an open methodological question:
What is the minimal or most efficient experimental design required to causally orient edges that show only low-confidence directionality in purely observational models?
Related to this, I recommend the following reference: “Joint estimation of causal effects from observational and intervention gene expression data.”
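One naive way to frame this design question is as a covering problem. Assuming that each perfect single-gene intervention orients every low-confidence edge incident to the perturbed gene, any set of targets that touches all such edges suffices, although orientation rules may propagate and make a smaller set enough. The hypothetical sketch below uses the standard greedy max-degree vertex-cover heuristic, which is not guaranteed to be optimal and ignores experimental cost, imperfect knock-outs and multi-target interventions; the gene names are placeholders.

```python
from collections import Counter


def greedy_knockout_targets(uncertain_edges):
    """Pick genes to perturb so every low-confidence edge touches a target.

    Assumes each perfect single-gene intervention orients all edges incident
    to the perturbed gene. Greedy max-degree vertex cover: a heuristic only.
    """
    remaining = {frozenset(edge) for edge in uncertain_edges}
    targets = []
    while remaining:
        degree = Counter(gene for edge in remaining for gene in edge)
        # Highest degree first; max() keeps the first maximum, so iterating
        # over sorted names breaks ties alphabetically for reproducibility.
        best = max(sorted(degree), key=lambda gene: degree[gene])
        targets.append(best)
        remaining = {edge for edge in remaining if best not in edge}
    return targets


edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G5", "G6")]
print(greedy_knockout_targets(edges))  # ['G1', 'G5']
```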
More broadly, I believe this challenge relates to the design of optimal perturbation strategies, which is central in recent work on omic foundation models. In my view, there remains a considerable and unfortunate disconnect between causal inference theorists and developers of the increasingly popular omic foundation models. Bridging this gap could open up transformative opportunities. Notable recent works include:
- “Causal machine learning for single-cell genomics.”
- “Benchmarking foundation cell models for post-perturbation RNA-seq prediction.”
Again, these remarks are offered in the spirit of constructive dialogue and potential future exploration. I thank the authors again for their thoughtful and thorough revisions.
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
Some sections (e.g., "Phosphorylation and Causal Inference") are dense. Simplify jargon for broader accessibility.
Ensure consistency in abbreviations (e.g., define "NDEGs" upon first use).
P. 13: "demystify SEM for biologists by combining a series of analyses" → "demystifies".
P. 25: "Future study can expand" → "Future studies".
Check hyperlinks (e.g., GEO database link on p. 29 is truncated).
Figure 4 (Fc gamma R-mediated phagocytosis pathway): Label edges with interaction types (e.g., activation/inhibition) if possible.
Figure 7 (Model Invariance): Clarify why fit_edge performs poorly (e.g., overfitting?).
Update citations to reflect recent SEM advancements (e.g., Grassi & Tarantino, 2025).
The manuscript should explicitly state assumptions about data preprocessing (e.g., normalization for RNA-seq/microarray) and SEM requirements (e.g., sample size, missing data handling).
Address potential biases from pathway databases (e.g., KEGG) and how the tool mitigates them.
Provide more details on installation/execution (e.g., R package dependencies, system requirements).
Offer a minimal example dataset in the supplementary materials for quick testing.
Benchmark ShinyDegSEM against existing tools (e.g., SEMgraph, GenomicSEM) in terms of accuracy, speed, and usability.
Include a table summarizing key features/advantages over alternatives.
Dependence on predefined pathways (may miss novel interactions).
Challenges in interpreting bidirectional edges biologically.
Computational scalability for large datasets (e.g., single-cell RNA-seq).
Clarify the poor performance of fit_edge in Figure 7 (e.g., overfitting, sample size constraints).
All comments have been addressed in detail in the final section.
All comments have been addressed in detail in the final section.
All comments have been addressed in detail in the final section.
Peer Review Report for PeerJ
(ShinyDegSEM: an interactive application for pathway perturbation analysis in gene expression studies via structural equation modeling)
1. In this study, an interactive R Shiny application called ShinyDegSEM, based on structural equation modeling (SEM), was developed to support the understanding of mechanisms underlying phenotypic variations, and it was reported that the tool integrates differential gene expression analysis, perturbed pathway identification, and model comparison in a unified framework.
2. The introduction covers biological networks, the importance of the subject, and the contribution to the literature. However, the literature review in this section is very limited and should be expanded.
3. A detailed examination of the ShinyDegSEM application developed in this study shows that it reaches a reasonable level in terms of contribution to the literature, originality, and usability.
4. The Fc gamma R-mediated phagocytosis pathway is covered in detail in the study. In addition, both the initial and the final structural equation models are clearly presented.
5. When the obtained results are examined and compared with the literature, the node analysis results and part of the edge analysis results appear sufficient and well suited to the study.
Overall, the proposed application can make very important contributions to the literature. The points raised above should be addressed.
The manuscript presents ShinyDegSEM, an interactive application developed using R Shiny for conducting pathway perturbation analysis through structural equation modeling (SEM) in gene expression studies. The tool is aimed at assisting researchers—particularly biologists and bioinformaticians—in the semi-supervised construction and evaluation of complex gene regulatory networks.
The analytical workflow is designed to be iterative and user-guided: it begins with the identification of differentially expressed genes (DEGs), proceeds to the construction of highly interconnected gene modules (referred to as pathways), and culminates in the detection of condition-specific perturbations in these pathways. The integration of SEM is particularly valuable here, as it enables the inference and testing of causal relationships between genes, an approach that moves beyond correlation-based analyses.
By embedding this capability within an interactive interface, the tool significantly lowers the technical barrier to applying SEM in transcriptomic studies. This could broaden access to powerful modelling techniques and accelerate hypothesis generation and validation in molecular biology.
With further refinements—especially in clarifying complex concepts, strengthening usability, and outlining future enhancements—the manuscript has strong potential to serve as a valuable resource for both novice and experienced researchers working in systems biology and transcriptomics.
General structure comments:
The manuscript would benefit from a clearer and more standardized structural layout, distinctly separating the Introduction, Methods, Results, and Conclusions sections. In particular, the Methods section should be expanded to include:
• A technical subsection, providing an overview of the SEM methodology for a broader audience, including theoretical background and key SEM-related terminology.
• A detailed, step-by-step description of the SEM-based analysis protocol (the current five-stage pipeline), ideally complemented by figures or flowcharts for clarity.
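As a minimal reference point for such a technical subsection, the observed-variable (path-analysis) form of linear SEM is usually written as below. This is the textbook formulation; whether ShinyDegSEM uses exactly this parameterisation and the maximum-likelihood estimator is an assumption to be checked against the manuscript.

```latex
% Observed-variable (path-analysis) SEM: each gene is regressed on its parents
\mathbf{y} = \mathbf{B}\,\mathbf{y} + \boldsymbol{\varepsilon},
  \qquad \operatorname{Cov}(\boldsymbol{\varepsilon}) = \boldsymbol{\Psi}

% Model-implied covariance of the p genes
\boldsymbol{\Sigma}(\theta) = (\mathbf{I} - \mathbf{B})^{-1}\,\boldsymbol{\Psi}\,(\mathbf{I} - \mathbf{B})^{-\top}

% Maximum-likelihood discrepancy between the sample covariance S and Sigma(theta)
F_{\mathrm{ML}}(\theta) = \log\lvert\boldsymbol{\Sigma}(\theta)\rvert
  + \operatorname{tr}\bigl(\mathbf{S}\,\boldsymbol{\Sigma}(\theta)^{-1}\bigr)
  - \log\lvert\mathbf{S}\rvert - p

% Global test of the pathway structure (q free parameters, N samples)
T = (N - 1)\,F_{\mathrm{ML}}(\hat{\theta}) \;\dot\sim\; \chi^2_{\,p(p+1)/2 - q}
```

Here the entry b_ij of B is non-zero only if the pathway graph contains an edge y_j → y_i, off-diagonal entries of Ψ encode bidirected (residual covariance) edges, and the χ² reference distribution holds asymptotically under correct specification and multivariate normality; some software scales T by N rather than N − 1.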
Additionally, some redundancies exist between sections (as noted in the annotated PDF), and these should be consolidated to improve overall coherence and conciseness.
Finally, any content describing the graphical user interface or front-end features of the application would be better placed in the Supplementary Material or provided as a vignette or walkthrough in the associated software repository. This separation would keep the main manuscript focused on its conceptual and methodological contributions, while still documenting practical usage for interested users. With all these elements considered (avoiding redundancies between sections, relocating practical use cases of the application to an R vignette, …), the final paper should not exceed 15 pages (it is currently 20 pages).
• Introduction:
o The description of biological networks is lengthy, but the underlying SEM model is only briefly described. In addition, the section suffers overall from reference overload; consider reducing it to improve readability, keeping only the most impactful, recent, or closely related references.
o On the other hand, references describing other causal graphical representations are lacking, including (non-exhaustively) graphical causal models (structural causal models, Gaussian Bayesian networks), instrumental variables with Mendelian randomisation, and counterfactual approaches. Refer to Figure 1 for details. A brief discussion of the respective pros and cons of each of these methods would be beneficial to the scope of the paper, including the reasons why you focus on SEM approaches.
• Material and Methods:
o You mention that your ShinyDegSEM pipeline integrates multi-modal data sources, from transcriptomic to proteomic data, and pathway information. Please detail the biological value of integrating each omic source (its role in the pipeline) and which analytical choices you make for each kind of omic data integrated.
o In step 1 of Materials and Methods, you mention that three types of genomic data could be incorporated: gene expression (microarray, RNA-seq, …), qRT-PCR, and genomic variation. A discussion of their core differences (in terms of distribution, known limitations, biological purposes, …) would benefit the paper, along with a minimal set of guidelines for analysing them.
• Results:
o Add an infographic, or further clarify what you mean by configural, edge, and node invariance in the Invariance Evaluation subsection.
o Explain how edges could best be selected to tailor the model to the observed data, rather than relying on random picks and trimming.
o What are the main differences between your approach and the most closely related R package, SEMgraph?
• Code availability: consider uploading the Shiny application to GitHub, not only to a private URL server.
o No licence is provided with the code; please consider adding an MIT licence (or any other standard licence that fits).
o Consider deploying a web version of your Shiny app to make it publicly available, especially for a lay audience. Your target users are biologists, as you mention in the text, so having to download a .zip file and run R scripts (possibly dealing with missing dependencies) before even trying the application is not user-friendly.
o Consider improving functional modularity, rather than concatenating all functions in the ui.R and server.R files.
o Consider developing your Shiny app as an R package (this would also improve the general organisation of the folder, where every file currently sits at the same level). For enhanced reproducibility, consider following, for instance, Sebastian Rauschert’s R blueprint: https://www.linkedin.com/posts/sebastian-rauschert-836760a0_bioinformatics-reproducibility-docker-activity-7337990323719024640-2xba/
o When you deploy a publicly available web server for your Shiny app, consider any legal issues that may arise if end users upload their own datasets.
• Perspectives are missing (or very limited)
o The scope of the transcriptomic data used is limited; consider extending it to at least one single-cell RNA-seq experiment, as microarrays are no longer widely used.
o Include a more detailed roadmap for future versions of ShinyDegSEM. For example, will the tool eventually be able to handle time-series/dynamic networks? Integration of multi-omics?
o How does the application scale with network size (computational cost and latency)?
• Supplementary Materials
o Prefer a .tex format, with labelled (and clickable) references to equations, rather than .docx documents
o I would have incorporated this section, notably the theoretical details and the description of the SRM model, into the “Methods” section of ShinyDegSEM.
o The SEM model would have benefited from a DAG representation describing the differences between observed and latent variables, along with definitions of parents and siblings.
o Explain the rationale for considering only relationships between observed variables in your SEM model, rather than the conventional two-layer SEM model (measurement and structural parts).
o Is it possible to assign prior weights to your edges, or to incorporate information from several pathway databases instead of KEGG alone (see OmniPath for details)?
o Justify your hard thresholds in the Tests and Model fit section; in particular, explain why you chose different cut-offs depending on the goodness-of-fit measure.
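As an illustration of why cut-offs can legitimately differ between indices, the Python sketch below computes two common fit measures from chi-square statistics. The formulas are the standard RMSEA and CFI definitions; the thresholds are the widely cited Hu & Bentler (1999) guidelines and serve only as placeholders, not as the values used in the manuscript, and the fit statistics in the example are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA from the model chi-square, its degrees of freedom and sample size N.

    Uses the (N - 1) convention; some software divides by N instead.
    """
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index relative to the independence (baseline) model."""
    num = max(chi2_model - df_model, 0.0)
    den = max(chi2_model - df_model, chi2_base - df_base, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den


# Placeholder cut-offs from the Hu & Bentler (1999) guidelines, NOT from the
# manuscript. The direction differs because CFI is an incremental index
# (1 = perfect fit) while RMSEA is a badness-of-fit index (0 = perfect fit).
CUTOFFS = {"CFI": (">=", 0.95), "RMSEA": ("<=", 0.06)}


def passes(index, value):
    op, cut = CUTOFFS[index]
    return value >= cut if op == ">=" else value <= cut


# Hypothetical fit: chi2 = 50 on 20 df, N = 201; baseline chi2 = 500 on 30 df
r = rmsea(50, 20, 201)    # sqrt(30 / (20 * 200)), about 0.087
c = cfi(50, 20, 500, 30)  # 1 - 30 / 470, about 0.936
print(f"RMSEA={r:.3f} pass={passes('RMSEA', r)}  CFI={c:.3f} pass={passes('CFI', c)}")
```

Because each index sits on its own scale and answers a different question (absolute misfit per degree of freedom versus improvement over a null model), a single shared threshold would not be meaningful, which is the kind of justification the comment asks the authors to state explicitly.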
No comment
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.