Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on April 25th, 2025 and was peer-reviewed by 3 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on July 16th, 2025.
  • The first revision was submitted on September 1st, 2025 and was reviewed by 1 reviewer and the Academic Editor.
  • The article was Accepted by the Academic Editor on October 1st, 2025.

Version 0.2 (accepted)

· · Academic Editor

Accept

The authors addressed the main requests of the reviewers and therefore I can recommend this article for acceptance.

[# PeerJ Staff Note - this decision was reviewed and approved by Xiangjie Kong, a PeerJ Section Editor covering this Section #]

Reviewer 1 ·

Basic reporting

-

Experimental design

-

Validity of the findings

-

Additional comments

-

Version 0.1 (original submission)

· · Academic Editor

Major Revisions

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

Reviewer 1 ·

Basic reporting

It's well written and logical.
Use of SHAP in recent work can be mentioned.
Is this raw data shared because of the journal's request? I suggest not sharing it; otherwise anyone can use the data. If it's a publicly available dataset, that's okay.


**PeerJ Staff Note:** It is PeerJ policy that all data needed to reach the conclusions must be shared for review and made public at the time of publication.

Experimental design

Adequate

Validity of the findings

The discussion could be more elaborate.
The GAN approach is a bit unclear, especially why you use it.
I doubt whether a GAN approach can be used for such a small dataset. Please clarify.
The figures are very clear and neat, well done.

Additional comments

-

Reviewer 2 ·

Basic reporting

no comment

Experimental design

no comment

Validity of the findings

no comment

Additional comments

Estimating Compressive Strength of CO₂ Incorporated Concrete with Data Augmentation and Explainable Regression Modeling
Manuscript ID: 117902
Recommendation: Major Revision

The study is original, within PeerJ Computer Science’s scope, and addresses a relevant question: predicting CO₂-incorporated concrete strength using ML with limited data. The experimental setup (e.g., OPC/PPC specifications, CO₂ injection) is rigorous and adheres to standards. The manuscript is written in clear, professional English, with a structured introduction that contextualizes CO₂ emissions in cement production and the role of CO₂ sequestration in concrete. The literature review is comprehensive, citing relevant works to highlight the knowledge gap. However, several issues require attention, and significant revisions are therefore needed:
1. Validity of Synthetic Data for Concrete Mix Design
How well do the synthetic data generated by the algorithm used represent the physical and chemical behaviors of CO₂-incorporated concrete? The manuscript (e.g., Table 3, line 350) claims high data validity (100%) but lacks evidence linking synthetic data to physical concrete properties. Without experimental validation (e.g., testing synthetic mix designs in a lab), the synthetic data may not capture critical nonlinear interactions, such as the effect of excessive CO₂ on hydration (line 189). This could mislead mix design optimization, as noted in literature, where excessive carbonation reduced strength.
2. Filtering Process Ambiguity
What specific criteria were used to filter "noisy synthetic data" (line 347)? The manuscript mentions removing outliers but does not define thresholds or methods, which is critical for reproducibility.
3. Impact on Structural Design
How do errors in synthetic data affect the reliability of ML predictions for structural applications? Also, the manuscript does not discuss the implications of synthetic data errors on design outcomes.
4. Generalizability to Diverse Concrete Types
Can the synthetic data generalize to other concrete types (e.g., high-performance concrete, self-compacting concrete) or environmental conditions (e.g., high humidity, temperature variations)? The manuscript does not address whether CTGAN/TVAE can generate synthetic data for diverse concrete compositions or curing conditions, which are common in civil engineering projects. The dataset is limited to OPC and PPC with specific w/c ratios (Table 2). This limits the framework’s practical utility.
5. CO₂ Injection Details
The CO₂ injection process (lines 169–173) lacks specifics on flow rate, mixing chamber design, and safety protocols. Civil engineers need practical details for implementation. I suggest adding a subsection in "Mix Design and Methodology" detailing equipment setup, flow rate (e.g., L/min), and safety measures, referencing ASTM C1768.
6. Hyperparameters
The ML pipeline omits key hyperparameters (except max depth, Figure 7d). I suggest including a table listing hyperparameters (e.g., learning rate, estimators) for all models (LGBM, GBR, etc.).
7. Interpretations
Figures 9–11 (SHAP plots) are insightful but lack practical translation for mix design. Add a subsection in "Explainable Artificial Intelligence" titled "Interpreting SHAP for Concrete Mix Design," explaining how SHAP values guide CO₂ dosage or curing time optimization.
Conclusion
The manuscript offers a novel framework for predicting CO₂-incorporated concrete strength, but its reliance on synthetic data without experimental validation and limited practical focus for civil engineers necessitate major revisions. By addressing data augmentation concerns, enhancing practical relevance, and improving transparency, the paper can significantly impact sustainable concrete design and construction.
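
As an illustration of point 2 above (Filtering Process Ambiguity), the following is a minimal sketch of what an explicitly stated filtering criterion could look like. It is not the authors' method, which the manuscript does not specify. It assumes a Tukey interquartile-range rule with fences computed from the real measurements; the function names, the column name `strength_mpa`, the multiplier `k = 1.5`, and all numeric values are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) Tukey fences for a list of numbers."""
    # Inclusive quartiles treat the data as the full population of real samples.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def filter_synthetic(real_rows, synthetic_rows, columns, k=1.5):
    """Keep synthetic rows whose values lie inside fences derived from real data."""
    bounds = {c: iqr_bounds([r[c] for r in real_rows], k) for c in columns}
    return [
        row
        for row in synthetic_rows
        if all(bounds[c][0] <= row[c] <= bounds[c][1] for c in columns)
    ]


# Hypothetical compressive strengths in MPa (illustrative only, not from the paper).
real = [{"strength_mpa": s} for s in (28.0, 30.0, 31.0, 33.0, 35.0)]
synthetic = [{"strength_mpa": s} for s in (29.5, 34.0, 80.0)]

kept = filter_synthetic(real, synthetic, ["strength_mpa"])
print(kept)  # the 80.0 MPa row falls outside the fences and is dropped
```

Reporting the rule, the multiplier, and the columns it was applied to would be enough for a reader to reproduce the filtering step.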

Annotated reviews are not available for download in order to protect the identity of reviewers who chose to remain anonymous.

Reviewer 3 ·

Basic reporting

This paper presents a comprehensive analysis of how to estimate the compressive strength of CO2-incorporated concrete, using state-of-the-art methods to augment the data, and also introduces an explainable regression modeling approach. This topic is relevant for both machine learning applications and civil engineering.
The manuscript has a good structure and reads well throughout. However, it must be improved before it can be published. Regarding basic reporting, the following issues need to be addressed:
1. In the abstract, the sentence "R2 value of 0.9872, MAE of 1.1847" should specify which dataset the values refer to: training or testing.
2. There is a spelling error in Figure 6: "Coorelation matrix".
3. In Algorithm 1, I believe "fflill(D)" is a mistake.
4. In the next line, the correct term is MinMaxScaler, with a single "l".
5. The manuscript must be consistent. Take special care with "CO2": in some parts of the text the authors use a subscript, and in others plain format.
6. I consider that there are relevant state-of-the-art manuscripts addressing concrete strength prediction with ML or DL that should be cited to improve the work.
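
For context on points 3 and 4, the two preprocessing steps named there are presumably forward filling of missing values (as in pandas `ffill`) and min-max scaling (as in scikit-learn's `MinMaxScaler`). The sketch below reimplements both in plain Python to show what the steps compute; it is an assumption about what Algorithm 1 intends, not a reproduction of it, and the helper names and sample values are hypothetical.

```python
def forward_fill(column):
    """Replace each None with the most recent non-missing value.

    Leading None entries have no earlier value and are left as None.
    """
    filled, last = [], None
    for value in column:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def min_max_scale(column):
    """Rescale numbers linearly so the minimum maps to 0 and the maximum to 1.

    Assumes the column contains no missing values.
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


# Hypothetical feature column with two missing readings.
raw = [10.0, None, 20.0, None, 30.0]
filled = forward_fill(raw)
scaled = min_max_scale(filled)
print(filled)
print(scaled)
```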

Experimental design

I recommend that the authors be consistent in the standards they cite. For instance, in some cases they use IS, and in other cases they give the ASTM equivalent. They should give the ASTM equivalent throughout.
The authors should explain why they did not use a validation dataset. It is good practice in ML problems to use training, validation, and testing datasets. I consider that the augmented dataset provides the opportunity to create this additional dataset.
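
The three-way split suggested above can be sketched as follows. This is a minimal illustration under assumed settings, not the authors' pipeline: the function name, the 70/15/15 proportions, and the fixed seed are all hypothetical choices.

```python
import random


def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle rows reproducibly and split them into train, validation, and test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded generator keeps the split repeatable
    n_test = round(len(rows) * test_frac)
    n_val = round(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


# 100 hypothetical sample indices standing in for dataset rows.
train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

With augmented data, one design choice worth considering is to split the real measurements first and generate synthetic samples from the training portion only, so that the validation and test sets contain no information derived from held-out samples.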

Validity of the findings

To replicate the experiment carried out in this research, the authors should provide clearer instructions on how to use CTGAN, TVAE, and the SDV library. In its current form, the manuscript merely mentions the use of these tools and frameworks; it is unclear how they were implemented during the research.
It is good practice to share a repository with the code used in the manuscript to ensure the reproducibility of the results.
The conclusions need to be expanded to effectively highlight the key findings of the research.

Additional comments

In general terms, the manuscript addresses an interesting topic in the field of computer science. However, in its current form, it needs to be improved to achieve better explainability.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.