Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on March 7th, 2025 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on May 8th, 2025.
  • The first revision was submitted on June 26th, 2025 and was reviewed by 2 reviewers and the Academic Editor.
  • A further revision was submitted on July 28th, 2025 and was reviewed by 1 reviewer and the Academic Editor.
  • The article was Accepted by the Academic Editor on September 15th, 2025.

Version 0.3 (accepted)

· Sep 15, 2025 · Academic Editor

Accept

Dear Author,

Your paper has been revised. It has been accepted for publication in PeerJ Computer Science. Thank you for your fine contribution.

[# PeerJ Staff Note - this decision was reviewed and approved by Sedat Akleylek, a PeerJ Section Editor covering this Section #]

Reviewer 1 ·

Basic reporting

This is a revision. I am satisfied with this version and recommend acceptance.

Experimental design

-

Validity of the findings

-

Version 0.2

· Jul 21, 2025 · Academic Editor

Major Revisions

Reviewer 1 identified issues that remain to be addressed.

Reviewer 1 ·

Basic reporting

This paper proposes a new membership inference attack that differs from the existing LiRA attack with shadow model training in the following ways. First, shadow models are trained by distilling the target model on a randomly selected subset of auxiliary data. Second, instead of using hypothesis testing, the attack extracts the following features: the target model's loss on the target instance, and the mean and standard deviation of the shadow models' losses on that instance, and trains a neural network classifier on them.
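To make the summary concrete, the pipeline as I understand it could be sketched roughly as follows; the function and variable names are illustrative and not taken from the manuscript.

```python
# Rough sketch of the described pipeline: 3 features per instance (target-model
# loss, mean and std of distilled shadow-model losses), fed to a small NN.
# All names are illustrative, not from the manuscript.
import numpy as np
import torch
import torch.nn as nn

def per_example_loss(model, x, y):
    """Cross-entropy loss of one model on a single labelled example."""
    model.eval()
    with torch.no_grad():
        logits = model(x.unsqueeze(0))
        return nn.functional.cross_entropy(logits, y.unsqueeze(0)).item()

def extract_features(target_model, shadow_models, x, y):
    """Feature vector: [target loss, mean shadow loss, std of shadow losses]."""
    target_loss = per_example_loss(target_model, x, y)
    shadow_losses = [per_example_loss(m, x, y) for m in shadow_models]
    return np.array([target_loss, np.mean(shadow_losses), np.std(shadow_losses)])

# A small inference (attack) network over the 3-dimensional feature vector.
inference_model = nn.Sequential(
    nn.Linear(3, 16), nn.ReLU(),
    nn.Linear(16, 1), nn.Sigmoid(),  # output interpreted as membership probability
)
```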

This paper is a revision. I think the revision has improved the paper in that the theory in the original submission, which was disconnected from the proposed method, has been removed.

Suggested improvements:

While I agree that using distillation for training shadow models can help, I am still puzzled by the benefit of using an NN, since the features are only a few numbers. It is hard to envision complex decision boundaries for such a problem. Maybe I missed it, but I did not find the NN architecture used for the inference model; it should be prominently described. Also, instead of an NN, I would like to see results using logistic regression or a decision tree.
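Such a baseline comparison could look roughly like the sketch below; the feature matrix and membership labels are synthetic placeholders, not the authors' data or code.

```python
# Illustrative baseline comparison on the 3-dimensional attack features.
# The features and membership labels are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 3))      # [target loss, mean shadow loss, std shadow loss]
is_member = rng.integers(0, 2, size=1000)  # 0/1 membership labels

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("decision tree", DecisionTreeClassifier(max_depth=5))]:
    auc = cross_val_score(clf, features, is_member, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean ROC-AUC = {auc:.3f}")
```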

Figure 1 shows that LiRA membership scores are more correlated with self-influence scores than the scores produced by the proposed MI attack. However, to show that this improves TPR@low FPR, the paper should also show the distribution of membership scores for non-members. It would help to see two histograms (one for the hypothesis-testing score and one for the NN output), showing in each bin how many members and how many non-members fall there.
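A sketch of the kind of diagnostic plot I have in mind is below; the score arrays are random placeholders standing in for member and non-member scores under each method.

```python
# Suggested diagnostic: per-bin counts of members vs. non-members for each
# scoring method. The score arrays are random placeholders for illustration.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
scores = {
    "hypothesis-testing score": (rng.normal(1.0, 0.5, 500), rng.normal(0.0, 0.5, 500)),
    "NN output": (rng.beta(5, 2, 500), rng.beta(2, 5, 500)),
}

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, (title, (member_scores, nonmember_scores)) in zip(axes, scores.items()):
    edges = np.histogram_bin_edges(
        np.concatenate([member_scores, nonmember_scores]), bins=30)
    ax.hist(member_scores, bins=edges, alpha=0.5, label="members")
    ax.hist(nonmember_scores, bins=edges, alpha=0.5, label="non-members")
    ax.set(title=title, xlabel="membership score", ylabel="count")
    ax.legend()
plt.tight_layout()
plt.show()
```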

Typo:

Line 327: by Liu et al.Liu et al. (2022)

Experimental design

See above.

Validity of the findings

See above.

Reviewer 2 ·

Basic reporting

no comment

Experimental design

no comment

Validity of the findings

no comment

Additional comments

no comment

Version 0.1 (original submission)

· May 8, 2025 · Academic Editor

Major Revisions

Please pay particular attention to the concerns of reviewer 1 regarding the clarity of your rationale for this approach.

**PeerJ Staff Note:** Please ensure that all review and editorial comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

**Language Note:** The review process has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title). Alternatively, you should make your own arrangements to improve the language quality and provide details in your response letter. – PeerJ Staff

Reviewer 1 ·

Basic reporting

This paper is about membership inference attacks. It introduces DSMIM-MIA (Distilled Shadow Model and Inference Model). The key differences compared with the literature are:
(1) When training shadow OUT models, instead of using the labels of instances, it uses the outputs of the target model, in the fashion of model distillation (see the sketch after this list).
(2) For a target instance and a target model, given the target model's loss on the instance and the mean and variance of the shadow models' outputs on the instance, the attack trains a neural network to decide membership.
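A minimal sketch of point (1), i.e. standard distillation of the target model's soft outputs into a shadow model, is given below; all names are illustrative, not taken from the paper.

```python
# Sketch of shadow-model training via distillation: the shadow model is fitted
# to the target model's soft outputs on auxiliary data rather than to labels.
# Function and variable names are illustrative.
import torch
import torch.nn.functional as F

def distillation_step(shadow_model, target_model, x_batch, optimizer, temperature=1.0):
    """One optimization step fitting the shadow model to the target's outputs."""
    with torch.no_grad():
        teacher_probs = F.softmax(target_model(x_batch) / temperature, dim=1)
    student_log_probs = F.log_softmax(shadow_model(x_batch) / temperature, dim=1)
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    loss = loss * temperature ** 2  # usual scaling when distilling with a temperature
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```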

C1: I have trouble making sense of the theory (Theorem 1) or connecting it with the DSMIM-MIA attack. First, the theorem statement is that I > Influence(x, M) + K·Pr(A = B), and I could not find the definition of Influence(x, M). Second, the model outputs should be viewed as real numbers; however, the proof uses summation instead of integration. Third, even if the theorem is true, it is unclear to me how it motivates using distillation for shadow model training.

C2: I find the proposed DSMIM-MIA counterintuitive. First, shadow models aim to capture model training behavior when certain instances are not used in training; using distillation would result in models that do not simulate that training behavior. Second, to determine membership of an instance, one ultimately has three numbers: the target model's loss on the instance, and the mean and variance of the shadow models' losses on the instance. It is unclear why an NN would perform better than a statistical measure when the feature vector consists of three numbers with clear statistical meanings.

C3: The writing of the paper needs a lot of improvement.

"sort-of-the-art" should be "state-of-the-art"
"difficult calibration" should be "difficulty calibration"
"lossy trajectory" should be "loss trajectory"
"and it is not clear which samples are more likely to reason about true membership under difficulty calibration"; samples should not be the subject for "reasoning out."
"The output of x on M_{out}" should be "The output of M_{out} on x"

Experimental design

-

Validity of the findings

-

Reviewer 2 ·

Basic reporting

Remarks:
1. The authors should provide a brief explanation of the term OUT model before its first use, to avoid ambiguity.
2. Better clarify the concept of "smallest form of leakage", for example by replacing it with the phrase "fundamental threat".
3. The authors state that hypothesis testing is ineffective for low-self-influence samples (lines 294–296), but the explanation of why neural networks should be more effective is rather general and brief. In particular, they also fail to consider the problems that may be associated with using this type of approach.
4. The constant K in the proof of Theorem 1. It should be precisely explained why

Experimental design

Regarding the assumptions of the theoretical proof: to ensure the robustness and clear applicability of the theoretical results, we ask that you provide a more detailed explanation and justification of the assumptions made in the proof. In particular, the assumption that the models are equal in the absence of any given sample needs to be explained, as it is essential to validating the theoretical foundations of the framework.

Regarding the experimental results: The experimental results of the manuscript would be greatly strengthened by including measures of statistical significance. To better verify the observed effects and demonstrate their robustness, we ask that you include metrics such as standard deviations, confidence intervals, or p-values alongside the main experimental results. This will provide crucial context regarding the variability and robustness of the reported results.
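For example, the reporting could take roughly the following form; the metric and the numbers are placeholders, shown only to illustrate the kind of summary statistics requested.

```python
# Illustrative reporting of mean, standard deviation and a 95% confidence
# interval over repeated attack runs; the values below are placeholders.
import numpy as np
from scipy import stats

tpr_at_low_fpr = np.array([0.041, 0.038, 0.045, 0.040, 0.043])  # e.g. 5 random seeds

mean = tpr_at_low_fpr.mean()
std = tpr_at_low_fpr.std(ddof=1)
ci_low, ci_high = stats.t.interval(
    0.95, df=len(tpr_at_low_fpr) - 1, loc=mean, scale=stats.sem(tpr_at_low_fpr))

print(f"TPR@low FPR: {mean:.3f} +/- {std:.3f} (95% CI [{ci_low:.3f}, {ci_high:.3f}])")
```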

Validity of the findings

no comment

Additional comments

no comment

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.