All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Dear Author,
Your revised paper has been accepted for publication in PeerJ Computer Science. Thank you for your fine contribution.
[# PeerJ Staff Note - this decision was reviewed and approved by Sedat Akleylek, a PeerJ Section Editor covering this Section #]
This is a revision. I am satisfied with this version and recommend acceptance.
Reviewer 1 identified issues that remain to be addressed.
This paper proposes a new membership inference attack that differs from the existing LiRA attack with shadow-model training in the following ways. First, shadow models are trained by distilling the target model on a randomly selected subset of auxiliary data. Second, instead of using hypothesis testing, the attack extracts the following features: the target model's loss on the instance, and the mean and standard deviation of the shadow models' losses on the instance; a neural network classifier is then trained on these features.
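For concreteness, the pipeline just summarized can be sketched in a few lines. This is my own illustration rather than the authors' code; the synthetic data and the small MLP architecture are assumptions (the paper does not state the architecture, as noted below):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def build_features(target_losses, shadow_losses):
    """Per-instance features as described above: the target model's loss,
    and the mean and standard deviation of the shadow models' losses.

    target_losses: shape (n_instances,)
    shadow_losses: shape (n_shadow_models, n_instances)
    """
    return np.stack(
        [target_losses, shadow_losses.mean(axis=0), shadow_losses.std(axis=0)],
        axis=1,
    )  # shape (n_instances, 3)

# Synthetic stand-ins; in the attack these come from the target model and
# the distilled shadow models evaluated on candidate instances.
rng = np.random.default_rng(0)
target_losses = rng.exponential(1.0, size=1000)
shadow_losses = rng.exponential(1.0, size=(16, 1000))
labels = rng.integers(0, 2, size=1000)  # 1 = member, 0 = non-member

X = build_features(target_losses, shadow_losses)
# The paper does not state the NN architecture; a small MLP is assumed here.
clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500).fit(X, labels)
membership_scores = clf.predict_proba(X)[:, 1]
```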
This paper is a revision. I think the revision has improved the paper in that the theory in the original submission, which was disconnected from the proposed method, has been removed.
Suggested improvements:
While I agree that using distillation for training shadow models can help, I am still puzzled by the benefit of using a neural network, since the features are just a few numbers. It is hard to envision complex decision boundaries for such a problem. Maybe I missed it, but I did not find the NN architecture used for the inference model; it should be prominently described. Also, instead of an NN, I would like to see results using logistic regression or a decision tree.
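Such an ablation is inexpensive; a sketch reusing the three-feature matrix `X` and `labels` from the sketch above, with an illustrative split and hyperparameters:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)
for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("decision tree", DecisionTreeClassifier(max_depth=5)),
]:
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```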
Figure 1 shows that LiRA membership scores are more correlated with self-influence scores than the scores produced by the proposed MI attack. However, to show that this improves TPR at low FPR, the paper should also show the distribution of membership scores for non-members. It would help to see two histograms (one for the hypothesis-testing score and one for the NN output), showing in each bin how many members and how many non-members fall there.
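Concretely, the requested plot could be produced along the following lines; the score arrays here are random placeholders for the two methods' actual outputs:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)              # 1 = member, 0 = non-member
lira_scores = rng.normal(loc=labels, scale=1.0)     # placeholder for LiRA scores
nn_scores = rng.beta(2 + 3 * labels, 2, size=1000)  # placeholder for NN outputs

def member_nonmember_histogram(scores, is_member, title, bins=30):
    """One histogram per method: per-bin counts of members and non-members."""
    plt.figure()
    plt.hist(scores[is_member == 1], bins=bins, alpha=0.5, label="members")
    plt.hist(scores[is_member == 0], bins=bins, alpha=0.5, label="non-members")
    plt.xlabel("membership score")
    plt.ylabel("count per bin")
    plt.title(title)
    plt.legend()

member_nonmember_histogram(lira_scores, labels, "Hypothesis-testing score (LiRA)")
member_nonmember_histogram(nn_scores, labels, "NN output (proposed attack)")
plt.show()
```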
Typo:
Line 327: "by Liu et al.Liu et al. (2022)" duplicates the citation.
See above.
See above.
no comment
no comment
no comment
no comment
Please pay particular attention to the concerns of reviewer 1 regarding the clarity of your rationale for this approach.
**PeerJ Staff Note:** Please ensure that all review and editorial comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
**Language Note:** The review process has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title). Alternatively, you should make your own arrangements to improve the language quality and provide details in your response letter. – PeerJ Staff
This paper is about membership inference attacks. It introduces DSMIM-MIA (Distilled Shadow Model and Inference Model). The key differences compared to the literature are:
(1) When training shadow OUT models, instead of using the labels of instances, it uses the outputs of the target model, in the fashion of model distillation (see the sketch after this list).
(2) For a target instance and a target model, given the target model's loss on the instance and the mean and variance of the shadow models' outputs on the instance, the attack trains a neural network to decide membership.
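To make point (1) concrete, here is a minimal distillation loop for a shadow OUT model; the optimizer, temperature, and training schedule are my own assumptions, not details taken from the paper:

```python
import torch
import torch.nn.functional as F

def distill_shadow_model(shadow, target, loader, epochs=10, lr=1e-3, T=1.0):
    """Train `shadow` to match the target model's soft outputs (standard
    knowledge distillation); the instances' hard labels are never used."""
    opt = torch.optim.Adam(shadow.parameters(), lr=lr)
    target.eval()
    for _ in range(epochs):
        for x, _ in loader:  # the label from the loader is deliberately ignored
            with torch.no_grad():
                teacher_logp = F.log_softmax(target(x) / T, dim=1)
            student_logp = F.log_softmax(shadow(x) / T, dim=1)
            loss = F.kl_div(student_logp, teacher_logp,
                            log_target=True, reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
    return shadow
```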
C1: I have trouble making sense of the theory (Theorem 1) or connecting it with the DSMIM-MIA attack. First, the theorem states that I > Influence(x, M) + K · Pr(A = B), but I could not find the definition of Influence(x, M). Second, the model outputs should be viewed as real numbers; however, the proof uses summation instead of integration. Third, even if the theorem is true, it is unclear to me how it motivates using distillation for shadow-model training.
C2. I find the proposed DSMIM-MIA counterintuitive. First, shadow models aim to capture the model's training behavior when some instances are not used in training; using distillation would result in models that do not simulate that training behavior. Second, to determine the membership of an instance, one ultimately gets three numbers: the target model's loss on the instance, and the mean and variance of the shadow models' losses on the instance. It is unclear why an NN would perform better than a statistical measure when the feature vector consists of three numbers with clear statistical meanings.
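For comparison, the kind of statistical measure this comment has in mind fits in a few lines; a sketch using an assumed Gaussian calibration of the shadow-loss distribution, not a method from the paper:

```python
import numpy as np
from scipy.stats import norm

def z_score_membership(target_loss, shadow_mean, shadow_std, eps=1e-8):
    """How far the target model's loss falls below the shadow (OUT) loss
    distribution; larger values suggest membership."""
    z = (shadow_mean - target_loss) / (shadow_std + eps)
    return norm.cdf(z)  # probability-like membership score in [0, 1]

# e.g. z_score_membership(0.05, shadow_mean=1.2, shadow_std=0.4) -> about 0.998
```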
C3. The writing of the paper needs a lot of improvement.
"sort-of-the-art" should be "state-of-the-art"
"difficult calibration" should be "difficulty calibration"
"lossy trajectory" should be "loss trajectory"
"and it is not clear which samples are more likely to reason about true membership under difficulty calibration"; samples should not be the subject for "reasoning out."
"The output of x on M_{out}" should be "The output of M_{out} on x"
Remarks:
1. The authors should briefly explain the term "OUT model" before its first use, to avoid ambiguity.
2. Clarify the concept of "smallest form of leakage", for example by replacing it with the phrase "fundamental threat".
3. The authors state that hypothesis testing is ineffective for low-self-influence samples (lines 294–296), but the explanation of why neural networks should be more effective is rather general and brief. In particular, the authors do not consider the problems that may be associated with this type of approach.
4. The constant K in the proof of Theorem 1 should be precisely explained.
Regarding the assumptions of the theoretical proof: to ensure the robustness and clear applicability of the theoretical results, we ask that you provide a more detailed explanation and justification of the assumptions made in the proof. In particular, explaining the assumption that the models are equal in the absence of any given sample is essential to validating the theoretical foundations of the framework.
Regarding the experimental results: the experimental results of the manuscript would be greatly strengthened by including measures of statistical significance. To better verify the observed effects and demonstrate their robustness, we ask that you report metrics such as standard deviations, confidence intervals, or p-values alongside the main experimental results. This will provide crucial context on the variability and reliability of the reported results.
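For example, a t-based confidence interval over repeated runs can be reported along these lines; the metric values below are placeholders:

```python
import numpy as np
from scipy import stats

def mean_with_ci(values, confidence=0.95):
    """Mean and t-based confidence interval over independent runs
    (e.g., TPR at 0.1% FPR across several random seeds)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    half = stats.sem(values) * stats.t.ppf((1 + confidence) / 2, df=len(values) - 1)
    return mean, (mean - half, mean + half)

mean, (lo, hi) = mean_with_ci([0.112, 0.108, 0.121, 0.115, 0.110])
print(f"{mean:.3f} (95% CI: [{lo:.3f}, {hi:.3f}])")
```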
no comment
no comment
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.