All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Dear Authors,
You have addressed all of the reviewers' comments, and the manuscript is ready for publication.
M.P.
[# PeerJ Staff Note - this decision was reviewed and approved by Claudio Ardagna, a 'PeerJ Computer Science' Section Editor covering this Section #]
Dear Authors,
Please clarify the statement identified by Reviewer #3.
M.P.
None
None
None
(1) "If the output for the source test case input differs from the output for the follow-up test case input, the MR is said to be dissatisfied"
This statement is not a general description of MR dissatisfaction. We check the source and follow-up outputs against the MR: if they violate the specified relationship, then the MR is dissatisfied.
Equality is only one possible MR relationship.
The MRs in this study may employ equality output relationships, but that is only one case.
I suggest that the authors clearly explain the rationale of MT.
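The distinction can be illustrated with a minimal sketch. It assumes toy programs (`math.sin` and `math.exp`) rather than the models in the manuscript, and the helper `mr_holds` is a hypothetical name introduced only for this example. In the second MR the source and follow-up outputs differ, yet the MR is satisfied because the specified relationship is negation, not equality.

```python
import math

def mr_holds(program, source_input, transform, relation):
    """Run the source and follow-up test cases and check them against the MR."""
    followup_input = transform(source_input)
    source_output = program(source_input)
    followup_output = program(followup_input)
    return relation(source_output, followup_output)

x = 0.7  # arbitrary source test case input (assumption)

# MR with an equality relation: sin(x) = sin(pi - x)
equality_mr = mr_holds(math.sin, x, lambda v: math.pi - v,
                       lambda s, f: math.isclose(s, f, abs_tol=1e-12))

# MR with a negation relation: sin(-x) = -sin(x)
negation_mr = mr_holds(math.sin, x, lambda v: -v,
                       lambda s, f: math.isclose(f, -s, abs_tol=1e-12))

# MR with an ordering relation: exp(x + 1) > exp(x)
ordering_mr = mr_holds(math.exp, x, lambda v: v + 1.0,
                       lambda s, f: f > s)

# The outputs of the negation MR are different, but the MR still holds.
outputs_differ = math.sin(x) != math.sin(-x)
```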
Dear Authors,
Please carefully check the issues raised by the reviewers, especially Reviewer 2.
Moreover, the main novel aspects of your work, as well as your contribution with respect to the state of the art, should be clarified since, in my opinion, they are not entirely clear.
Best regards,
M.P.
In this paper, the authors report their recent work on applying the statistical metamorphic testing technique to the testing of deep-learning-based pneumonia detection systems. Seven MRs were defined, and mutation analysis was used to evaluate the performance of the technique. The experimental results demonstrate the applicability of metamorphic testing in this specific field.
The experiments were mainly conducted in the field of healthcare. They were designed in a scientific way.
The experimental results support the claim made by the authors.
While the paper is generally well-written, some sections, especially around statistical methods and technical details, could benefit from clearer language to enhance accessibility for a broader audience.
The paper does not provide extensive details about the experimental setup, which may hinder replication of the study by other researchers. More specific information on the training and testing environments, model configurations, and hyperparameters would be beneficial.
The paper uses statistical methods to verify metamorphic relations, but the choice of statistical tests and the justification for their use could be more thoroughly explained. Additionally, the discussion of statistical power and potential Type II errors could be expanded.
no comments.
no comments.
no comments.
This paper proposes a statistical metamorphic testing approach, for testing CNN based models that exhibit non-deterministic characteristics. The motivation behind this work is explained and justified. The approach and the supporting experiments are clearly clarified.
Strength: Good writing
Novel statistical metamorphic testing approach
Clear discussion of related work
Weakness: Insufficient interpretation of the experimental results
(1) The introduction of MT and MRs is not accurate enough. I suggest that you improve the Section “Metamorphic Testing” on page 4.
Lines 189-190: the description of the MR and the source/follow-up test cases needs to be improved. It is not right to state “each MR is comprised of a source and a follow-up test cases”.
Intrinsically, an MR specifies the relationships among relevant inputs and their outputs.
Also, Figure 2 needs to be improved in order to avoid misunderstandings about MT and MRs.
(2) I think that combining MT with statistical tests is one of the key novelties of this work.
Although each test is described, I still suggest that you justify the choices further. Why are different statistical tests used for different comparisons?
For example, at lines 460-461: “This type of data can now be statistically compared using the Chi-square parametric test of homogeneity”. Why can the Chi-square test be applied here?
Secondly, it is unclear whether each test is performed on the results for each test instance (i.e., Xi-test declared at line 483) or on the results of all test instances.
If it is the former, how is a decision made from the different results of different instances (i.e., some are significantly different, some are not)? If it is the latter, this needs to be explicitly clarified.
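The difference between the two readings can be made concrete with a minimal sketch. All counts below are hypothetical and are not taken from the manuscript: for each test instance, 30 source runs and 30 follow-up runs are tallied by predicted class, and a Chi-square test of homogeneity (without continuity correction) is computed by hand. The critical value 3.841 corresponds to one degree of freedom at a significance level of 0.05.

```python
def chi_square_statistic(table):
    """Pearson Chi-square statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

CRITICAL_DF1_ALPHA_05 = 3.841  # Chi-square critical value, df = 1, alpha = 0.05

# Hypothetical data: rows are [source runs, follow-up runs],
# columns are counts of [class 0, class 1] over 30 runs each.
instances = [
    [[22, 8], [21, 9]],
    [[25, 5], [12, 18]],
    [[20, 10], [22, 8]],
]

# Reading 1: one test per test instance.
per_instance_reject = [
    chi_square_statistic(t) > CRITICAL_DF1_ALPHA_05 for t in instances
]

# Reading 2: one test on the counts pooled over all test instances.
pooled = [
    [sum(t[r][c] for t in instances) for c in range(2)] for r in range(2)
]
pooled_reject = chi_square_statistic(pooled) > CRITICAL_DF1_ALPHA_05
```

With these made-up counts, the per-instance reading flags the second instance only, while the pooled reading does not reject homogeneity at all, so the two readings can lead to different verdicts on the same MR.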
(3) I suggest justifying or explaining some of the experimental settings.
For example, n=30 (line 564).
Are 60 samples sufficient for an accurate and reliable statistical analysis?
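One way to approach this question is a rough power estimate. The sketch below uses the normal approximation for a two-sided comparison of two independent groups of 30 samples each; the standardized effect sizes (0.5 and 0.8) are assumptions chosen only for illustration, and `approx_power` is a hypothetical helper, not something from the manuscript.

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected shift of the test statistic under the alternative hypothesis.
    shift = effect_size * (n_per_group / 2) ** 0.5
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

power_medium = approx_power(0.5, 30)  # assumed medium effect
power_large = approx_power(0.8, 30)   # assumed large effect
type_ii_medium = 1 - power_medium     # probability of missing a medium effect
```

Under these assumptions, a medium effect would be detected only about half of the time with 30 samples per group, which is why the expected effect size matters when judging whether 60 samples are enough.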
(4) More details about the results are needed.
From Table 2 to Table 5, it can be observed that the same MR yields varying testing effectiveness with different statistical tests. It seems that the t-test is more effective than the Chi-square test and maximum voting. Are there any indications for deciding which statistical test should be used?
(5) Others.
Some images are blurry, for example, Figure 1 and Figure 6.
Line 783: “testing” is duplicated.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.