All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
In my opinion and that of the original reviewers, this revised paper can be accepted.
[# PeerJ Staff Note - this decision was reviewed and approved by Sedat Akleylek, a PeerJ Section Editor covering this Section #]
no comment
no comment
no comment
I like the work as a whole.
The language is of a good level.
The aim of the paper is well defined, and the rest of the paper focuses on solving this problem.
The results are satisfactory and acceptable for showing the novelty of the proposal.
The authors made the relevant corrections based on the previous round of reviews.
Therefore, it can be accepted as is.
Please carefully address all the concerns raised by the two reviewers; the revised paper will be rechecked by them.
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
**Language Note:** The review process has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title). Alternatively, you should make your own arrangements to improve the language quality and provide details in your response letter. – PeerJ Staff
1. While citing references within the manuscript, it would be clearer if the whole citation is inside the brackets rather than just the year. For example, line 31 has "Tayyab et al. (2022) and Gopinath and Sethuraman (2023)", which could instead be written as "(Tayyab et al., 2022; Gopinath and Sethuraman, 2023)" for the reader's clarity.
2. Line 108 states “as seen in Figure 2”, which is a typo. It should be replaced by “as seen in Figure 1”
3. Line 108 has "For executables," mid-sentence, which is a typo. It should be replaced by "for executables,"
4. Line 109 states “Windows operating system, ”, which is a typo. The comma should be replaced with a period as “Windows operating system. ”
5. Line 221 states “With the use of AI evolution”. I think the better phrasing would be “With the rise of AI evolution”
6. The definition of “Zero-Shot Learning” is being repeated multiple times throughout the paper. Consider being less verbose.
7. Line 258 states "Yakura et.al Yakura et al.”. The citation is repeated
8. Line 270 states "Ganesan et.al Ganesan et al.”. The citation is repeated
9. Line 278 states "Ravi et.al Ravi and Alazab”. The citation is repeated
10. Figure 6 contains a typo in the enumeration of the last image. The letter "o" is used twice; the last image should be labelled "(p) Azero.A" instead of "(o) Azero.A"
11. Line 251 states “In their publication,Rahman et al. “. There needs to be a blank space after the comma such as, “In their publication, Rahman et al. “.
12. Line 345 states “With malware transformed into images,the challenge”. There needs to be a blank space after the comma such as, “With malware transformed into images, the challenge”.
13. Line 353 states “Within our proposed methodology,” with an additional tab space before the beginning of the sentence, which needs to be removed.
14. Line 369 states “In the context of our model” with an additional blank space before the beginning of the sentence, which needs to be removed.
15. Line 370 states “(batch size,query features)”. There needs to be a blank space after the comma such as, “(batch size, query features)”.
16. Line 371 states “(batch size,num support images,support features)”. There needs to be a blank space after the comma such as, “(batch size, num support images, support features)”.
17. Line 405 has “Where” with a capital “W”, which should instead begin with a small “w” such as, “where”
18. Line 431 has “malimg” with a small “m”, which should instead be replaced with a capital “M” such as “about the Malimg dataset”
19. Line 437 has “malevis” with a small “m” and “v”, which should instead be replaced with a capital “M” and capital “V” respectively, such as “MaleVis”
20. Figure 13 title has “malevis validation dataset” which should instead read as “MaleVis Validation Dataset”
21. Figure 12 has no labels showing the exact number in each bar of the frequency histogram, unlike Figure 13, which is labelled.
22. Table 1 and Figure 12 represent the same information; why do we need both? One should suffice.
23. Line 493 does not list all the baseline papers mentioned in Table 3. Specifically, “Hui et al. (2019)” and “Bishay et al. (2019)” papers have not been listed
24. Table 4 title, Lines 502, 517, 521, 537 and Figure 16 mention “Malevis”, which should instead read as “MaleVis”
25. Line 509 has a statement “Verma et al. Verma et al. (2020)” with “Verma et al.” being mentioned twice.
26. Similarly, line 509-510 has a statement “D. Huynh et al. Huynh and Elhamifar (2020)” with the reference being mentioned twice.
27. Figures and tables should be placed closer to their descriptions within the manuscript for better readability. For example, Table 5 is on page 19 but it’s described on page 21
28. Line 538 states “Statistical DNN”, which should have a comma between the two words to indicate the two different models as per Table 5. It should be replaced with “Statistical, DNN”
1. Line 220 refers to “APTs”. What are APTs?
2. Line 403 states that “Hv is the same as H”. Then why use Hv in the equation above instead of H?
3. Figure 12 is not referenced in the manuscript
4. Line 510 has a statement “with high precision but lower recall”, but Table 4 shows this being true for “Verma et al.” but not for “Huynh and Elhamifar“. The statement is contradictory and needs to be corrected
1. The "RESULTS AND ANALYSIS" section starts by explaining the methodology and experimental setup for the Malimg dataset in line 482 but never presents its results. Instead, it jumps to the results of the MaleVis dataset in line 502. Please present the results on the Malimg dataset before discussing the other dataset.
2. There is no explanation of the experimental setup for the MaleVis dataset. What classes were hidden during ZSL evaluation? What classes are used in training, validation and testing phases?
3. In Table 4, it is unclear what experimental setup was used for the baseline comparison. Are the results reported here using a ZSL setup? Some of the baseline methods, such as Hui et al. (2019), are not ZSL methods. How are we evaluating the performance of SMART when it is being compared with baseline models that are a mixture of ZSL and non-ZSL methods?
4. In Table 5, are the results presented with a ZSL setup or a non-ZSL setup?
5. The source code provided as part of supplementary materials does not indicate a ZSL setup
6. The dataset provided as part of the supplementary materials only contains the MaleVis dataset, and the model is evaluated only against this dataset. There is no indication of the Malimg dataset being provided or used.
7. Line 542 states “such as Neoreklami, exhibit lower accuracy scores” which contradicts the data from Table 5, where Neoreklami class achieves an accuracy of 1.00
8. Figures 17 and 18 have no Y-axis labels, and the scale looks manipulated. Why do the precision and accuracy scales range over [0, 1.2]? Why do some of the precision and accuracy values exceed the 1 mark?
9. Also, why do the scale values for Figures 17 and 18 range to 1.2 on the top X axis but to 1 on the bottom X axis? Which scale should the results be interpreted with? (See the axis sketch after this list.)
10. There is no emphasis on the results for the Malimg dataset. Figure 14 presents a confusion matrix, but there is no mention of Accuracy, Recall, Precision, or F1-score.
11. Can you perform an Ablation study to see how removal or modification of the system impacts overall performance?
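Regarding points 8 and 9, a minimal matplotlib sketch of the suggested fix: pin the metric axis to [0, 1] and label it. The model names and bar values here are purely illustrative placeholders, not numbers from the paper.

```python
import matplotlib.pyplot as plt

# Hypothetical fix for the axis issues in Figures 17 and 18:
# constrain the metric axis to [0, 1] and give it a label.
# Model names and bar heights are illustrative placeholders.
fig, ax = plt.subplots()
ax.bar(["Model A", "Model B", "Proposed"], [0.88, 0.91, 0.97])
ax.set_ylim(0, 1)          # precision/accuracy cannot exceed 1
ax.set_ylabel("Accuracy")  # the Y-axis label missing from the figures
ax.set_title("Comparison of Accuracy with Existing Models")
plt.show()
```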
The paper is titled "Semantic-aware framework for zero-shot malware classification via attention-based relation network" and aims to propose a novel method for zero-shot malware detection, specifically for classifying novel and infrequent malware instances without prior training samples. The topic is interesting, and it is one of the hot research topics in cybersecurity and malware detection.
Therefore, the paper fits within the scope of the PeerJ Computer Science journal.
The manuscript is exceptionally well-written, employing precise and professional English.
The authors have done a good job of situating their research within the current field. Therefore, the literature review is comprehensive.
I do not like the use of the figures and tables. Some of them are not understandable. The authors should look at each of them carefully.
Although this manuscript presents compelling original research that aligns perfectly with the Aims and Scope of the PeerJ Computer Science journal and is a valuable contribution to the field, there are some missing concepts and vague points in the paper. They must be clearly restated.
The paper reaches some valuable results. However, the authors should check their correctness again (explained in the following parts).
I have some objections before accepting the paper. They must be clarified.
MAJOR 1) I want to learn why the confusion matrix values differ from those of the original Malimg dataset (as depicted in Figure 14, Confusion Matrix for Malimg Dataset).
Mainly, the dataset contains the following records in the zero-shot learning testing set:
Worm:
* Allaple.L: 1591
* Allaple.A: 2949
* VB.AT: 408
* Yuner.A: 800
Total = 5748

Worm:AutoIT:
* Autorun.K: 106

Backdoor:
* Agent.FYI: 116
* Rbot!gen: 158
Total = 274
Therefore, I want to see these data in the confusion matrix, yet they do not appear. Please explain the reason.
MAJOR 2)
Zero-Shot Learning is defined in the paper as "a learning approach in which training examples do not occur for all classes, resulting in learning modeling without label availability."
The following part is very critical for understanding the study.
"There are 25 malware families, which we grouped into 8 malware classes. Our experimental setup involves conducting ZSL, where all the classes in the testing set are unseen during training. Specifically, we chose 3 classes (Dialer, PWS, and Rogue) for the training set, 2 classes (TDownloader and Trojan) for the validation set, and 3 classes (Worm:AutoIT, Worm, and Backdoor) for the testing set."
The authors emphasized that "3 classes (Worm:AutoIT, Worm, and Backdoor) for the testing set."
If there are only 3 classes in the testing set, how can there be 27 classes in "Table 5. Performance metrics for all classes of MaleVis dataset"?
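For clarity, a minimal sketch of the split as quoted, with the class names copied verbatim from the passage above (including "TDownloader" as printed there). The assertions express the ZSL requirement that the splits share no classes:

```python
# Class names copied verbatim from the quoted passage
# (including "TDownloader" as it is printed there).
TRAIN_CLASSES = {"Dialer", "PWS", "Rogue"}
VAL_CLASSES = {"TDownloader", "Trojan"}
TEST_CLASSES = {"Worm:AutoIT", "Worm", "Backdoor"}

# In a ZSL setup, the train/validation/test splits share no classes.
assert TRAIN_CLASSES.isdisjoint(VAL_CLASSES)
assert TRAIN_CLASSES.isdisjoint(TEST_CLASSES)
assert VAL_CLASSES.isdisjoint(TEST_CLASSES)

# Under this split, per-class results at test time can only cover the
# 3 unseen test classes, not all 27 classes reported in Table 5.
print("unseen test classes:", sorted(TEST_CLASSES))
```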
MAJOR 3)
"Figure 16. Performance for different classes for Malevis Dataset" and
"Figure 17. Comparison of Accuracy with Existing Models" and"Figure 18. Comparison of Precision with Existing Models" are really har to see and understand. THey should be depicted as Tables. (17 and 18 can be combined together)
MAJOR 4)
I want to see the time efficiency of the proposed system (in comparison to the existing models).
What are the training and testing times of the models?
MAJOR 5)
I want to learn the runtime efficiency of the proposed model.
Mainly, how long does it take to convert a malware binary to an image?
And how long does it take to classify a single image?
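For illustration, a minimal timing sketch assuming the standard byte-to-grayscale visualization is used for the conversion step; the file name and the commented-out model call are hypothetical placeholders:

```python
import time
import numpy as np

def malware_to_image(path: str, width: int = 256) -> np.ndarray:
    """Reshape the raw bytes of a binary into a grayscale image
    (the standard byte-to-pixel visualization; width is a free choice)."""
    data = np.fromfile(path, dtype=np.uint8)
    height = int(np.ceil(len(data) / width))
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(data)] = data
    return padded.reshape(height, width)

t0 = time.perf_counter()
img = malware_to_image("sample.exe")  # hypothetical input file
t1 = time.perf_counter()
print(f"conversion time: {(t1 - t0) * 1e3:.2f} ms")

# pred = model.predict(img[None, ..., None])  # model is a placeholder
# t2 = time.perf_counter()
# print(f"inference time: {(t2 - t1) * 1e3:.2f} ms")
```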
MAJOR 6)
Please check the values in "Table 4. Performance Comparisons measures for zero-shot problem using Malevis dataset"
For example,
"Catch them alive: A malware detection approach through memory forensics, manifold learning and computer vision" by Ahmet Selman Bozkir, Ersan Tahillioglu, Murat Aydos, and Ilker Kara reports an accuracy value of 96.39%.
However, the authors put it in this table as "73%".
How can this error have happened?
MINOR 1)
"Figure 12. Class Frequencies Histogram of Malimg dataset" has no meaning same data exist in "Table 1. Sample Distribution in Malimg Dataset". THerefore the first one (figure) should be deleted.
Same thing is also valid for "Table 2. Sample Distribution in MaleVis Dataset" and "Figure 13. Class Frequencies Histogram of MaleVis Validation Dataset". Therefore, figure should be deleted from heer.
MINOR 2)
There are no equation numbers for:
Accuracy in line 461,
Precision in line 466,
and Recall in line 470.
F1-score is not written as an equation in line 473.
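For reference, the standard definitions of these metrics, which should appear as numbered equations (TP, TN, FP, FN denote true/false positives and negatives):

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
\text{F1-score}  &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}
```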
MINOR 3)
"Table 4. Performance Comparisons measures for zero-shot problem using Malevis dataset" has no same precison after ".". It should be preferred two precision point after "."
MINOR 4)
"Table 4. Performance Comparisons measures for zero-shot problem using Malevis dataset" -- why there is a different notation for performance metrics here (some one has %, some others are not)
MINOR 5)
In "Table 5. Performance metrics for all classes of MaleVis dataset", some values have 3 precision point after "."
MINOR 6)
"The findings, as illustrated in Figure 14, revealed a notably low false positive rate across all families. Specifically, for Allaple.L, the accuracy reached its peak." -- ??
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.