All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
The paper has addressed all questions.
[# PeerJ Staff Note - this decision was reviewed and approved by Sedat Akleylek, a 'PeerJ Computer Science' Section Editor covering this Section #]
no comment
no comment
no comment
The paper has been revised in response to the earlier suggestions, and no major problems remain.
no comment
no comment
no comment
I'd suggest that the authors report the total time taken to process the entire test set, rather than the time per step. The per-step measurement can be significantly affected by OS context switches, interruptions, CPU-GPU memory transfers, and other factors, especially at the scale of a few milliseconds. Currently the reported prediction time is not particularly meaningful.
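To make the suggestion concrete, here is a minimal sketch of the kind of measurement I have in mind, assuming a PyTorch model and a standard DataLoader; every name below is an illustrative stand-in, not the authors' actual code:

```python
import time

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Illustrative stand-ins -- not the authors' actual model or data.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(128, 2).to(device)              # placeholder classifier
test_set = TensorDataset(torch.randn(1000, 128))  # placeholder "test set"
test_loader = DataLoader(test_set, batch_size=32)

model.eval()
start = time.perf_counter()
with torch.no_grad():
    for (x,) in test_loader:
        _ = model(x.to(device))
if device == "cuda":
    torch.cuda.synchronize()  # flush queued GPU work before stopping the clock
elapsed = time.perf_counter() - start
print(f"total test-set inference time: {elapsed:.3f} s "
      f"({elapsed / len(test_set):.6f} s per sample on average)")
```

Reporting the total wall-clock time (and, if desired, the derived per-sample average) amortizes one-off costs such as CPU-GPU transfers and scheduler noise over the whole test set.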
Please take the reviewers' comments seriously and make the necessary revisions. If the comments cannot be addressed, the manuscript will be rejected.
no comment
The article mentions that there are clear differences in the preprocessing times of the three methods (MalConv, SMO, and ProcGCN), but no rigorous experimental analysis is provided. Please design an appropriate experimental setup and verify these differences.
There are too few experimental results to support the conclusions of this article. Please add the necessary experiments.
no comment
no comment
no comment
I'd like to thank the authors for submitting the revised manuscript. The authors have resolved all my concerns. While it would be nice to have a generalization (on a completely different malware dataset, e.g. VirusTotal, that has not been trained on) and a real-world performance evaluation (false positive rate on large-scale benign programs), I understand that some of these evaluations might not be feasible within the scope of this study, and it doesn't significantly detract from the quality of the work.
Based on the reviewers' suggestions and my own assessment, I recommend that the paper undergo major revisions.
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
**Language Note:** PeerJ staff have identified that the English language needs to be improved. When you prepare your next revision, please either (i) have a colleague who is proficient in English and familiar with the subject matter review your manuscript, or (ii) contact a professional editing service to review your manuscript. PeerJ can provide language editing services - you can contact us at [email protected] for pricing (be sure to provide your manuscript number and title). – PeerJ Staff
no comment
no comment
no comment
In this article "4.3 Analysis and Discussion", there are the following two issues that need to be added to the necessary analysis.
Although the performance advantages of the ProcGCN model in experiments were mentioned, the generalization performance of the model was not discussed in detail. Achieving high performance on a small-scale data set does not necessarily mean that the model will perform well on a larger, more complex data set. Generalization performance is one of the key factors in evaluating the effectiveness of a model.
Although it is mentioned that the accuracy of the model improves with the increase of features, the impact of hardware resource limitations on model performance and resource consumption is not discussed in detail. This is an important factor, especially in practical applications where computing resources and memory resources need to be considered.
no comment
no comment
The authors mention that "many internal functions also carry meaningful names", but internal functions typically have no names unless they are also exported (which is quite rare in EXEs) or the corresponding debug information is somehow loaded by IDA. I would recommend that the authors provide statistics on the EXE-to-DLL ratio and the percentage of functions having no meaningful names for both benign and malicious samples.
I'd like to thank the authors for revising the manuscript and answering many of my questions.
1. For my first point, my suggestion was to feed raw binary files instead of memory dumps to the MalConv model. If the MalConv model does not perform well on obfuscated or packed binaries but ProcGCN is resilient to obfuscation, it proves the superiority of using memory images (to some extent).
2. Since ProcGCN relies on memory images, I do not think this is a pure static approach. Hence, I believe comparing it with some dynamic approaches such as DMalNet is reasonable. DMalNet also runs malware in the Cuckoo sandbox.
That being said, the above suggestions are based on the impression that ProcGCN is a generic malware detection approach. If the authors try to propose an approach designed specifically for memory images, I'm fine with the current experiment setup.
3. In GCN, a graph with 1,000 nodes is a relatively small graph and should not cause any memory stress. I guess the authors are keeping the edges in a dense adjacency matrix; in that case, using a sparse matrix could potentially allow for handling orders of magnitude more nodes (see the sketch after these points).
4. Line 388, the long*er* feature vector of each node, ...
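To illustrate point 3, here is a minimal sketch of the dense-versus-sparse distinction for one GCN-style propagation step in plain PyTorch; the graph sizes and tensors are illustrative, not taken from the paper:

```python
import torch

# Illustrative scale: a call graph with 100,000 nodes and ~500,000 directed edges.
num_nodes, num_edges, feat_dim = 100_000, 500_000, 64
edge_index = torch.randint(0, num_nodes, (2, num_edges))  # random edges, for demonstration
x = torch.randn(num_nodes, feat_dim)                      # node feature matrix

# A dense adjacency matrix would need num_nodes^2 float32 values (~40 GB here) -- infeasible.
# A sparse COO tensor stores only the edges that actually exist.
adj = torch.sparse_coo_tensor(edge_index, torch.ones(num_edges),
                              (num_nodes, num_nodes)).coalesce()

# One (unnormalized) GCN-style propagation step: sum neighbor features into each node.
out = torch.sparse.mm(adj, x)
print(out.shape)  # torch.Size([100000, 64])
```

In PyTorch Geometric the same effect is obtained by passing `edge_index` directly to `GCNConv`, which performs sparse message passing internally rather than materializing the full adjacency matrix.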
The authors need to improve the theoretical analysis in order to prove the feasibility of the proposed method. This underpins the research aim, so it is critical that this is addressed in the revision.
[# PeerJ Staff Note: Please ensure that all review comments are addressed in a rebuttal letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate. It is a common mistake to address reviewer questions in the rebuttal letter but not in the revised manuscript. If a reviewer raised a question then your readers will probably have the same question so you should ensure that the manuscript can stand alone without the rebuttal letter. Directions on how to prepare a rebuttal letter can be found at: https://peerj.com/benefits/academic-rebuttal-letters/ #]
no comment
Please add the experimental design to Section 4.2 ("Experimental Environment and Procedures") and explain why the data split is 9:1 rather than the more conventional 8:2.
To verify the validity of the experimental results, please compare them with the latest research.
This article reads less like a scientific study and more like an experiment report.
no comment
The authors mention in the abstract and introduction that byte features are susceptible to mutation and obfuscation, and function call features are more robust. However, there is no experiment to prove this. The evaluation result shows that byte features perform equally well. I'd suggest that the authors evaluate some packed or obfuscated samples to demonstrate the effectiveness of process dumps.
On the other hand, ProcGCN is only evaluated against MalConv and Bozkir's method. Section 2 of the paper mentions other open-source works, such as DMalNet, that utilize similar approaches. I'd suggest that the authors evaluate more related work to show the robustness and effectiveness of ProcGCN.
The evaluation of the proposed approach relies solely on the dataset collected by Fang et al. Moreover, the authors randomly selected 2,000 samples from the dataset and retained only the ones whose call graph contains between 20 and 1,000 nodes.
First, this dataset is insufficient for a valid malware detection evaluation. With fewer than 900 malicious samples in the dataset, only about 90 malicious samples are used for evaluation. Even with a reported accuracy of 98% to 99%, the result is not convincing, given the size of the dataset. VirusTotal provides malware samples for academic research for free. There are many other services like VirusSign that provide malware feeds for free, so finding executable malware shouldn't be an issue.
Second, the authors need to justify why they randomly selected 2,000 samples and filtered out samples with more than 1,000 nodes. The original dataset is already small, with fewer than 8,000 samples, and picking out only one-third of the dataset doesn't seem reasonable.
Some minor issues:
1) I'm curious about the average percentage of external functions in a call graph. Since internal functions do not carry semantic information and each GCN layer can only propagate information to its immediate neighbors, the model may not perform very well if the external functions are not close enough, for example, more than four hops away (a short sketch after point 3 illustrates this).
2) The authors mention at line 293 that EXEs and DLLs are extracted from the dataset. How do the authors run DLLs in the Cuckoo sandbox? Is there a specific loader submitted to the sandbox?
3) At line 319, there is no need to convert the GDL format to DOT because IDA Pro is able to dump graphs in the DOT format directly.
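Returning to minor issue 1: with k GCN layers, a node can only receive information from nodes at most k hops away, so semantic signal from external functions farther than k hops never reaches it. A small self-contained sketch (plain PyTorch, illustrative only, not the authors' model):

```python
import torch

# Illustrative path graph 0 - 1 - 2 - 3 - 4; the "semantic" signal (an external
# function's feature) lives only on node 4.
num_nodes = 5
edges = torch.tensor([[0, 1, 1, 2, 2, 3, 3, 4],
                      [1, 0, 2, 1, 3, 2, 4, 3]])
adj = torch.sparse_coo_tensor(edges, torch.ones(edges.shape[1]),
                              (num_nodes, num_nodes)).coalesce()

x = torch.zeros(num_nodes, 1)
x[4] = 1.0  # one-hot signal on the external-function node

h = x
for layer in range(4):                # each propagation step == one GCN layer's reach
    h = torch.sparse.mm(adj, h) + h   # aggregate neighbors, keep self-information
    reached = (h.squeeze() > 0).nonzero().flatten().tolist()
    print(f"after {layer + 1} layer(s), node 4's signal has reached nodes {reached}")
# Node 0 only receives the signal after four layers; a two-layer GCN would never see it.
```

Reporting the distribution of hop distances between internal nodes and their nearest external function would make it clear whether the chosen number of layers is sufficient.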
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.