Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

Summary

  • The initial submission of this article was received on June 13th, 2023 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on July 11th, 2023.
  • The first revision was submitted on September 5th, 2023 and was reviewed by 2 reviewers and the Academic Editor.
  • A further revision was submitted on April 26th, 2024 and was reviewed by 2 reviewers and the Academic Editor.
  • A further revision was submitted on June 12th, 2024 and was reviewed by 2 reviewers and the Academic Editor.
  • The article was Accepted by the Academic Editor on June 21st, 2024.

Version 0.4 (accepted)

· Jun 21, 2024 · Academic Editor

Accept

The paper has addressed all questions.

[# PeerJ Staff Note - this decision was reviewed and approved by Sedat Akleylek, a 'PeerJ Computer Science' Section Editor covering this Section #]

Reviewer 1 ·

Basic reporting

no comment

Experimental design

no comment

Validity of the findings

no comment

Additional comments

The paper has been revised in response to the earlier suggestions, and no major problems remain.

Reviewer 2 ·

Basic reporting

no comment

Experimental design

no comment

Validity of the findings

no comment

Additional comments

I'd suggest that the authors report the total time taken to process the entire test set rather than the time per step. Per-step measurements can be significantly affected by OS context switches, interrupts, CPU-GPU memory transfers, and other factors, especially at the scale of a few milliseconds. As currently reported, the prediction time is not particularly meaningful.
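A minimal sketch of the suggested measurement, assuming a PyTorch-style model and data loader (the function and variable names are illustrative, not taken from the paper):

```python
import time
import torch

def total_inference_time(model, test_loader, device="cuda"):
    """Time the entire test set in one window instead of averaging per-step times."""
    model.eval()
    if device == "cuda":
        torch.cuda.synchronize()  # flush pending GPU work before opening the window
    start = time.perf_counter()
    with torch.no_grad():
        for batch in test_loader:
            model(batch.to(device))
    if device == "cuda":
        torch.cuda.synchronize()  # wait for all kernels to finish before closing it
    return time.perf_counter() - start
```

Reporting one aggregate number (and deriving per-sample time as the total divided by the dataset size) averages out the scheduler noise and transfer jitter that dominate at the millisecond scale.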

Version 0.3

· May 6, 2024 · Academic Editor

Major Revisions

Please take the reviewers' comments seriously and make the necessary revisions. If the reviewers' comments cannot be addressed, the manuscript will be rejected.

Reviewer 1 ·

Basic reporting

no comment

Experimental design

The article mentions that there are obvious differences in preprocessing time among the three methods (MalConv, SMO, and ProcGCN), but no rigorous experimental analysis supports this. Please design an appropriate experimental setup and verify these differences.

Validity of the findings

The experimental results are too few to support the article's conclusions. Please add the necessary experiments.

Reviewer 2 ·

Basic reporting

no comment

Experimental design

no comment

Validity of the findings

no comment

Additional comments

I'd like to thank the authors for submitting the revised manuscript. The authors have resolved all my concerns. While it would be nice to have a generalization evaluation on a completely different malware dataset (e.g., VirusTotal) that was not used for training, as well as a real-world performance evaluation (false positive rate on large-scale benign programs), I understand that some of these evaluations might not be feasible within the scope of this study, and their absence does not significantly detract from the quality of the work.

Version 0.2

· Sep 18, 2023 · Academic Editor

Major Revisions

Based on the reviewers' suggestions and my own assessment, I recommend that the paper undergo major revisions.

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

**Language Note:** PeerJ staff have identified that the English language needs to be improved. When you prepare your next revision, please either (i) have a colleague who is proficient in English and familiar with the subject matter review your manuscript, or (ii) contact a professional editing service to review your manuscript. PeerJ can provide language editing services - you can contact us at [email protected] for pricing (be sure to provide your manuscript number and title). – PeerJ Staff

Reviewer 1 ·

Basic reporting

no comment

Experimental design

no comment

Validity of the findings

no comment

Additional comments

In Section 4.3 ("Analysis and Discussion"), the following two issues require additional analysis.

Although the experiments demonstrate the ProcGCN model's performance advantages, the model's generalization is not discussed in detail. High performance on a small-scale dataset does not necessarily mean the model will perform well on a larger, more complex one, and generalization is one of the key factors in evaluating a model's effectiveness.

Although the paper mentions that model accuracy improves as more features are added, it does not discuss in detail how hardware resource limitations affect model performance and resource consumption. This is an important factor, especially in practical applications where computing and memory resources are constrained.

Reviewer 2 ·

Basic reporting

no comment

Experimental design

no comment

Validity of the findings

The authors mention that "many internal functions also carry meaningful names", but internal functions typically have no names unless they are also exported functions (quite rare in EXEs) or the corresponding debug information is somehow loaded by IDA. I recommend that the authors provide statistics on the EXE-to-DLL ratio and the percentage of functions without meaningful names for both benign and malicious samples.

Additional comments

I'd like to thank the authors for revising the manuscript and answering many of my questions.

1. For my first point, my suggestion was to feed raw binary files, rather than memory dumps, to the MalConv model. If MalConv does not perform well on obfuscated or packed binaries but ProcGCN is resilient to obfuscation, that would demonstrate, to some extent, the superiority of using memory images.

2. Since ProcGCN relies on memory images, I do not think this is a pure static approach. Hence, I believe comparing it with some dynamic approaches such as DMalNet is reasonable. DMalNet also runs malware in the Cuckoo sandbox.

That being said, the above suggestions are based on the impression that ProcGCN is a generic malware detection approach. If the authors intend to propose an approach designed specifically for memory images, I'm fine with the current experimental setup.

3. In GCN, a graph with 1,000 nodes is relatively small and should not cause any memory stress. I suspect the authors are keeping the edges in a dense adjacency matrix; in that case, switching to a sparse matrix could allow handling orders of magnitude more nodes (see the sketch after this list).

4. Line 388, the long*er* feature vector of each node, ...
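To illustrate point 3, a minimal sketch of the dense-versus-sparse difference in PyTorch (the shapes and names are illustrative, not taken from the paper):

```python
import torch

num_nodes = 100_000  # far beyond the paper's 1,000-node cap

# Dense storage grows as O(N^2): a float32 adjacency matrix for 100,000 nodes
# would need ~40 GB, so the line below is deliberately commented out.
# adj = torch.zeros(num_nodes, num_nodes)

# Sparse COO storage grows with the number of edges only.
edge_index = torch.tensor([[0, 1, 2],
                           [1, 2, 0]])  # toy edge list: row 0 = sources, row 1 = targets
values = torch.ones(edge_index.shape[1])
adj = torch.sparse_coo_tensor(edge_index, values, (num_nodes, num_nodes))

x = torch.randn(num_nodes, 16)   # node feature matrix X
out = torch.sparse.mm(adj, x)    # one propagation step A @ X at edge-linear cost
```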

Version 0.1 (original submission)

· Jul 11, 2023 · Academic Editor

Major Revisions

The authors need to improve the theoretical analysis in order to demonstrate the feasibility of the proposed method. This underpins the research aim, so it is critical that it be addressed in the revision.

[# PeerJ Staff Note: Please ensure that all review comments are addressed in a rebuttal letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate. It is a common mistake to address reviewer questions in the rebuttal letter but not in the revised manuscript. If a reviewer raised a question then your readers will probably have the same question so you should ensure that the manuscript can stand alone without the rebuttal letter. Directions on how to prepare a rebuttal letter can be found at: https://peerj.com/benefits/academic-rebuttal-letters/ #]

Reviewer 1 ·

Basic reporting

no comment

Experimental design

Please add the experimental design to Section 4.2 ("Experimental Environment and Procedures") and explain why the data split is 9:1 rather than the traditional 8:2.

Validity of the findings

To verify the validity of the experimental results, please compare them with the latest research.

Additional comments

This article reads less like a scientific study and more like an experiment.

Reviewer 2 ·

Basic reporting

no comment

Experimental design

The authors mention in the abstract and introduction that byte features are susceptible to mutation and obfuscation while function-call features are more robust. However, no experiment supports this claim; the evaluation results show that byte features perform equally well. I'd suggest that the authors evaluate some packed or obfuscated samples to demonstrate the effectiveness of process dumps.

On the other hand, ProcGCN is only evaluated against MalConv and Bozkir's method. Section 2 of the paper mentions other open-source works, such as DMalNet, that utilize similar approaches. I'd suggest that the authors evaluate more related work to show the robustness and effectiveness of ProcGCN.

Validity of the findings

The evaluation of the proposed approach relies solely on the dataset collected by Fang et al. Moreover, the authors randomly selected 2,000 samples from the dataset and retained only those whose call graph contains between 20 and 1,000 nodes.

First, this dataset is insufficient for a valid malware detection evaluation. With fewer than 900 malicious samples in the dataset, only about 90 malicious samples are used for evaluation. Even with a reported accuracy of 98% to 99%, the result is not convincing, given the size of the dataset. VirusTotal provides malware samples for academic research for free. There are many other services like VirusSign that provide malware feeds for free, so finding executable malware shouldn't be an issue.

Second, the authors need to justify why they randomly selected 2,000 samples and filtered out samples with more than 1,000 nodes. The original dataset is already small, with fewer than 8,000 samples, and using only one-third of it does not seem reasonable.

Additional comments

Some minor issues:

1) I'm curious about the average percentage of external functions in a call graph. Since internal functions do not carry semantic information and each GCN layer can only propagate information to its nearest neighbors, the model may not perform well if the external functions are not close enough, for example, more than four hops away (see the sketch after this list).

2) The authors mention at line 293 that EXEs and DLLs are extracted from the dataset. How do the authors run DLLs in the Cuckoo sandbox? Is there a specific loader submitted to the sandbox?

3) At line 319, there is no need to convert the GDL format to DOT because IDA Pro is able to dump graphs in the DOT format directly.
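Regarding point 1, the hop limit can be made concrete: stacking k GCN layers means a node's representation can only be influenced by nodes at most k hops away. A minimal sketch, assuming PyTorch Geometric (the class and dimension names are illustrative, not from the paper):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class TwoHopGCN(torch.nn.Module):
    """Two GCNConv layers: each node sees at most its 2-hop neighborhood,
    so an external function more than 2 hops from a node can never reach
    that node's representation; each extra layer widens the field by one hop."""

    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)       # mixes 1-hop neighbors
        self.conv2 = GCNConv(hidden_dim, num_classes)  # extends reach to 2 hops

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)
```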
