All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
I hope this message finds you well. After carefully reviewing the revisions you have made in response to the reviewers' comments, I am pleased to inform you that your manuscript has been accepted for publication in PeerJ Computer Science.
Your efforts to address the reviewers’ suggestions have significantly improved the quality and clarity of the manuscript. The changes you implemented have successfully resolved the concerns raised, and the content now meets the high standards of the journal.
Thank you for your commitment to enhancing the paper. I look forward to seeing the final published version.
[# PeerJ Staff Note - this decision was reviewed and approved by Xiangjie Kong, a PeerJ Section Editor covering this Section #]
no comment
no comment
no comment
The authors have addressed all the concerns raised in the previous review. In my opinion, the article now meets the journal criteria and should be accepted as is.
Thank you for submitting your manuscript to PeerJ Computer Science. After careful review, the reviewers have raised some concerns regarding the methodology and experimentation that need to be addressed before we can proceed with the publication.
We kindly request that you revise your manuscript in light of the reviewers' comments and make the necessary adjustments. Please also provide a detailed response letter addressing each of the reviewers' suggestions and observations.
We are confident that, with these revisions, your manuscript will be considered for publication.
Thank you again for your contribution, and we look forward to receiving your revised submission.
**PeerJ Staff Note**: Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate. 
**Language Note**: The review process has identified that the English language and grammar must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title). Alternatively, you should make your own arrangements to improve the language quality and provide details in your response letter. – PeerJ Staff
The paper proposes MLPruner, a novel method for pruning convolutional neural networks (CNNs) by automatically learning binary masks for each filter during training. Unlike traditional approaches, MLPruner employs a learnable mask in conjunction with a straight-through estimator (STE) to guide pruning decisions without compromising weight optimization.
The manuscript uses largely professional English and is mostly clear in its exposition. However, there are persistent grammatical issues, typos ("MoibleNet", "dicards", "Digtial", etc.), and inconsistent formatting that compromise readability and professionalism.
The paper provides an overview of the existing literature on CNN pruning, clearly positioning its contribution within the two dominant paradigms: importance score-based pruning and regularization-based pruning. However, as the previous reviewer pointed out, the cited works are all “old”: only one paper from 2023 and one from 2024 are cited, and the rest are at least 5 years old. To strengthen the work, the related-literature section needs to be more thorough and up to date.
The method is within the scope of PeerJ CS and presents a pruning method that does not require weight regularization. However, the originality of the proposed "learnable mask + STE" framework is questionable, as similar techniques (e.g., soft masks, binary gating, Gumbel-softmax approximations) exist in prior work.
On the other hand, the use of fixed thresholds and hard-coded pruning rates is a significant weakness.
The experiments are extensive, covering CIFAR-10/100 and ImageNet across multiple architectures (VGG, ResNet, GoogLeNet, MobileNet). However, the reported results are limited to top-1 accuracy, FLOPs, and parameter counts, with no standard deviations or confidence intervals provided.
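As a minimal sketch of the kind of statistical reporting requested here, assuming each experiment is repeated with several random seeds, the per-run top-1 accuracies could be summarized as a mean, a sample standard deviation, and a confidence interval. The function name `summarize_runs` and the accuracy values are hypothetical, and the interval uses a normal approximation; with only a handful of runs a t-based interval would be more appropriate.

```python
from statistics import NormalDist, mean, stdev


def summarize_runs(accuracies, confidence=0.95):
    """Return (mean, sample std, confidence interval) for repeated runs.

    Assumption: runs are independent and the interval is a normal
    approximation, which is optimistic for very small sample sizes.
    """
    n = len(accuracies)
    m = mean(accuracies)
    s = stdev(accuracies)  # sample standard deviation (n - 1 denominator)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    half_width = z * s / n ** 0.5
    return m, s, (m - half_width, m + half_width)


# Hypothetical top-1 accuracies from three seeds (not taken from the manuscript).
runs = [93.0, 94.0, 95.0]
avg, std, (low, high) = summarize_runs(runs)
print(f"top-1: {avg:.2f} +/- {std:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Reporting results in this form alongside FLOPs and parameter counts would let readers judge whether the differences between pruning methods exceed run-to-run variation.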
The conclusions are directly linked to the proposed hypothesis, i.e. pruning without weight penalties is possible. However, claims of "unbiased importance estimation" and "better generalization" are unsubstantiated by theoretical or empirical evidence beyond accuracy metrics.
Last but not least, the limitations section is weak and only briefly touches on threshold tuning and fixed CNN architectures. No mention is made of computational cost, scaling to transformers, or generalization to tasks beyond classification.
The proposed idea is sound and empirically promising, but the manuscript does not meet the standards of rigor and clarity required for acceptance.
To meet these standards, in my opinion, the author must:
- Rewrite the related literature with recent works.
- Conduct more rigorous ablation and statistical reporting.
- Improve writing quality.
- Provide deeper theoretical framing or formalize mask importance estimates.
I hope this email finds you well. After a thorough review of your manuscript by the assigned reviewers, I would like to inform you that, while there is potential in your work, several significant concerns have been raised regarding the experimentation and methodology.
The reviewers have pointed out that certain aspects of the experimental setup lack sufficient clarity and justification. In particular, they believe that more detailed explanations and stronger validations are necessary to support your findings. Additionally, methodological improvements have been recommended to ensure the robustness and reliability of the results.
In light of these concerns, we are requesting major revisions to the manuscript. We kindly ask that you carefully address each of the reviewers' comments in your revised submission, providing additional detail and supporting evidence where necessary.
**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors agree that they are relevant and useful.
Here is a detailed review of the paper "MLPruner: Pruning convolutional neural networks with automatic Mask Learning", along with constructive revision comments to improve its quality for the journal publication.
This manuscript addresses an important and timely topic.
The revised version is improved.
However, there are several areas where the manuscript could be improved to enhance clarity, academic rigor, and impact.
Pay attention to capitalization in the title.
Clarify the core research questions.
Explain future work details.
The manuscript would benefit from careful editing. 
Some recent and relevant papers ought to be discussed. Consider high-impact, peer-reviewed sources from the recent years, such as:
Neural network developments: A detailed survey from static to dynamic models. Computers and Electrical Engineering 120, 109710.
The topic is highly relevant, and the authors demonstrate familiarity with the domain. However, to meet the standards of a high-impact journal, the paper should more clearly articulate its unique contribution and provide deeper insights.
In Figure 1, please label the lines.
Describe the methodology used to select or evaluate the solution.
For better reproducibility of the research, explain the technical details (e.g., hyperparameters, layer-wise pruning rates, STE implementation, STE threshold, mask initialization).
Please explain missing critical metrics (e.g., parameters for FPGM in Table 3).
Please explain more details about the pruning rate determination method.
Provide sufficient technical depth on how these methods operate, their comparative strengths, or their practical implications in real world applications.
Please explain more details about filter importance.
Please explain more details about MLPruner and weight-penalty.
Discuss the convergence properties of the relevant machine learning methods.
Please explain in more detail why MLPruner outperforms EagleEye.
Explain your own new insights, frameworks, or models.
Clarify how this work contributes beyond existing literature.
Explain the real-world limitations of your innovation.
I hope this email finds you well. After a thorough review of your manuscript by the assigned reviewers, I would like to inform you that, while there is potential in your work, several significant concerns have been raised regarding the experimentation and methodology.
The reviewers have pointed out that certain aspects of the experimental setup lack sufficient clarity and justification. In particular, they believe that more detailed explanations and stronger validations are necessary to support your findings. Additionally, methodological improvements have been recommended to ensure the robustness and reliability of the results.
In light of these concerns, we are requesting major revisions to the manuscript. We kindly ask that you carefully address each of the reviewers' comments in your revised submission, providing additional detail and supporting evidence where necessary.
This paper introduces a mask learning method for automatic filter pruning that removes the need for weight penalties. Specifically, it assigns a learnable mask to each filter. During forward propagation, the mask is transformed into a binary value of 1 or 0, indicating whether the corresponding filter should be pruned. During backward propagation, in contrast, it uses a straight-through estimator (STE) to estimate the gradient of the masks, which works around the non-differentiable rounding function.
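The mechanism summarized above can be illustrated with a small sketch. This is not the authors' implementation: it assumes one real-valued mask per filter, a rounding threshold of 0.5, that a binary value of 1 keeps the filter, and a plain gradient step. All function names are hypothetical.

```python
def binarize(mask, threshold=0.5):
    """Forward pass: round the real-valued mask to a hard 0/1 decision."""
    return 1.0 if mask >= threshold else 0.0


def masked_forward(filter_outputs, masks, threshold=0.5):
    """Scale each filter's output by its binary mask (0 removes the filter)."""
    return [binarize(m, threshold) * y for y, m in zip(filter_outputs, masks)]


def mask_gradients(filter_outputs, upstream_grads):
    """Backward pass with a straight-through estimator.

    d(loss)/d(binary mask) is upstream_grad * filter_output. The STE treats
    the rounding step as the identity, so the same value is used as the
    gradient of the real-valued mask.
    """
    return [g * y for y, g in zip(filter_outputs, upstream_grads)]


def sgd_step(masks, grads, lr=0.1):
    """Plain gradient step on the real-valued masks."""
    return [m - lr * g for m, g in zip(masks, grads)]


# Toy values (hypothetical): three filters, each with a scalar output.
outputs = [2.0, -1.0, 0.5]
masks = [0.9, 0.55, 0.2]
upstream = [0.1, -1.0, 0.3]

pruned_outputs = masked_forward(outputs, masks)
new_masks = sgd_step(masks, mask_gradients(outputs, upstream))
```

In this toy run the second mask drops below the threshold after one update, so that filter flips from kept to pruned even though the rounding step itself has zero gradient almost everywhere.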
The experiments can be improved. Most of the baselines were published before 2020; very few were published after 2020, such as RGP (Chen et al., 2023) and White-Box (Zhang et al., 2022), and the most recent dates from 2023. Without strong baselines, it is hard to demonstrate the superior performance of the proposed method. The authors should compare with more recent pruning baselines such as [R1]. For example, for ResNet-50 on ImageNet, under 2G FLOPs after pruning, [R1] achieves 76.7 Top-1 accuracy, which is much higher than the 75.62 of the proposed method. Even among the older baselines there are strong ones such as [R2]. For example, for ResNet-50 on ImageNet, under 2G FLOPs after pruning, [R2] achieves 76.4 Top-1 accuracy, which is again much higher than the 75.62 of the proposed method. The authors should compare with these strong baselines.
The method seems to require a lot of training effort. For example, it needs 30 epochs for mask training and then another 300 epochs of training to restore the accuracy. I am not sure whether it still needs so many epochs to finish pruning on large datasets such as ImageNet. The pruning cost is very heavy. In [R1], on ImageNet, only 10 epochs are needed to search the mask and 50 epochs of fine-tuning to reach the final accuracy. A detailed discussion or comparison of the training effort would be helpful.
[R1] Pruning-as-Search: Efficient Neural Architecture Search via Channel Pruning and Structural Reparameterization, IJCAI 2022.
[R2] EagleEye: Fast Sub-net Evaluation for Efficient Neural Network Pruning, 2020
[# PeerJ Staff Note: It is PeerJ policy that additional references suggested during the peer-review process should *only* be included if the authors are in agreement that they are relevant and useful #]
The novelty of the proposed method may be limited. It uses a mask to select the channels to prune and the STE method to train the mask. Some existing works use the same techniques, such as [R1, R3]. Both [R1, R3] and the proposed method adopt a mask to learn which channels to prune and use STE to train the masks. The techniques are almost the same. The authors should discuss the difference between the proposed method and [R1]. The technical contribution may be limited.
Although the soft mask approach is applied to each weight, during the computation it simply adds up the soft mask weights in each filter and compares the sums to determine the importance of each filter. It still performs filter pruning with an importance metric (the sum). This is the same as the methods in [R1, R3], which assign one trainable parameter to each filter or to another pruning granularity such as layers or blocks. There is no distinct difference between the proposed method and [R1, R3].
The abstract and introduction state that 'existing methods are based on heuristically designed pruning metrics or implementing weight regulations to penalize filter parameters during the training process. Nevertheless, human-crafted pruning criteria tend not to identify the most critical filters' and that the proposed method can 'rectify these obstacles'. However, when selecting which channels to prune, the proposed method still follows traditional methods and uses the magnitude or L1 norm of the channels to determine whether a channel is important; see equations (5) and (6). It claims to rectify heuristically designed pruning metrics, yet it still relies on 'heuristically designed pruning metrics' for the pruning itself. Using the L1 norm as a pruning metric is very typical, so the proposed method does not seem very different from traditional approaches.
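For readers unfamiliar with the traditional criterion referred to here, a generic sketch of L1-norm filter ranking is given below. It illustrates the classic heuristic only and is not a reconstruction of the manuscript's equations (5) and (6). The function names, the flattened-weight representation, and the example values are all hypothetical.

```python
def l1_norm(filter_weights):
    """Sum of absolute weights of one filter (flattened to a list)."""
    return sum(abs(w) for w in filter_weights)


def select_filters_to_prune(filters, prune_ratio):
    """Return indices of the filters with the smallest L1 norms.

    Assumption: a fixed per-layer pruning ratio, which is the usual
    setup for this heuristic.
    """
    scores = [l1_norm(f) for f in filters]
    n_prune = int(len(filters) * prune_ratio)
    order = sorted(range(len(filters)), key=lambda i: scores[i])
    return sorted(order[:n_prune])


# Four hypothetical filters with two weights each.
layer = [[0.5, -0.5], [0.1, 0.0], [-2.0, 1.0], [0.2, -0.1]]
to_prune = select_filters_to_prune(layer, prune_ratio=0.5)
```

Here the two filters with the smallest norms (indices 1 and 3) are selected, which is the kind of magnitude-based decision that the reviewer argues the learned masks ultimately reduce to.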
[R1] Pruning-as-Search: Efficient Neural Architecture Search via Channel Pruning and Structural Reparameterization, IJCAI 2022.
[R3] Structured Pruning Learns Compact and Accurate Models
The authors propose a mask-based pruning methodology for DNNs with the goal of reducing model complexity while maintaining performance. The solution, which can be applied even without an L2 penalty, has been tested on a variety of models, including VGGNet-16, GoogLeNet, ResNet and MobileNet, achieving a substantial reduction in model size (between 50% and 90%) while leaving accuracy substantially unchanged.
The paper suits the topic of the journal and its goals are clearly depicted in the introduction.
The authors provide a thorough analysis of the solution and compare it with previous baseline publications, demonstrating the efficacy of their method.
The authors provide the source code of the experiments for complete reproducibility.
The methodology seems sound, and the results obtained are well supported by the experimental evaluations.
- The paper is well written and easy to follow.
  The several methods being compared are well explained and clear.  
  A couple of typos are still present (e.g., "MoibileNet"), therefore a last revision by the authors before the final submission is suggested.  
- The comments from the previous reviewers seem to have been completely addressed.
- My suggestion for the current state of the paper is to accept.
Clarify your innovation and impact, for example by explaining how Algorithm 1 on page 6 is better than existing relevant solutions.
The manuscript lacks 2024 references.
The paper should discuss more relevant recent work, such as:
O-WCNN: an optimized integration of spatial and spectral feature map for arrhythmia classification. Complex & Intelligent Systems 9 (3), 2685-2698.
Provide formal proof of the main claims, such as: “mitigating the significant computational complexity and parameter burden” and “learned masks aptly reflect the significance of corresponding filters” in the abstract.
Please clarify the originality and advantages of your solution compared with relevant state-of-the-art solutions.
Justify the choice of methods and evaluation criteria, for example by explaining, with sufficient technical detail, why CNNs and FLOPs were selected.
The paper should explain, from a computational perspective, how the solution selects its parameters and how their initial values were assigned.
Explain the validity and generalizability of the results, such as “MLPruner reduces 71.9% of FLOPs and 89.1% of parameters from VGGNet-16 on CIFAR-10, with negligible Top-1 accuracy drop of 0.03%.” in the abstract, Tables 1 to 8, and Figure 2.
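As background for the FLOPs and parameter reductions quoted above, the following rough sketch shows how such figures are commonly derived for a single convolutional layer. It is not the manuscript's accounting: it counts multiply-accumulate operations, ignores bias terms, and uses hypothetical layer dimensions and function names. Conventions differ between papers (some count one multiply-accumulate as two FLOPs), which is one reason the choice of metric deserves an explicit justification.

```python
def conv_cost(c_in, c_out, kernel, h_out, w_out):
    """Parameters and multiply-accumulates of one square-kernel conv layer.

    Assumptions: no bias, no grouping, and dense computation.
    """
    params = c_out * c_in * kernel * kernel
    macs = params * h_out * w_out  # each weight is used once per output position
    return params, macs


def reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


# Hypothetical layer: 64 -> 128 channels, 3x3 kernel, 32x32 output map.
dense_params, dense_macs = conv_cost(64, 128, 3, 32, 32)
# Pruning half of the output filters leaves 64 of them.
pruned_params, pruned_macs = conv_cost(64, 64, 3, 32, 32)

param_drop = reduction(dense_params, pruned_params)
mac_drop = reduction(dense_macs, pruned_macs)
```

Removing output filters from a layer also shrinks the input channels of the following layer, so whole-network reductions are generally larger than this single-layer view suggests.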
In this paper, the authors present "MLPruner: Pruning convolutional neural networks with automatic Mask Learning". This is a good contribution.
The experimental section is well presented by the authors.
The validity of the results is well presented in this paper.
The paper is ready to be accepted.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.