Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

View examples of open peer review.

Summary

  • The initial submission of this article was received on November 2nd, 2020 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on December 4th, 2020.
  • The first revision was submitted on February 11th, 2021 and was reviewed by 2 reviewers and the Academic Editor.
  • A further revision was submitted on March 5th, 2021 and was reviewed by the Academic Editor.
  • The article was Accepted by the Academic Editor on March 16th, 2021.

Version 0.3 (accepted)

· Mar 16, 2021 · Academic Editor

Accept

The paper is ready for publication. No further modifications are required.

Version 0.2

· Feb 19, 2021 · Academic Editor

Minor Revisions

The reviewers appreciate the improvements made to the paper, and so do I.

Reviewer 1 ·

Basic reporting

This overview paper studies techniques which transfer the knowledge acquired by large deep learning models to smaller models, which can then be used in embedded and mobile devices.

The level of English is adequate, apart from some minor grammatical errata that should be edited out: e.g. "The main objectives of this work" -> "The main objective of this work"; "Also, the paper discuss" -> "The paper also discusses"; "deep learning models use to run" -> "deep learning models are used to run" (or just "typically run").

The introduction and background sections have been sufficiently improved.

Experimental design

This study fits within the scope of the journal and there is no recent peer-reviewed review of the topic, to my knowledge. The overall structure of the paper has been improved and is now easier to follow. Diagrams have been added to complement method descriptions.

Validity of the findings

no comment

Additional comments

Overall, the paper has received many improvements and the contents are now well organized and present the whole picture notably better. The only modification I would recommend is a quick revision for grammar mistakes such as those I marked above, for greater clarity. Apart from that, I would consider that this overview reaches publication quality.

Reviewer 2 ·

Basic reporting

As suggested, the authors have extended the background to include some of the concepts used throughout the paper. I would also move the definition of online and offline distillation (including Figure 2), which now appears at the beginning of Section 5 (lines 173-176), to the background.

The authors have also included a new section summarizing the applications, as requested. This section contributes to making the paper more complete.

The new figures are quite useful for understanding the background concepts and the categories used to classify the papers.

The manuscript still contains some grammar mistakes (a few are listed below), so proofreading is highly recommended before publication.

- Section 1: The main objectives of this works is => are
- Section 1: Also, the paper discuss => discusses
- Section 3: It’s purposes => its purposes
- Section 5: the two sub-category => subcategories
- Section 6: deep learning models use to run => are usually run ?
- Section 6: To be practically in use => To be of practical use
- Section 6: To be low latency => To have low latency

Experimental design

I am not fully satisfied with the authors' answer about the survey methodology. Even if they do not want to conduct a systematic literature search, the process followed to find and select the papers should be better explained in the manuscript. It seems that the survey focuses on recent works not included in previous surveys, so the covered period of time should be given. The names of the journals and conferences considered “relevant”, as well as the minimum citation count, should be reported as well. Even though these criteria might not be valid for a systematic review, the reader has the right to know how the authors chose the papers. Otherwise, the “overview” of the area is strongly biased by the authors' interest in certain papers, and the reader is not aware of it.

The new organization of the survey section has greatly improved readability.

Validity of the findings

The authors have successfully addressed my comments about the validity of findings.

Additional comments

None.

Version 0.1 (original submission)

· Dec 4, 2020 · Academic Editor

Major Revisions

Pay special attention to the topics related to the experimental part of the paper and the validity of findings.

Reviewer 1 ·

Basic reporting

This review studies techniques which transfer the knowledge acquired by large deep learning models to smaller models, which can then be used in embedded and mobile devices.

The level of English is more than adequate, explanations are clear and accessible to a broad range of readers.

It mostly respects the PeerJ standard structure, with a few extra sections that make sense for the contents of the overview. However, readers may appreciate some more subdivisions in the Survey section, which has only one subheading and one sub-subheading (I believe those should be at the same level instead; this may be a formatting erratum). The acknowledgements section includes funders.

The review is relevant and accessible to any deep learning practitioner, including those who may not be specialized in the topic but want to embed a certain level of intelligent behavior in a small device, a situation where knowledge distillation techniques are of interest. This field has been reviewed recently, but none of those reviews has been published in a peer-reviewed journal as of now, so this would apparently be the first review of the topic in a reliable source, since the topic itself is also very recent.

The introduction presents the concepts appropriately, but I think it is missing some examples of what tasks can be achieved with deep learning in embedded/mobile devices (e.g. fitness tracking, sensor data compression?), since the main justification for knowledge distillation is the need for smaller deep learning models, yet there is no explanation of what problems these models may solve.

Experimental design

The content of the article is well within the aims and scope of PeerJ Computer Science. The methodology described for collecting studies and results seems appropriate and rigorous. It is also systematic, since it introduces an objective metric for the fitness of different algorithms to the problem, which takes into account the reduction in size as well as the preservation (or even improvement) of accuracy. The value of this metric decreases as performance in both aspects improves. The metric is relative to the sizes and accuracies of the models and does not directly depend on the data used, but it is computed using the metrics reported by the original papers, so I am unsure about its ability to compare those models. The authors could briefly justify the extent to which this metric is independent of the datasets used.
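The review does not reproduce the paper's actual formula, so purely as an illustration, here is one plausible score with the stated properties (lower is better, improving with both compression and accuracy retention); the function name and the ratio-product form are assumptions for this sketch, not the authors' definition:

```python
def distillation_score(teacher_params, student_params, teacher_acc, student_acc):
    """Hypothetical comparison score: lower is better.

    size_ratio shrinks as the student gets smaller relative to the teacher;
    acc_ratio shrinks as the student preserves (or improves on) the teacher's
    accuracy, so the product decreases as both aspects improve.
    """
    size_ratio = student_params / teacher_params
    acc_ratio = teacher_acc / student_acc
    return size_ratio * acc_ratio

# A student 10x smaller that retains 95% of the teacher's accuracy
# scores well below 1; a student with no compression and no accuracy
# loss scores exactly 1.
score = distillation_score(138_000_000, 13_800_000, 0.80, 0.76)
```

Because the score is built only from parameter counts and accuracy ratios, it does not reference the dataset directly, which matches the reviewer's point: dataset effects enter only through the reported accuracies.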

The survey seems diverse and comprehensive, all methods are sufficiently described and the explanations are put together well, including detailed information about the experiments and results of each study. There is, however, little to no visual aid to complement the textual explanations. I think a simple diagram outlining the main components of a deep learning-based knowledge distillation model (i.e. teacher, student, the flow of data and weights, or how the student is trained) would be very helpful to give the reader an intuition on what all these proposals have in common.
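To make the teacher/student flow mentioned above concrete, a minimal self-contained sketch of the soft-label loss common to most distillation methods could look like the following (a generic Hinton-style formulation, not any specific surveyed technique; the temperature value and function names are illustrative assumptions):

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: higher T yields a softer distribution,
    # exposing the teacher's relative confidence across non-target classes.
    z = [x / T for x in logits]
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, T=4.0):
    """KL(teacher || student) on temperature-softened outputs.

    This is the signal flowing from teacher to student: the student is
    trained to match these soft targets, usually combined with ordinary
    cross-entropy on the true labels.
    """
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher_logits = [8.0, 2.0, 1.0]   # large, confident teacher network
student_logits = [5.0, 2.5, 1.5]   # smaller student, not yet aligned
loss = distillation_loss(teacher_logits, student_logits)
```

The loss is zero when the student exactly reproduces the teacher's distribution and positive otherwise, which is the common structural core (teacher, student, soft targets) that a diagram in the paper could depict.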

Cited sources are reliable in general, either from reputable journals and conferences or, at least, well-known papers on ArXiv.

Validity of the findings

The discussion of the results is sound, and several guidelines are provided on how to improve works in the topic. Some possible future directions are also mentioned and appropriately cited. The conclusions summarize the manuscript correctly and attempt to guide the novel reader on how to use these models.

Additional comments

In summary, my overall opinion of this paper is very good, but I believe some improvements could be made that would make it easier to read and comprehend. My suggestions are as follows: to extend the introduction with applications of deep learning in embedded devices, better subdivisions of the Survey section, and a diagram or two explaining the common points of the inner workings of these models.

Reviewer 2 ·

Basic reporting

The introduction could include some additional sentences to explain the main contributions and findings of the survey.

Section numbers in the last paragraph of the introduction do not appear.

For a survey, the background could be more formal, introducing key concepts and definitions. The authors could also detail the categories or perspectives for the survey analysis, such as inputs, algorithms, distillation/compression approaches, outputs, etc.

The title mentions “applications”, so I would expect a specific section summarizing current applications and others the authors suggest could be explored in the future. Some information is given, e.g., used datasets in each paper, but a section from the application perspective could be more practical for readers interested in particular domains.

Some specific sentences that authors should clarify are:
- Section “Survey”, line 136. Did the authors exclude papers not presenting evaluation metrics, or were they only discarded from the comparison?
- Section “survey”, line 142. Every neural network -> every deep neural network

Experimental design

The survey methodology used to search, select, and summarize the papers should be improved. The authors use only one source (Google Scholar), so many relevant papers could be missing. It is not clear whether the search strings are independent. The number of papers found, filtered out, and finally selected should also be indicated. Usually, exclusion and inclusion criteria are established to clearly state the reasons why papers are discarded or selected. The current quality criteria seem rather subjective, i.e., what the minimum acceptable citation count is, or which journals and conferences count as “relevant”. All this information is necessary for replicability.

The reporting of each paper is quite complete, but it is not easy to understand how the authors have organized the paragraphs within each category (soft labels, transformation). Both sections are long, so the authors could consider whether a subdivision would fit, e.g., based on the application, specificity (architecture-agnostic or architecture-dependent), or the purpose of the knowledge distillation process.

Validity of the findings

The authors propose a metric to compare knowledge distillation techniques, but it is not evaluated for any of the surveyed techniques. Having a new metric could be very useful for researchers and adding a short study showing how it is computed and interpreted for a subset of techniques would add value to the paper.

The authors compare and discuss the distillation scores obtained by different techniques as reported in the original publications. However, it is not clear whether all these techniques are comparable, i.e., do they start from the same input deep learning model? I guess not, so averaging or comparing the achieved reduction and accuracy improvement is a bit risky. The authors could try to extract some common behaviors among techniques depending on the targeted architecture, dataset/application, etc.

Additional comments

None.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.