Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

View examples of open peer review.

Summary

  • The initial submission of this article was received on August 25th, 2023 and was peer-reviewed by 3 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on September 12th, 2023.
  • The first revision was submitted on September 19th, 2023 and was reviewed by 2 reviewers and the Academic Editor.
  • The article was Accepted by the Academic Editor on September 22nd, 2023.

Version 0.2 (accepted)

· Sep 22, 2023 · Academic Editor

Accept

The authors have responded clearly to all comments and updated the manuscript accordingly. Thus, this paper is suitable for publication.

[# PeerJ Staff Note - this decision was reviewed and approved by Jyotismita Chaki, a PeerJ Section Editor covering this Section #]

Reviewer 2 ·

Basic reporting

The current version is well written. All of the concerns have been well addressed.

Experimental design

The research meets the aim and scope of the journal well.

Validity of the findings

The novelty of the article is clear. All data have been provided.

Additional comments

The article is recommended for acceptance in its current version.

Reviewer 3 ·

Basic reporting

The paper can be accepted in its present form.

Experimental design

The paper can be accepted in its present form.

Validity of the findings

The paper can be accepted in its present form.

Additional comments

The paper can be accepted in its present form.

Version 0.1 (original submission)

· Sep 12, 2023 · Academic Editor

Minor Revisions

This manuscript proposes an approach to improve the effectiveness of knowledge distillation by reducing the size of ASR models. However, a number of revisions are necessary. Please revise this manuscript based on the raised comments. It will then be sent out for peer review again.

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

**Language Note:** PeerJ staff have identified that the English language needs to be improved. When you prepare your next revision, please either (i) have a colleague who is proficient in English and familiar with the subject matter review your manuscript, or (ii) contact a professional editing service to review your manuscript. PeerJ can provide language editing services - you can contact us at copyediting@peerj.com for pricing (be sure to provide your manuscript number and title). – PeerJ Staff

Reviewer 1 ·

Basic reporting

The paper presents a well-done introduction and an excellent survey of related work, which make it clear why speech recognition for air traffic control communications requires lightweight models and low latency. The proposed simple yet effective knowledge distillation method is described clearly and illustrated well with excellent figures. Extensive experimental results show that this method can effectively reduce the latency of student models while maintaining recognition accuracy.

Experimental design

This paper mainly introduces a lightweight speech recognition method for air traffic control communications, which combines target-swap knowledge distillation to improve the generalization ability of lightweight models while maintaining high recognition performance and low latency. The paper describes the implementation of this method in detail, including the two components of knowledge distillation: TKD and SKD.

Validity of the findings

Extensive experiments were designed and conducted by the authors to validate the effectiveness of the proposed method in homogeneous and heterogeneous architectures, adopting the latest neural networks, including Transformer and Conformer. They also took the SOTA method, DKD, as a comparison. In the comparative experiments, the SER value of the method proposed by the authors is lower than that of other methods, demonstrating its superiority. Meanwhile, the optimal knowledge distillation performance of their method reached 3.98%, achieving a balance between recognition accuracy and transcription latency.

Reviewer 2 ·

Basic reporting

The authors have proposed a method that can achieve high-speed transcription while maintaining a specific recognition rate, emphasizing transcription speed, against the background that miscommunication between ATCOs and pilots causes serious accidents. This research is in the critical field of aviation accident mitigation, and its contribution to society is immeasurable.

Experimental design

Based on the constructed Mandarin ATC communications dataset and following the provided implementation details, extensive experiments were designed and conducted thoroughly to validate the effectiveness of the proposed TSKD knowledge distillation method, including comparative experiments in homogeneous and heterogeneous architectures through swapping target class or top-k predictive information, parameter sensitivity experiments, and the effect of beam search size on recognition accuracy and transcription speed.

Validity of the findings

The authors have also appropriately described the comparison with previous studies in this paper, and there is no doubt about the effectiveness of the proposed method. With the assistance of TSKD, the developed lightweight ASR model reaches a good balance between recognition accuracy and transcription speed. This enables ATCOs and pilots to react promptly and efficiently, ultimately diminishing the potential hazards to flight safety linked to communication errors.
On the other hand, I judged the paper to be conditionally acceptable after minor revisions, because the core terminology of the method needs explanation and the credibility of its claims needs to be strengthened. Below are the conditions for acceptance:
(1) The difference between TCKD and NCKD cannot be distinguished because there is no detailed definition or explanation of “target” or “target class”, which is the basis of the method proposed in this paper. Please add definitions of target or target class.
(2) The correlation between the experimental results shown in the tables and the result analysis should be further refined, and their logical correlation should be more compact and direct.

Reviewer 3 ·

Basic reporting

This manuscript proposes an approach to improve the effectiveness of knowledge distillation by reducing the size of ASR models. In the author's method, elements in logits from the outputs of the teacher and student models corresponding to the target class at each time frame are swapped with each other, and the KL divergence between the probability distributions calculated from these modified versions of the logits is minimized via training. I appreciate its good survey of ATC speech recognition in Sections 1 and 2.
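The swap-then-distill procedure summarized above can be illustrated with a rough sketch. The snippet below is a minimal NumPy illustration of the general idea, assuming per-frame logits and hard target labels; the function name, temperature value, and KL direction are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def target_swap_kd_loss(student_logits, teacher_logits, targets, temperature=2.0):
    """Sketch of target-swap distillation.

    student_logits, teacher_logits: arrays of shape (time, vocab)
    targets: array of shape (time,), the target class index per time frame
    """
    s = student_logits.copy()
    t = teacher_logits.copy()
    rows = np.arange(len(targets))
    # Swap the target-class logits between teacher and student at each frame
    s[rows, targets] = teacher_logits[rows, targets]
    t[rows, targets] = student_logits[rows, targets]
    # KL divergence between the softened distributions of the swapped logits,
    # averaged over time frames and scaled by T^2 as is conventional in KD
    p_teacher = softmax(t / temperature)
    p_student = softmax(s / temperature)
    kl = (p_teacher * (np.log(p_teacher) - np.log(p_student))).sum(axis=-1)
    return float(kl.mean() * temperature ** 2)
```

When the student and teacher produce identical logits, the swap changes nothing and the loss is zero; otherwise the KL term penalizes the student's non-target distribution against the teacher's while the target-class evidence is exchanged.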

Experimental design

Furthermore, the proposed method was thoroughly evaluated through air traffic control (ATC) speech recognition experiments in homogeneous and heterogeneous architectures, parameter sensitivity experiments, and the beam search size’s effect on recognition accuracy and transcription speed.

Validity of the findings

The findings of this paper are correct and reliable. However, the logical relation between the experimental results and their analysis should be polished to be clearer and friendlier to readers. Besides, the notion of "target class" should be clarified further, as it may mislead readers. Furthermore, the authors are advised to check the grammar thoroughly throughout the paper when revising it.

Additional comments

no

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.