All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
The reviewers consider that their concerns have been resolved, and I now recommend accepting this paper.
[# PeerJ Staff Note - this decision was reviewed and approved by Yilun Shang, a PeerJ Section Editor covering this Section #]
no comment
no comment
Verified the data, preprocessing code and model training/validation code.
The authors should proofread their manuscript and improve it further in line with the reviewer's suggestions.
The revised manuscript exhibits a marked improvement in language clarity and professional English, ensuring accessibility to readers with relevant backgrounds. The addition of recent literature notably enhances the context of DNS security and encrypted traffic analysis, ensuring the work's relevance and currency. The structure, including figures and tables, has been refined for better clarity and comprehension. The manuscript remains self-contained, with results and methodologies clearly supporting the stated hypotheses. The expansion on formal results and definitions strengthens the paper’s academic rigor, though further enhancement in detailed proofs could benefit future work.
The authors' response and enhancements significantly address the concerns previously raised, demonstrating a strong commitment to originality, methodological rigor, and the scientific inquiry's standards. The inclusion of a comparative analysis in Table 5 effectively highlights the study's unique contributions, particularly its innovative two-layered classification approach and the diversity of its dataset. This comparison not only establishes the manuscript's originality within the scope of encrypted DNS traffic classification but also articulates its advancements over prior work.
The expanded discussion on the research question and knowledge gaps further strengthens the study's foundation, providing a clear contextual framing of the issues addressed and how the proposed methodology fills existing gaps. By detailing the problem of encrypted DNS traffic classification and the limitations of current methodologies, the authors convincingly position their work as a significant advancement in the field.
The acknowledgment of the investigation's rigor and the elaboration on the methodological details for replicability exemplify the authors' dedication to transparency and scientific rigor. The detailed explanation of model evaluation criteria, the rationale behind metric choices, and the thorough hyperparameter tuning process enhance the study's replicability and underline its methodological soundness.
Overall, the revisions and responses effectively address the initial concerns, significantly bolstering the experimental design's strength. The manuscript now presents a well-justified, original, and rigorously executed study that aligns well with the journal's emphasis on cutting-edge research in computer science and cybersecurity.
The authors have adeptly addressed the concerns regarding the validity of their findings, significantly enhancing the manuscript. The inclusion of Table 5 and the detailed introduction of the two-layered classification approach showcase a commendable effort to compare their work with existing studies, illustrating the study’s unique contributions and advancements clearly. Their methodical approach to creating a diverse dataset and employing novel machine learning techniques enriches the field of network security. The conclusion ties back effectively to the research question, demonstrating the practicality of machine learning in analyzing encrypted DNS traffic. These enhancements not only align with the original feedback but also significantly bolster the study's contribution to the domain, presenting a robust case for the practical applicability and novelty of their research within the broader context of network security and machine learning.
1. In the Abstract, specifically from lines 31 to 51, the description of the results could benefit from further refinement for greater clarity. Ensuring that the key findings are articulated in a clear and concise manner would greatly enhance the reader's comprehension of the study's outcomes.
2. Regarding the content in lines 166 to 172 and lines 173 to 190, there appears to be a redundancy in the discussion of the study's contributions. It may be beneficial to consolidate these sections to provide a more cohesive and streamlined presentation of the research contributions.
3. A minor formatting suggestion for line 173: please consider inserting a space after the comma in the phrase 'This research makes several…, the applications…' for improved readability.
no comment
Table 5 provides clarity on the issue of ambiguous contributions.
The reviewers have given detailed comments, and the authors should revise their manuscript according to these comments. Notably, both reviewers consider the current English writing poor, so the authors should improve and polish the text. Additionally, the reviewers noted that the method and experiment sections are unclear; the authors should present their model more logically.
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
**Language Note:** The Academic Editor has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title). Alternatively, you should make your own arrangements to improve the language quality and provide details in your response letter. – PeerJ Staff
1. Clarity and Language Use. The manuscript is written in clear, professional English, facilitating ease of understanding. The technical jargon is appropriate for the field, and complex concepts are adequately explained, making the content accessible to readers with a relevant background.
2. Literature References and Field Context. The paper provides a thorough review of existing literature, offering sufficient background and context in the field of network security and machine learning. However, it could benefit from more recent references to reflect the latest advancements in the domain of DNS security and encrypted traffic analysis.
3. Article Structure, Figures, and Tables. The structure of the article adheres to professional standards, with well-organized sections including the introduction, methodology, results, and conclusion. Figures and tables are appropriately used to illustrate key points and data. However, some figures/tables could be better labelled or explained for clarity.
4. Self-Containment and Relevance to Hypotheses. The manuscript is self-contained, providing all the necessary information to understand the research without needing external resources. The results are relevant and directly address the stated hypotheses. The methodologies used are adequately justified, and the results are discussed in relation to the hypotheses.
5. Formal Results, Definitions, Theorems, and Proofs. The formal results are presented with clear definitions of all terms and theorems. However, the manuscript could benefit from more detailed proofs and a deeper analytical discussion of the results. The application of machine learning models in network traffic classification is well-explained, but further elaboration on the mathematical underpinnings could enhance the paper's academic rigour.
1. Originality and Alignment with Journal's Scope. The study introduces an innovative approach to using machine learning for network traffic classification, particularly focusing on encrypted DNS traffic. This aligns well with the journal's emphasis on cutting-edge research in computer science and cybersecurity. However, to further underscore its originality, the study could compare its approach and results with existing methodologies, highlighting its unique contributions and advancements over previous work.
2. Research Question and Knowledge Gap. The research question (how effectively can machine learning models classify encrypted network traffic, especially in the context of DNS protocol security) is both relevant and significant. The paper claims to address a gap in knowledge regarding the application of various machine learning models to this specific type of network traffic. To strengthen this claim, the authors could discuss in more detail how their approach differs from or improves upon existing methods in the literature.
3. Investigation and Standards. The investigation's rigour is evident in the application of multiple machine learning models and the comparison of their performance.
4. Methodological Details and Replicability. The methodology section provides a good overview of the machine learning models used. However, to enhance replicability and scientific rigor, the paper could include (1) a clearer explanation of the criteria for model evaluation, including why certain metrics were chosen, and (2) information on any hyperparameter tuning or model optimization processes undertaken.
1. Impact and Novelty Assessment. The manuscript does not sufficiently assess the impact and novelty of its findings within the broader context of network security and machine learning. While introducing machine learning techniques for encrypted traffic analysis, it lacks a direct comparison with existing methodologies or a clear articulation of how this study advances the field.
Suggested Improvements. To meet the journal's standards, the authors should emphasize the unique contributions of their work. This could include a detailed comparison with prior studies, highlighting advancements or differences in approach, accuracy, or applicability. Additionally, encouraging meaningful replication by providing a clear rationale and demonstrating the benefit of this study to existing literature would strengthen the paper's impact.
2. Conclusions and Research Question Alignment. While conclusions are presented, they are not sufficiently linked to the original research question. The paper lacks a clear demonstration of how the results directly support the initial hypothesis, which is crucial for establishing the validity and relevance of the research.
Suggested Improvements: The authors should refine the conclusion section to directly tie back to the original research question and hypothesis. Conclusions should be clearly supported by the results, with explicit statements on how the findings contribute to the existing body of knowledge. Limiting the conclusions to what is directly supported by the results will improve the paper’s credibility and alignment with academic standards.
1. To enhance readability, the authors should correct grammatical errors and complete unfinished sentences, for example at lines 96 and 121.
2. It would be beneficial to include citations for the literature review concerning other machine learning models, especially those with reported accuracies. For instance, a citation supporting the claim of 100% accuracy mentioned in line 130 is necessary.
The paper discusses the performance of eight machine learning models, which leaves its primary objective ambiguous: does it aim to compare various models, or to identify the optimal model for classifying network traffic data? It is advisable for the authors to concentrate on 2-3 representative models, delving deeper into their accuracy, precision, and recall to provide a more focused and clear analysis.
1. The distinctiveness of the paper's contribution is not evident, considering the existence of related studies that have reported superior performance metrics for LGBM and XGBoost.
2. The sections detailing experimental results and discussion require substantial enhancement. A critical performance metric for machine learning models, the rate of false positives/negatives, is not discussed even though the paper includes relevant figures on precision and a confusion matrix.
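As an illustration of the metrics referred to above, the following minimal sketch shows how per-class false positive and false negative rates could be derived from a multi-class confusion matrix. It is not the authors' code: the function name `per_class_rates`, the class labels, and the counts are hypothetical, and the matrix is assumed to have true classes as rows and predicted classes as columns.

```python
from typing import Dict, List


def per_class_rates(cm: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Derive one-vs-rest rates per class from a confusion matrix.

    Assumes cm[i][j] counts samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in cm)
    rates: Dict[str, Dict[str, float]] = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                   # true class i, predicted as something else
        fp = sum(row[i] for row in cm) - tp    # other classes predicted as class i
        tn = total - tp - fn - fp
        rates[label] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "fpr": fp / (fp + tn) if fp + tn else 0.0,  # false positive rate
            "fnr": fn / (fn + tp) if fn + tp else 0.0,  # false negative rate
        }
    return rates


# Hypothetical counts, for illustration only (not taken from the manuscript).
labels = ["class_a", "class_b", "class_c"]
cm = [
    [90, 5, 5],
    [4, 80, 16],
    [1, 9, 90],
]
rates = per_class_rates(cm, labels)
for label, r in rates.items():
    print(label, {k: round(v, 3) for k, v in r.items()})
```

Reporting these rates alongside precision makes explicit how often each traffic class is wrongly flagged or missed, which is the information the confusion matrix already contains but the discussion does not spell out.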
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.