A hybrid algorithm for literature classification in emerging interdisciplinary fields: A case study on UAV logistics
Abstract
The rapid expansion of scientific literature makes it difficult to map emerging interdisciplinary fields accurately, particularly fields such as Unmanned Aerial Vehicle (UAV) logistics, whose terminology evolves quickly and whose boundaries remain fluid. This study develops a hybrid text classification algorithm that automates the identification of relevant publications and overcomes the limitations of both conventional keyword-based searches and standalone pre-trained language models. Using a manually curated dataset of 5,636 articles retrieved from Web of Science, the approach feeds three feature sets into a Random Forest classifier: semantic representations from a fine-tuned MPNet model, cosine similarities to domain-specific keyword groups, and counts of discriminative n-grams. Evaluated with five-fold cross-validation and ablation studies, the framework achieves a positive-class F1-score of 0.9014, with higher precision (0.9250) and recall (0.8790) than the baselines, including SciBERT (F1: 0.7989) and MPNet alone (F1: 0.8723). These results highlight the effectiveness of multi-feature fusion for separating relevant UAV logistics publications from irrelevant ones. The methodology offers a generalizable solution for literature mapping in dynamic domains.
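To make the feature-fusion step more concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes that the MPNet document embedding and the keyword embeddings have already been computed (toy three-dimensional vectors stand in for them here), that each keyword group is represented by the mean of its keyword embeddings, and that the discriminative n-grams form a fixed, pre-selected list. The helper names (`mean_vector`, `cosine`, `ngram_counts`, `fuse_features`) and all example values are hypothetical.

```python
import math
import re
from collections import Counter


def mean_vector(vectors):
    # Hypothetical group representation: element-wise mean of keyword embeddings.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def ngram_counts(text, vocab, max_n=2):
    # Lowercase, split on non-alphanumerics, and count 1..max_n-grams,
    # then report counts only for the pre-selected vocabulary, in order.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [counts[g] for g in vocab]


def fuse_features(doc_embedding, group_embeddings, text, ngram_vocab):
    # Concatenate: [document embedding | keyword-group similarities | n-gram counts].
    keyword_sims = [cosine(doc_embedding, g) for g in group_embeddings]
    return list(doc_embedding) + keyword_sims + ngram_counts(text, ngram_vocab)


if __name__ == "__main__":
    doc_emb = [0.2, 0.8, 0.1]  # stand-in for a fine-tuned MPNet embedding
    groups = {
        "uav": [[0.1, 0.9, 0.0], [0.3, 0.7, 0.1]],  # stand-in keyword embeddings
        "logistics": [[0.9, 0.1, 0.2]],
    }
    vocab = ["uav", "drone delivery", "last mile", "parcel"]  # hypothetical n-grams
    text = "Drone delivery and UAV delivery routing for last-mile logistics"
    group_embs = [mean_vector(v) for v in groups.values()]
    print(fuse_features(doc_emb, group_embs, text, vocab))
```

In a full pipeline, one such vector per article would be stacked into a feature matrix and passed to a Random Forest, for example scikit-learn's `RandomForestClassifier`, whose `fit(X, y)` method takes the matrix and the binary relevance labels. Keeping the three feature groups as contiguous slices of the vector also makes ablation straightforward: dropping one slice removes one feature source.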