RoboLSTM-IDS: multi-dataset evaluation of deep learning framework for UAV network
- Academic Editor
- Ankit Vishnoi
- Subject Areas
- Artificial Intelligence, Computer Networks and Communications, Security and Privacy
- Keywords
- UAV, IDS, Anomaly detection, RoboLSTM, Intrusion detection, RoboLSTM-IDS, Multidataset framework, Cyber physical attacks, UAV classical attacks
- Copyright
- © 2026 Attaullah et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
- Cite this article
- 2026. RoboLSTM-IDS: multi-dataset evaluation of deep learning framework for UAV network. PeerJ Computer Science 12:e3500 https://doi.org/10.7717/peerj-cs.3500
Abstract
The growing deployment of uncrewed aerial vehicles (UAVs) in autonomous and networked missions has heightened their exposure to both cyber and cyber-physical attacks, underscoring the need for intelligent and lightweight intrusion detection system (IDS) solutions. This study introduces RoboLSTM-IDS, a deep anomaly-based framework that combines robust feature engineering with temporal sequence modeling for UAV network security. Leveraging Robust Optimization-Based Tabular Feature Engineering (ROBOTa), a robust optimization-based feature selection technique, the system extracts stable, high-impact features from complex UAV telemetry and communication data. These are modeled using a Long Short-Term Memory (LSTM) network to capture sequential attack dynamics. Comprehensive experiments conducted on five benchmark datasets, comprising real-world UAV cyber-physical data (T-ITS), CICIDS-2017, UNSW-NB15, and the CTGAN-augmented variants of the latter two, demonstrate that RoboLSTM-IDS consistently outperforms traditional machine learning and deep learning baselines. It achieves up to 99.62% accuracy and 0.997 AUC, while maintaining low false positive rates and real-time execution performance. Unlike conventional IDS models that are computationally heavy, the proposed model achieves a 6× smaller model size, 3× lower memory footprint, and significantly reduced inference latency. These results confirm RoboLSTM-IDS as an effective and scalable IDS solution tailored for next-generation UAV ecosystems.
Introduction
Uncrewed aerial vehicles (UAVs), commonly known as drones, have become central to a broad spectrum of applications ranging from military surveillance and disaster response to smart agriculture, logistics, and environmental monitoring. Their versatility, mobility, and capacity for autonomous decision-making make them highly attractive for real-time operations in both civilian and defense sectors. With the integration of sensors, actuators, and onboard processors, UAVs are increasingly forming complex cyber-physical systems that support intelligent mission execution over wireless and often vulnerable communication links (Ceviz, Sen & Sadioglu, 2024; Nabi et al., 2024; Huzaifa et al., 2025).
A standard UAV architecture includes four major subsystems: (i) the flight control unit, which maintains flight dynamics and stability; (ii) sensing and perception modules (e.g., global positioning systems (GPS), inertial measurement units (IMU), light detection and ranging (LiDAR), camera systems); (iii) a communication module enabling command and telemetry exchange with ground control stations (GCS) or peer UAVs; and (iv) the processing and decision-making layer, typically supported by edge or onboard computational units (Wei, Ma & Sun, 2024; Adil et al., 2023; Hassler, Mughal & Ismail, 2024).
As shown in Fig. 1 (recreated from Sihag et al. (2023)), at the physical and communication level the architecture comprises several UAVs forming a drone network capable of inter-drone communication. Each UAV is equipped with onboard sensors such as GPS, LiDAR, and other environmental sensors that aid in navigation, object detection, and mission-specific tasks. These UAVs receive positional data from GPS satellites and transmit and receive situational data through the Automatic Dependent Surveillance–Broadcast (ADS-B) system. A Ground Control Station serves as a centralized control and monitoring hub, maintaining a communication link with the UAVs and receiving broadcast data from ADS-B stations. The architecture also interfaces with edge/cloud computing infrastructure, allowing compute-intensive tasks such as real-time analytics, route optimization, or anomaly detection to be offloaded. The right-hand side of the architecture diagram shows the hierarchical control flow. Modules pertaining to the GCS initiate the process by passing inputs to the Planning Layer, which generates or updates flight plans based on operational targets. The Flight Management Layer interprets and manages these flight plans and delegates tasks to individual UAVs. The Control Layer transforms task instructions into task behaviors and then into executable commands, and sends control signals to the Sensors and Actuators Layer, which interacts with the physical environment (d’Ambrosio et al., 2025). This layered decomposition enables modular operation, real-time adaptability, and scalability for UAV swarm applications across critical missions such as surveillance, environmental monitoring, and disaster relief.
Figure 1: Overview and architecture of a UAV system (Sihag et al., 2023).
The various UAV architectural layers enhance autonomous functionality, but they also create openings for cyber threats directed at communication and control features (Sedjelmaci, Senouci & Ansari, 2016). UAV network cybersecurity is therefore a critical concern that needs urgent attention. UAV systems are used in hazardous locations, which gives adversaries an opportunity to interfere with their operation. Attack vectors that target UAVs include GPS spoofing, signal blocking, denial-of-service attacks, and packet injection (Mohammed, Fourati & Fakhrudeen, 2024). Such attacks can compromise mission objectives, cause aircraft accidents, and expose classified information to unauthorized parties. Moreover, UAVs have limited computing and energy resources, which prevents them from deploying conventional heavyweight security systems (Abu Al-Haija & Al Badawi, 2022).
Because these vulnerabilities pose a growing threat, Intrusion Detection Systems (IDS) are necessary to protect UAV operations. IDS solutions monitor systems and networks to identify anomalous activities that indicate attacks. However, classical machine learning models and traditional IDS based on fixed rule sets show significant shortcomings in UAV operating environments. They are inflexible, and because they depend on hand-crafted features, they generalize poorly when threat patterns change.
Deep learning algorithms, in particular Long Short-Term Memory (LSTM) networks, have become a common choice for processing sequence data because they can identify temporal patterns. Their ability to capture dynamic patterns in UAV network traffic and system logs makes LSTMs well suited to detecting previously unseen and emerging anomalies in operational settings (Whelan, Almehmadi & El-Khatib, 2022; Fossaceca, Mazzuchi & Sarkani, 2015). Despite their potential, LSTM performance is heavily influenced by the quality of input features. High-dimensional, irrelevant, or noisy features can hinder learning convergence, reduce model accuracy, and introduce overfitting (Tsao, Girdler & Vassilakis, 2022). Intrusion detection in UAV networks remains challenging due to high mobility and dynamic topologies. Traditional IDS methods are often ineffective in such environments.
To mitigate these challenges, we propose RoboLSTM-IDS, a novel hybrid framework that integrates LSTM-based deep learning with a robust feature optimization module called ROBOTa (Robust Optimization-Based Tabular Feature Engineering). Unlike conventional feature selection approaches that rely on static filters or simple ranking methods, ROBOTa employs optimization-based scoring to select features that are both discriminative and resilient. It considers model sensitivity, perturbation stability, and cross-feature interaction strength, leading to a robust, reduced feature set that enhances downstream LSTM performance.
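To make the three selection criteria more concrete, the following is a minimal sketch and not the authors' ROBOTa implementation. It assumes a hypothetical additive score built from three simple proxies: absolute correlation with the label (sensitivity), how little that correlation drifts under Gaussian noise (perturbation stability), and the correlation between the label and pairwise feature products (interaction). The function names, the weights, and the noise level are all illustrative assumptions.

```python
import random
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def rank_features(columns, labels, noise=0.1, trials=10,
                  weights=(0.5, 0.3, 0.2), seed=0):
    """Rank features by a hypothetical additive score (illustrative only).

    columns: dict mapping feature name -> list of numeric values
    labels:  list of 0/1 class labels, same length as each column
    """
    rng = random.Random(seed)
    w_sens, w_stab, w_inter = weights
    scored = []
    for name, col in columns.items():
        # Sensitivity proxy: how strongly the feature tracks the label.
        sens = abs(pearson(col, labels))

        # Stability proxy: how little that association moves when the
        # feature is perturbed with noise scaled to its own spread.
        sd = pstdev(col)
        drift = []
        for _ in range(trials):
            noisy = [v + rng.gauss(0.0, noise * sd) for v in col]
            drift.append(abs(abs(pearson(noisy, labels)) - sens))
        stab = max(0.0, 1.0 - mean(drift))

        # Interaction proxy: mean association between the label and the
        # product of this feature with each other feature.
        others = [c for n, c in columns.items() if n != name]
        if others:
            inter = mean([
                abs(pearson([a * b for a, b in zip(col, other)], labels))
                for other in others
            ])
        else:
            inter = 0.0

        scored.append((name, w_sens * sens + w_stab * stab + w_inter * inter))
    # Highest score first; a reduced feature set keeps the top entries.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Calling `rank_features({"a": [...], "b": [...]}, labels)` returns `(name, score)` pairs in descending order. A real pipeline would measure sensitivity against the trained model rather than against a raw correlation, so this sketch only illustrates how the three criteria can be combined into one ranking.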
The effectiveness of RoboLSTM-IDS is evaluated across five intrusion detection datasets relevant to UAV environments. These include CICIDS-2017, UNSW-NB15, a UAV traffic dataset (T-ITS), and two synthetically generated datasets using Conditional Tabular GANs (CTGAN). The evaluation demonstrates that RoboLSTM-IDS consistently outperforms baseline classifiers, including Random Forest, SVM, CNN, and vanilla LSTM, in terms of accuracy, recall, false positive rate (FPR), and area under the receiver operating characteristic (ROC) curve (AUC).
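For reference, the metrics named above can be computed from labels and predictions as in the following sketch. It is a plain-Python illustration, not the evaluation code used in the study. The function names are assumptions, labels are assumed to be binary with 1 denoting an attack, and AUC is computed with the rank-based (Mann-Whitney) formulation.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, recall, and false positive rate for 0/1 labels (1 = attack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "recall": ratio(tp, tp + fn),   # share of attacks that were caught
        "fpr": ratio(fp, fp + tn),      # share of benign traffic flagged
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random attack sample receives a higher
    score than a random benign sample; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise AUC loop is quadratic in the number of samples, which is acceptable for a small illustration but not for datasets with millions of records.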
The main contributions of this article are as follows:
- We propose RoboLSTM-IDS, a new hybrid deep learning architecture that combines LSTM-based detection with an effective, lightweight feature optimization pipeline.
- We develop ROBOTa, an optimization-based feature engineering method that dynamically estimates and selects high-value features. RoboLSTM-IDS has a 6× smaller model size and a 3× lower memory footprint than traditional IDS models, together with much lower inference latency, indicating that it is feasible for simulated onboard UAV deployment.
- We conduct a cross-dataset analysis to confirm that RoboLSTM-IDS generalizes to a wide range of attack types and data distributions.
The rest of the article is organized as follows. The Related Work section reviews the literature on UAV intrusion detection systems, feature selection, and sequence-based learning. Methodology presents the proposed RoboLSTM-IDS framework, namely the ROBOTa feature selection module and the LSTM-based classifier. Experimental Setup presents the datasets, the augmentation process, and the experimental settings. Results analyzes the evaluation outcomes on the UAV intrusion detection datasets, together with statistical validation and a comparison with baseline methods. Discussion addresses the implications of the results, especially with regard to lightweight deployment and real-time applicability in UAV settings. Lastly, the conclusion suggests possible future research.
Related work
This section provides an extensive review of the available Intrusion Detection System methodologies that can be used to improve UAV and Internet of Things (IoT) network security. It details the strengths and weaknesses of these systems and their performance in detecting and countering threats in dynamic UAV settings. The methods reviewed include machine learning, deep learning, hybrid IDS, federated models, and designs based on reinforcement learning. We organize this discussion into four key areas: traditional ML-based models, deep neural architectures (especially LSTM and its variants), feature engineering strategies, and multi-dataset evaluations.
Initial IDS models for UAV and IoT relied heavily on classical machine learning algorithms. Fu et al. (2023) employed SVMs to detect intrusions in UAV telemetry but reported low adaptability to evolving attack signatures. Adil et al. (2023), Hadi et al. (2024), and Bouhamed et al. (2021) used LSTM (generally without any feature engineering), random forest (RF), and k-nearest neighbor (KNN) on CICIDS and UNSW-NB15, achieving moderate accuracy but with high false positive rates and signs of overfitting on imbalanced datasets. Similarly, Hashesh et al. (2022) tested lightweight ML techniques on Botnet Internet of Things (BoT-IoT) and found them insufficient for detecting minority class attacks. These models, while computationally inexpensive, are static and unsuitable for dynamic UAV contexts.
The rise of deep learning has transformed IDS design, especially using temporal models like LSTM and GRU. Alzahrani (2024) implemented an LSTM-based IDS for UAV cyber-physical systems, achieving 94.2% accuracy on proprietary data, but showed weak cross-dataset generalization (Anwar et al., 2025). Sheela et al. (2024) highlighted issues of over-sensitivity to dominant features in UAV logs using LSTM. CNN-LSTM hybrids (Sheela et al., 2024), GRU-Autoencoders (Alzubi et al., 2022), and AE-LSTM (Xu et al., 2024) have improved detection performance but often at the cost of inference speed, making real-time UAV deployment infeasible. Recent studies like Liang et al. (2024) adopted federated LSTM for distributed UAV logging, but performance degraded under data heterogeneity.
Several hybrid systems have attempted to enhance generalization. AlKhonaini et al. (2024) combined CNN, RF, and DNN ensembles across CICIDS and NSL-KDD, yet suffered from large inference latency. AL-Syouf, Bani-Hani & AL-Jarrah (2024) proposed a DRL-based IDS with promising learning capacity but demonstrated instability in UAV environments with small, non-stationary batch updates.
Recent advances in intrusion detection for UAV networks highlight both opportunities and challenges. Federated learning-based IDS frameworks have been proposed to enhance UAV privacy and security in distributed environments, offering collaborative learning without direct data sharing (Ceviz et al., 2025). At the same time, UAVs have been shown to be vulnerable to adversarial manipulations, where deep learning-based models can be misled through carefully crafted perturbations (Tian et al., 2021). Beyond UAVs, the broader cyber-physical system literature has also demonstrated the risks of adversarial false data injection attacks. For example, the EVADE framework reveals how targeted adversarial manipulations can compromise state estimation in smart grids (Tian et al., 2024a), while LESSON introduces a multi-label adversarial attack strategy that undermines deep learning-based locational detection (Tian et al., 2024b; Qureshi et al., 2025). Overall, these studies indicate the need for sound, lightweight, and generalizable IDS frameworks that can withstand adversarial effects and remain viable in real-time UAV applications.
Nevertheless, feature representation is an important factor in IDS robustness. Popular methods for reducing dimensionality include principal component analysis (PCA), information gain (IG), and recursive feature elimination (RFE). However, these do not account for feature stability under perturbation or for deep interactions between features. Tlili, Ayed & Fourati (2024) found in their review that most IDS failures were associated with poor feature selection pipelines. Dewangan & Vij (2024) used LSTM with optimized features and reported better consistency, but only on a single dataset. Booij et al. (2021) used Bi-GRU and feature ranking, but performance declined drastically on the TON_IoT dataset (Tlili, Ayed & Fourati, 2023). These studies underscore the importance of optimization-based feature selection algorithms that adapt to model dynamics and can be used across datasets.
Many IDS frameworks are trained on a single dataset, which limits their applicability in the real world. Tlili, Ayed & Fourati (2023) trained AE-CNN on NSL-KDD and CICIDS and obtained good results at the cost of long training times. Alwan et al. (2022) and Hassler, Mughal & Ismail (2024) presented a transformer-based IDS and obtained high accuracy, but with a high dependence on the GPU. Sedjelmaci, Senouci & Ansari (2017) trained a lightweight CNN on edge devices, where reduced complexity came at the cost of recall. These limitations in generalization, edge feasibility, and robustness underline the need for unified IDS frameworks that are tested on a variety of datasets.
Despite these advances, and as Table 1 shows, there remains a critical need for IDS models that are lightweight, generalizable, and resilient to diverse feature behaviors across datasets (Hassler, Mughal & Ismail, 2023; Praveena et al., 2022; Attaullah et al., 2024). In response, we propose RoboLSTM-IDS, a novel hybrid IDS that combines the sequence modeling capability of LSTM with a robust feature optimization module, ROBOTa. ROBOTa dynamically ranks and selects features based on model sensitivity, perturbation stability, and cross-feature interaction, thereby ensuring minimal overfitting and strong cross-dataset performance. We evaluate RoboLSTM-IDS across five benchmark datasets, including CICIDS-2017, UNSW-NB15, T-ITS, and two CTGAN-augmented datasets.
| Study | Model type | Dataset(s) | Type of IDS used | Limitations |
|---|---|---|---|---|
| Fu et al. (2023) | SVM | Custom UAV dataset | Signature-based ML IDS | Poor adaptability to unseen data |
| Adil et al. (2023) | RF, KNN | CICIDS, UNSW-NB15 | Anomaly-based ML IDS | Overfitting, high FPR |
| Hadi et al. (2024) | LSTM | CPS, NSL-KDD | Deep Learning (Time-Series IDS) | Weak cross-dataset generalization |
| Bouhamed et al. (2021) | LSTM-based DL | Surveillance Logs | Temporal Deep IDS | Feature dominance, imbalance issues |
| Hashesh et al. (2022) | Meta-review | CIC 2017-2018 | Comparative Survey (Mixed) | Lacks experimental benchmarking |
| Alzubi et al. (2022) | Deep Ensemble Model | CICIDS, NSL-KDD, BoT-IoT | Multi-Modal Ensemble IDS | High complexity, unsuitable for UAV edge deployment |
| Alzahrani (2024) | CNN-LSTM Hybrid | TON_IoT | Anomaly-based DL IDS | High training time, unsuitable for lightweight UAVs |
| Sheela et al. (2024), Booij et al. (2021) | Autoencoder + Classifier | BoT-IoT | Signature + Anomaly Hybrid IDS | Weak on zero-day detection, limited feature adaptation |
| Xu et al. (2024) | Federated LSTM | Edge UAV logs | Distributed Federated IDS | Poor convergence under data heterogeneity |
| Liang et al. (2024) | CNN-GRU | CICIDS-2018 | Deep Sequential IDS | No benchmarking across heterogeneous datasets |
| AlKhonaini et al. (2024) | Hybrid AE-LSTM | IoTID20 | Deep Anomaly IDS | Computationally intensive for real-time UAVs |
| AL-Syouf, Bani-Hani & AL-Jarrah (2024) | Feature-optimized LSTM | NSL-KDD | Optimization-based DL IDS | Dataset-specific tuning; lacks generalization |
| Tlili, Ayed & Fourati (2024) | Ensemble RF-CNN | CICIDS-2017 | Hybrid IDS | Requires preprocessing pipeline; slow inference |
| Dewangan & Vij (2024) | ML-Based Lightweight IDS | BoT-IoT | Anomaly-based ML IDS | Low precision for minority class |
| Ntizikira et al. (2023) | LSTM-Attention Model | NSL-KDD | Temporal IDS with attention | Overfits on repetitive sequence features |
| Booij et al. (2021) | Bi-GRU + Feature Ranking | TON_IoT | Feature-engineered DL IDS | Inconsistent F1-scores across datasets |
| Tlili, Ayed & Fourati (2023) | AE + CNN | NSL-KDD, CICIDS | Layered DL IDS | Long training cycles; edge impractical |
| Alwan et al. (2022) | Lightweight CNN | BoT-IoT | Edge ML IDS | Reduces complexity but sacrifices recall |
| Hassler, Mughal & Ismail (2024) | Transformer-based IDS | TON_IoT, NSL-KDD | Deep Attention IDS | Requires GPU support for deployment |
| Sedjelmaci, Senouci & Ansari (2017) | DRL-based Classifier | CICIDS-2018 | RL-based IDS | Unstable in small UAV batch tasks |
UAV-based IDS approaches
IDSs in UAV environments can be broadly classified into three categories: signature-based, anomaly-based, and hybrid approaches, as shown in the taxonomy in Fig. 2, which focuses specifically on DL-based techniques. Signature-based IDS rely on rule or pattern matching against known attack signatures. Anomaly-based IDS include statistical techniques, traditional machine learning algorithms (e.g., support vector machine (SVM), Random Forest, Naïve Bayes), and deep learning models (e.g., LSTM, GRU, CNN, Autoencoders). Advanced deep learning architectures like CNN-LSTM and AE-LSTM enhance anomaly detection capabilities (Fossaceca, Mazzuchi & Sarkani, 2015). Hybrid IDS combine signature detection with anomaly detection to build a robust system for detecting threats in UAV environments. Each detection paradigm has its own combination of strengths and weaknesses in terms of accuracy, adaptability, and computational requirements. This section analyzes all three IDS strategies, with a focus on anomaly-based methods, since they serve as the foundation for the proposed RoboLSTM-IDS framework.
Figure 2: Classification of IDS for UAV focusing on DL based techniques.
Signature-based IDS
Signature-based IDS match system behavior and network traffic patterns against databases of known attack signatures. These lightweight systems suit the embedded platforms that operate UAVs because they combine high efficiency with minimal resource requirements. The rule-based detection tool Snort is a classic example: it matches known threat patterns in real time.
Zhang et al. (2018) proposed a UAV-specific signature-based IDS using SVM to classify known attack types. The system effectively detected established security threats but struggled to recognize new or modified attack methods. This is the core drawback of signature-based systems: they are blind to unknown zero-day threats. In addition, signature databases need frequent updates, which is a challenge in highly mobile UAV environments where threats change rapidly.
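The matching principle and its blind spot can be illustrated with a small sketch. The rule set, the event fields (`frame_type`, `rate_per_s`, `protocol`, `payload_len`, `gps_jump_m`), and the thresholds below are hypothetical and chosen only for illustration. They are not taken from Snort or from any of the cited systems.

```python
# Hypothetical signature database: each rule names a known attack pattern
# and provides a predicate over a single observed event (a dict).
SIGNATURES = [
    {
        "name": "deauth_flood",
        "match": lambda e: e.get("frame_type") == "deauth"
        and e.get("rate_per_s", 0) > 50,
    },
    {
        "name": "oversized_command",
        "match": lambda e: e.get("protocol") == "mavlink"
        and e.get("payload_len", 0) > 255,
    },
]


def match_signatures(event, signatures=SIGNATURES):
    """Return the names of all known signatures that the event matches."""
    return [rule["name"] for rule in signatures if rule["match"](event)]


# A known pattern is caught immediately and cheaply.
known = {"frame_type": "deauth", "rate_per_s": 120}

# A novel attack (here, an implausible GPS jump) matches no stored rule,
# so a purely signature-based detector reports nothing for it.
novel = {"frame_type": "data", "rate_per_s": 5, "gps_jump_m": 900}
```

Here `match_signatures(known)` returns the matching rule name, while `match_signatures(novel)` returns an empty list until someone adds a rule for the new behavior, which is the update burden described above.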
Anomaly-based IDS
An anomaly-based IDS defines normal patterns of behavior and then flags any activity that deviates from them. This technique suits UAV networks because their unpredictable, mission-oriented communication patterns are hard to specify beforehand. Unlike signature-based systems, anomaly detectors can detect new threats, which makes them crucial components of modern cyber defense.
Early anomaly-based IDS implementations relied on statistical models in combination with clustering algorithms. Tan et al. (2019) applied machine learning classifiers like Random Forest and KNN to UAV telemetry data but encountered high false positive rates and instability under class imbalance. Ouiazzane, Barramou & Addou (2020) applied lightweight anomaly-based models to the BoT-IoT dataset and noted that while fast, they struggled with precision for minority attack classes.
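The idea of learning a normal profile and flagging deviations can be sketched with a simple statistical baseline. The example below is a per-feature z-score detector and is an assumption made for illustration. It is far simpler than the deep models discussed in this article, and the class name and the threshold of 3.0 are hypothetical choices.

```python
from statistics import mean, pstdev


class BaselineDetector:
    """Learns per-feature mean and spread from benign rows, then scores
    new rows by their largest absolute z-score."""

    def fit(self, normal_rows):
        columns = list(zip(*normal_rows))  # transpose rows into features
        self.means = [mean(col) for col in columns]
        # Floor the spread so constant features do not divide by zero.
        self.stds = [max(pstdev(col), 1e-9) for col in columns]
        return self

    def score(self, row):
        return max(
            abs(value - mu) / sd
            for value, mu, sd in zip(row, self.means, self.stds)
        )

    def is_anomalous(self, row, threshold=3.0):
        return self.score(row) > threshold


# Benign traffic only: no attack labels are needed to build the profile.
benign = [[10, 100], [12, 98], [11, 102], [9, 101], [10, 99]]
detector = BaselineDetector().fit(benign)
```

With this profile, `detector.is_anomalous([10.5, 100.5])` is false and `detector.is_anomalous([30, 100])` is true. The detector needs no signature for the second row, but a poorly chosen threshold would raise false alarms on benign traffic that merely drifts, which is the weakness noted for anomaly-based systems.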
In recent years, deep learning has significantly advanced the field of anomaly detection. Deep architectures can extract hierarchical patterns from high-dimensional data and capture temporal relationships, making them ideal for UAVs where behavior evolves over time. LSTM, CNN, GRU, and Autoencoder based IDS have been widely explored.
Dash et al. (2025) implemented an LSTM-based IDS for detecting DoS attacks in UAV-based cyber-physical systems, achieving high accuracy. However, the model struggled to generalize across datasets such as NSL-KDD, indicating a limitation in robustness. Bamber et al. (2025) used a CNN-LSTM hybrid model on the TON_IoT dataset and reported improved detection rates, but the model required significant training time and was not optimized for UAV edge deployment. Other models such as GRU-based Autoencoders (Narmadha & Balaji, 2025) and AE-LSTM hybrids (Abdulganiyu et al., 2025) showed promise but suffered from overfitting or were too computationally demanding for onboard use.
These findings point toward the need for anomaly-based systems that not only leverage deep learning’s potential but are also optimized for generalization and efficiency. The proposed RoboLSTM-IDS framework belongs to this category: it combines LSTM’s temporal detection features with the robust feature engineering capabilities of the ROBOTa module. The workflow is illustrated in Fig. 3.
Figure 3: Workflow of the proposed RoboLSTM-IDS framework.
Hybrid-based IDS
Hybrid IDS combine elements of both signature-based and anomaly-based detection to leverage the strengths of each. A fast signature-based module detects known threats, while a more complex anomaly-based component identifies unknown threats and active attacks. This two-layer approach aims to maximize detection accuracy while minimizing false positives and false negatives.
Ali et al. (2025) implemented a deep ensemble model combining CNN and Random Forest for multi-dataset evaluation. While detection rates improved overall, the system required a powerful backend and was unsuitable for real-time or low-power environments like UAVs. Kamal & Mashaly (2025) presented a hybrid of an autoencoder and a classical classifier that could handle known and unknown attacks, but its performance deteriorated under adversarial perturbations and it lacked explainability.
Hybrid models work well when resources are abundant but lack the simplicity that UAVs need to run them efficiently. RoboLSTM-IDS addresses this gap by delivering accurate anomaly-based protection through a resource-efficient system design.
In summary, signature-based IDS are fast and economical in resource use but have little ability to detect unknown threats. Anomaly-based IDS, particularly those using deep learning, adapt well to new threats but may require complex setup procedures and can generate false alarms. Hybrid models combine the benefits of both but are impractical for real-time UAV operations because of their operational complexity. RoboLSTM-IDS belongs to the deep learning branch of anomaly-based IDS. It applies stable, optimization-driven feature engineering to reduce noise and improve generalization across UAV operational settings.
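The two-layer flow can be sketched as a short dispatch function. This is an illustrative assumption about how such a pipeline might be wired and does not describe any of the cited systems. The function name, the verdict labels, and the threshold are hypothetical, and both detection stages are passed in as callables.

```python
def hybrid_detect(event, signature_check, anomaly_score, threshold=3.0):
    """Two-stage decision: cheap signature lookup first, anomaly model second.

    signature_check: callable returning a list of matched signature names
    anomaly_score:   callable returning a numeric deviation score
    """
    hits = signature_check(event)
    if hits:
        # Known attack: stop early and skip the more expensive stage.
        return {"verdict": "known_attack", "detail": hits}

    score = anomaly_score(event)
    if score > threshold:
        # No signature matched, but the behavior deviates from normal.
        return {"verdict": "anomaly", "detail": score}
    return {"verdict": "benign", "detail": score}
```

Running the signature stage first keeps the common case cheap, but the anomaly model must still be resident on the device, which is part of why hybrid designs can be heavy for onboard use.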
Datasets
The effectiveness of a reliable IDS for UAV networks depends heavily on the quality and characteristics of the datasets used for training and evaluation. Existing datasets in the field of intrusion detection have largely focused on either cyber-level network traffic or physical telemetry, and very few integrate both. This section reviews prior datasets used in UAV-related IDS research in two groups, those based on cyber features and those based on physical features, followed by a comparative analysis and the rationale for selecting the specific datasets used in this study.
Datasets with cyber features
Cyber-based datasets focus on network-level data such as packet flows, protocol behavior, and statistical metadata derived from network traffic. These datasets form the foundation of traditional IDS development and have been widely used in machine learning research for network security. Examples include NSL-KDD (Tavallaee et al., 2009; McHugh, 2000), CICIDS-2017 (Sharafaldin, Lashkari & Ghorbani, 2018; Chen, 2023), augmented CTGAN (Alabdulwahab et al., 2023; Xu et al., 2019), InSDN (Khanapuri, Sharma & Brink, 2022), and UNSW-NB15, which are discussed in Chen (2023), Koroniotis et al. (2017), Moustafa et al. (2018), Keshk et al. (2017), Moustafa, Turnbull & Choo (2018), and Moustafa, Slay & Creech (2018a, 2018b, 2018c).
Many researchers have built their IDS models using these cyber-only datasets (Hassler, Mughal & Ismail, 2024). For instance, Shrestha et al. (2021) used a deep belief network combined with particle swarm optimization on the CICIDS-2017 dataset, achieving superior performance over conventional neural networks. Zhang et al. developed a hybrid IDS using TCP/UDP traffic analysis and wavelet-based fractal modeling; however, their data was purely simulated. Several other studies applied classical and deep learning models (e.g., SVM, CNN, RNN) to datasets like NSL-KDD (Tavallaee et al., 2009) and CICIDS-2017.
Despite their utility, these datasets were not originally designed for UAV environments. They lack UAV-specific features such as flight patterns, mobility-induced packet variation, or control link protocols (e.g., MAVLink). Additionally, many of these datasets were created in traditional IT network infrastructures, making them poorly suited for aerial networks where context and timing are essential. As a result, models trained on these datasets may generalize poorly in real-world UAV scenarios.
Datasets with physical features
Physical-feature-based datasets represent another axis of IDS development, focusing on sensor data, flight telemetry, orientation vectors, and behavior-based anomalies. These datasets aim to detect intrusions based on deviations from expected UAV movement patterns or physical signatures.
Several studies have focused on such approaches. Keipour, Mousaei & Scherer (2021) used a simulated UAV flight log dataset that included attack and normal scenarios, while others, such as Chen (2023) and Mohammed, Fourati & Fakhrudeen (2024), proposed statistical models for GNSS spoofing detection. More sophisticated works have used deep learning models like 1D-CNNs and DNNs to detect behavioral anomalies in UAVs based purely on telemetry signals, flight trajectories, and positional drift.
Although useful, such datasets are usually narrow. They rarely contain network-layer intrusion events and do not necessarily cover cyber attacks such as spoofed command injection, control hijacking, or de-authentication attacks. Most of them are also built on virtual rather than actual UAV operating scenarios, making them less faithful to real operational IDS deployment.
Selected datasets
A side-by-side comparison of commonly used IDS datasets in UAV research reveals a clear gap: most datasets focus on either cyber-level attacks or physical anomalies, not both. Cyber datasets offer volume and variety but lack UAV-specific context. Physical datasets capture movement-related behavior but ignore networking threats. Table 2 summarizes the domain type, UAV applicability, covered attack types, and known limitations of representative datasets in the field.
| Dataset | Domain type | UAV-specific | Attack types | Limitations |
|---|---|---|---|---|
| NSL-KDD | Cyber | No | DoS, Probe, R2L, U2R | Outdated attack types; no UAV communication or telemetry data |
| CICIDS-2017 | Cyber | No | DDoS, Web, Infiltration, Heartbleed, Brute Force | Not UAV-specific; lacks real mobility or control-link context |
| UNSW-NB15 | Cyber | No | Exploits, Shellcode, Worms, Backdoor, Reconnaissance | No UAV-related protocol (e.g., MAVLink); lacks physical behavior modeling |
| BoT-IoT | Cyber (IoT-specific) | No | DoS, Theft, Reconnaissance, DDoS | IoT-focused; lacks UAV telemetry or wireless control data |
| TON_IoT | Cyber (IoT Logs) | No | Keylogging, Malware, Data theft | General IoT logs; lacks UAV context or aerial network representation |
| GNSS spoofing dataset | Physical (GPS signals) | Yes | GPS Spoofing | Focused only on location deception; does not include cyber attack vectors |
| AirLab UAV dataset | Physical (Flight logs) | Yes | Anomalies, actuator/sensor failures | No cyber intrusions; only physical flight anomalies |
| CTGAN | Cyber-Physical (Simulated) | Yes | DoS, Telemetry Injection, Spoofing | Fully simulated; lacks real attack complexity or noise |
| T-ITS | Cyber-Physical | Yes | De-auth DoS, Replay, FDI, Evil Twin | Testbed-based; limited scalability and diversity |
| UAV-GRID | Physical | Yes | Command Injection, Spoofing, DoS | Tailored to grid drones; not general UAV use cases |
To address these gaps, this study integrates five datasets (three base datasets and two CTGAN-augmented variants) that collectively cover cyber, physical, synthetic, and UAV-specific data domains. Each dataset is briefly described below. Table 3 presents a consolidated summary of these five datasets, including their domains, attack types, record counts, and feature counts.
| Dataset | Domain type | Attack types | No. of records | No. of features |
|---|---|---|---|---|
| T-ITS | Cyber-Physical | De-authentication DoS, Replay, False Data Injection, Evil Twin | 10,000+ | 53 (16 physical + 37 cyber) |
| CICIDS-2017 | Cyber | Brute Force, Web Injection, Infiltration, DDoS, Heartbleed | 2.8 million | 80+ |
| UNSW-NB15 | Cyber | Exploits, Shellcode, Worms, Generic, Reconnaissance | 2.5 million | 49 |
| CTGAN (Augmented CICIDS) | Cyber (GAN-Augmented) | Heartbleed, Infiltration, Web Attack | 2.9 million | 80+ |
| CTGAN (Augmented UNSW) | Cyber (GAN-Augmented) | Worms, Shellcode, Backdoor | 2.6 million | 49 |
T-ITS
The T-ITS dataset is a cyber-physical dataset specifically designed for UAV intrusion detection. Developed using a custom UAV testbed, it includes both physical telemetry (16 features) and cyber network flow data (37 features). Four types of attacks were executed: de-authentication DoS, replay, false data injection, and evil twin attacks. The data is provided in CSV format and annotated for supervised machine learning, making it an ideal candidate for anomaly-based IDS training (Hassler, Mughal & Ismail, 2023).
CICIDS-2017
CICIDS-2017 is a modern intrusion detection benchmark in which attack behaviors are simulated over HTTP, FTP, SSH, and SMTP protocols. The dataset spans five days of real traffic between 25 user agents and includes attacks such as Brute Force, Web Injection, Infiltration, DDoS, and Heartbleed. Over 80 flow-level features are extracted per sample using CICFlowMeter. While not UAV-specific, the dataset offers realistic and rich attack scenarios suitable for deep learning-based anomaly detection (Chen, 2023).
UNSW-NB15
The UNSW-NB15 dataset provides a combination of real and synthetic traffic captured in a controlled lab using the IXIA PerfectStorm tool. It includes nine classes of attacks such as Worms, Shellcode, Exploits, Generic, and Reconnaissance. A total of 49 statistical features were engineered from packet flows using Argus and Bro-IDS. While not tailored to UAVs, the diversity of attack profiles adds value for evaluating generalization performance across non-UAV datasets (Chen, 2023).
CTGAN: Augmented CICIDS-2017
CTGAN-CICIDS-2017 is a class-balanced, synthetic variant of CICIDS-2017. Using Conditional Tabular GANs with WGAN-GP regularization, minority classes such as Heartbleed, Infiltration, and Web Attack were augmented to achieve balanced class representation. The resulting dataset helps in mitigating the overfitting problem caused by imbalanced data in standard IDS benchmarks and is well-suited for evaluating anomaly-based learning algorithms (Zeng & Nait-Abdesselam, 2024).
CTGAN: Augmented UNSW-NB15
CTGAN-UNSW-NB15 follows a similar augmentation strategy, applied to UNSW-NB15. Underrepresented attack classes such as Worms, Shellcode, and Backdoor were synthetically boosted using the same CTGAN framework. This ensures a balanced multi-class distribution, allowing robust training of detection models like RoboLSTM-IDS and improving sensitivity to stealthy or low-frequency attack patterns (Zeng & Nait-Abdesselam, 2024).
RoboLSTM-IDS framework
This section describes the methodology behind the proposed RoboLSTM-IDS framework, a deep anomaly-based intrusion detection system designed for UAV network security. The framework integrates robust feature optimization with temporal learning to detect various cyber and cyber-physical attacks. The full pipeline, detailed in Algorithm 1 and highlighted earlier in Fig. 3, consists of five major phases: dataset preparation, robust optimization-based feature engineering, temporal sequence construction, LSTM model training, and final evaluation.
| Require: Multi-source Datasets D = {D1, D2, ..., Dn}, Window size T |
| Ensure: Predicted labels and evaluation metrics: Accuracy, F1, MCC, AUC |
| 1: Begin |
| 2: // Phase 1: Data Preprocessing |
| 3: for each dataset Di ∈ D do |
| 4: Perform Data Cleaning |
| 5: Apply Normalization (Min-Max or Z-score) |
| 6: Encode class labels (One-Hot or Label Encoding) |
| 7: end for |
| 8: // Phase 2: ROBOTa Feature Engineering |
| 9: Define fitness function f(F) based on classification loss and feature subset size |
| 10: Initialize population of feature subsets |
| 11: while termination condition not met do |
| 12: Generate candidate subsets Fi |
| 13: Evaluate f(Fi) using a lightweight classifier |
| 14: Update population based on fitness scores |
| 15: end while |
| 16: Select best subset F* = argmin f(Fi) |
| 17: // Phase 3: Temporal Sequence Construction |
| 18: Initialize empty sequence set S |
| 19: for each sample xi in D restricted to features F* do |
| 20: for each valid window start index j do |
| 21: Create window wj = {xj, xj+1, ..., xj+T−1} |
| 22: Append wj to S |
| 23: end for |
| 24: end for |
| 25: for each wj ∈ S do |
| 26: Assign label yj (e.g., majority class or last timestep) |
| 27: end for |
| 28: // Phase 4: LSTM-Based Classification |
| 29: Define input shape (T, \|F*\|) for LSTM |
| 30: Initialize LSTM gates (input, forget, output) as per Eqs. (4)–(8) |
| 31: Train model using training set and cross-entropy loss |
| 32: Apply Softmax for final classification |
| 33: // Phase 5: Performance Evaluation |
| 34: Test model on the testing set |
| 35: for each metric m ∈ {Accuracy, F1, MCC, AUC} do |
| 36: Compute m using Eqs. (9)–(15) |
| 37: end for |
| 38: Return predicted labels and evaluation metrics |
| 39: End |
Preprocessing
The input to the RoboLSTM-IDS framework consists of five datasets: T-ITS, CICIDS-2017, UNSW-NB15, and two augmented versions generated via CTGAN. Each dataset is first preprocessed to ensure compatibility across the pipeline. Initially, all categorical labels are encoded numerically, and attack classes are unified to reduce fragmentation (e.g., all Web attacks are grouped together). Rows with missing values (NaNs) and duplicate rows are removed during a data cleaning phase. Subsequently, feature values are normalized to a [0, 1] scale using Min-Max normalization to improve training stability. Finally, each dataset is split into training and testing subsets, typically in an 80:20 ratio. For temporal modeling, each sequence is constructed with a fixed sliding window of size $T$ and a fixed stride. Given a tabular stream $\{x_t\}$, we build sequences as $[x_t, x_{t+1}, \ldots, x_{t+T-1}]$.
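The cleaning, scaling, and splitting steps can be sketched as follows. This is a minimal illustration rather than the authors' code: the helper names (`clean`, `fit_min_max`, `apply_min_max`, `chronological_split`) and the toy data are hypothetical, and two choices are assumptions not stated in the text, namely that the 80:20 split is chronological and that the Min-Max statistics are fitted on the training portion only.

```python
import numpy as np

def clean(X, y):
    """Drop rows containing NaN/inf, then drop exact duplicate (features, label) rows."""
    keep = np.isfinite(X).all(axis=1)
    X, y = X[keep], y[keep]
    rows = np.column_stack([X, y])
    # return_index gives the first occurrence of each distinct row
    _, first_idx = np.unique(rows, axis=0, return_index=True)
    order = np.sort(first_idx)  # keep the original (temporal) row order
    return X[order], y[order]

def fit_min_max(X):
    """Per-feature minimum and range; constant columns get a range of 1."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    return lo, span

def apply_min_max(X, lo, span):
    """Scale to [0, 1]; values outside the fitted range are clipped."""
    return np.clip((X - lo) / span, 0.0, 1.0)

def chronological_split(X, y, train_fraction=0.8):
    """Assumed split: the first 80% of rows train, the remainder test."""
    cut = int(train_fraction * len(X))
    return X[:cut], X[cut:], y[:cut], y[cut:]

# Hypothetical toy stream: one duplicate row and one row with a missing value.
X_raw = np.array([[1.0, 10.0], [2.0, 20.0], [2.0, 20.0], [np.nan, 5.0],
                  [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]])
y_raw = np.array([0, 1, 1, 0, 0, 1, 0])

X_clean, y_clean = clean(X_raw, y_raw)
X_train, X_test, y_train, y_test = chronological_split(X_clean, y_clean)
lo, span = fit_min_max(X_train)
X_train = apply_min_max(X_train, lo, span)
X_test = apply_min_max(X_test, lo, span)
```

Fitting the scaler on the training rows avoids leaking test-set statistics into training, at the cost of clipping test values that fall outside the training range.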
Feature engineering with ROBOTa
The ROBOTa (Robust Optimization-Based Tabular Feature Engineering) module is one of the most important elements of the RoboLSTM-IDS framework. Feature engineering is crucial for improving accuracy, preventing overfitting, and supporting generalization across heterogeneous UAV datasets. Unlike traditional filter or wrapper-based techniques, ROBOTa uses a robust, optimization-based approach to select a small, stable set of features without compromising model performance. It is a population-based swarm heuristic that searches binary masks over features to minimize a bi-objective fitness: (i) classification loss (estimated quickly with a lightweight proxy classifier) and (ii) sparsity (number of selected features).
The feature selection task is formulated as a multi-objective optimization problem. The two conflicting objectives are (i) minimizing the classification error and (ii) minimizing the number of features. This trade-off is captured using a fitness function, which evaluates each candidate subset as shown in Table 4.
| Description | Formula | Eq. no. |
|---|---|---|
| Fitness function combining classification loss and sparsity | $f(F) = \alpha\, L(F) + \beta\, \frac{\lvert F \rvert}{N}$ | (1) |
| Velocity update rule for candidate subsets | | (2) |
| Position update rule for candidate solutions | | (3) |
Here, $L(F)$ is the classification loss associated with feature subset $F$, $\lvert F \rvert$ is the cardinality of the subset, and $N$ is the total number of available features in the dataset. The scalar weights $\alpha$ and $\beta$ control the emphasis placed on accuracy and sparsity, respectively. These hyperparameters can be tuned depending on the dataset characteristics (e.g., class imbalance, feature redundancy).
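As a worked illustration of this trade-off, the sketch below evaluates a weighted sum of proxy loss and the fraction of features kept. The parameter names `alpha` and `beta` are hypothetical labels for the accuracy and sparsity weights, their defaults follow the final values in Table 5, and the two loss values are invented purely for the example; the feature counts (40 and 12) are those reported in Table 13.

```python
def fitness(loss, n_selected, n_total, alpha=1.0, beta=0.10):
    """Weighted sum of proxy classification loss and fraction of features kept.

    Lower is better. An empty subset is given infinite cost because no
    classifier can be trained on it.
    """
    if n_selected == 0:
        return float("inf")
    return alpha * loss + beta * (n_selected / n_total)

# Hypothetical proxy losses: the compact subset is slightly less accurate
# but is rewarded for using far fewer features.
dense = fitness(loss=0.040, n_selected=40, n_total=40)
sparse = fitness(loss=0.045, n_selected=12, n_total=40)
print(f"all features: {dense:.3f}, compact subset: {sparse:.3f}")
```

With these illustrative numbers the compact subset scores lower (better), showing how a small loss increase can be outweighed by the sparsity term.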
ROBOTa uses a population-based search strategy inspired by swarm intelligence heuristics. A population of P candidate feature subsets is initialized randomly. Each candidate has an associated velocity and position. During each generation, all candidates are evaluated using the fitness function (Eq. (1)), and their positions and velocities are updated using Eqs. (2) and (3).
The velocity update combines a heuristic term that encourages exploration with an attraction coefficient that pulls the candidate toward the current global best solution. The position update allows the candidate subset to evolve by selecting or deselecting features in each iteration.
Throughout the optimization process, a lightweight classifier (e.g., logistic regression or a shallow neural network) is used to estimate the classification loss efficiently. To reduce stochastic variance, a subset’s fitness score may be averaged over multiple random seeds or k-fold splits.
As shown in Table 5, the hyperparameters of the proposed ROBOTa module were systematically tuned over predefined ranges. The search space included the population size (P), the number of generations (G), the exploration and attraction coefficients, and the trade-off weights for accuracy and sparsity. For proxy classifiers, we experimented with both logistic regression (with L2 regularization) and a shallow MLP; logistic regression was selected in the final configuration. The reported values were chosen based on validation performance, and all experiments were averaged across multiple random seeds to ensure robustness.
| Parameter | Range/Value | Final (example) |
|---|---|---|
| Population P | {20, 40, 60} | 40 |
| Generations G | {40, 60, 80} | 60 |
| Exploration | {0.2, 0.5, 0.8} | 0.5 |
| Attraction | {0.5, 0.8} | 0.8 |
| Accuracy weight | {1.0, 0.5} | 1.0 |
| Sparsity weight | {0.05, 0.10, 0.20} | 0.10 |
| Proxy classifier | LR (L2) / MLP(64) | LR (L2) |
| Seeds | {13, 17, 23} | Mean over seeds |
The algorithm continues for a fixed number of generations or until convergence criteria are met (e.g., no significant improvement in global best). Once complete, the most robust and compact feature subset is selected and forwarded to the temporal transformation and LSTM classification stages.
ROBOTa thus ensures that only the most predictive and stable features are retained, reducing dimensionality while improving learning efficiency, which is especially important in real-time UAV deployments with constrained edge hardware.
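The search loop described in this section can be sketched as a binary swarm-style search, in the spirit of binary particle swarm optimization. This is not the authors' implementation. The velocity and position rules below are assumptions chosen to match the prose description (random exploration plus attraction toward the global best), `toy_loss` is a hypothetical stand-in for the proxy classifier, and all function and parameter names are illustrative.

```python
import numpy as np

def toy_loss(mask, informative):
    """Stand-in for the proxy classifier: fraction of informative features missed."""
    return 1.0 - (mask & informative).sum() / informative.sum()

def fitness(mask, informative, alpha=1.0, beta=0.10):
    """Loss plus sparsity penalty (lower is better)."""
    return alpha * toy_loss(mask, informative) + beta * mask.sum() / mask.size

def swarm_select(informative, pop=20, gens=30, explore=0.5, attract=0.8, seed=13):
    rng = np.random.default_rng(seed)
    n = informative.size
    x = rng.random((pop, n)) < 0.5   # positions: one boolean mask per candidate
    v = np.zeros((pop, n))           # velocities: one real value per feature
    scores = np.array([fitness(m, informative) for m in x])
    best, best_score = x[scores.argmin()].copy(), float(scores.min())
    history = [best_score]
    for _ in range(gens):
        # Assumed velocity rule: random exploration plus a pull toward the global best.
        pull = best.astype(float) - x.astype(float)
        v = np.clip(v + explore * rng.normal(size=v.shape) + attract * pull, -6.0, 6.0)
        # Assumed position rule: select each feature with probability sigmoid(velocity).
        x = rng.random((pop, n)) < 1.0 / (1.0 + np.exp(-v))
        scores = np.array([fitness(m, informative) for m in x])
        if scores.min() < best_score:  # keep the best mask seen so far
            best, best_score = x[scores.argmin()].copy(), float(scores.min())
        history.append(best_score)
    return best, best_score, history

# Hypothetical problem: 10 features, of which 3 carry signal.
informative = np.zeros(10, dtype=bool)
informative[[1, 4, 7]] = True
best_mask, best_score, history = swarm_select(informative)
print("selected features:", np.flatnonzero(best_mask), "fitness:", round(best_score, 3))
```

In a real pipeline `toy_loss` would be replaced by the validation loss of the proxy classifier trained on the masked features, possibly averaged over seeds or folds as noted above.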
Temporal sequence transformation
After feature selection, the data is converted from a flat tabular format to a temporal sequence suitable for LSTM input. Using a fixed sliding window of size T, individual samples are organized into overlapping sequences. This allows the model to learn temporal dependencies that reflect how attack patterns evolve over time in UAV communications.
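A minimal sketch of this windowing step is shown below. The function name `make_windows` is hypothetical, the stride default of 1 is an assumption, and each window is labeled with its last timestep, which is one of the two labeling options mentioned in Algorithm 1.

```python
import numpy as np

def make_windows(X, y, T, stride=1):
    """Turn a (num_samples, num_features) stream into overlapping sequences.

    Returns sequences of shape (num_windows, T, num_features) and one label
    per window, taken from the window's last timestep.
    """
    if len(X) < T:
        raise ValueError("stream is shorter than the window size T")
    starts = range(0, len(X) - T + 1, stride)
    seqs = np.stack([X[s:s + T] for s in starts])
    labels = np.array([y[s + T - 1] for s in starts])
    return seqs, labels

# Toy stream: 6 samples with 2 features each.
X = np.arange(12, dtype=float).reshape(6, 2)
y = np.array([0, 0, 0, 1, 1, 0])
seqs, labels = make_windows(X, y, T=3)
print(seqs.shape, labels)
```

The resulting array has the `(batch, time steps, features)` layout expected by standard LSTM layers.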
LSTM-based classification
The final stage of the RoboLSTM-IDS framework involves sequence learning using an LSTM network. LSTM is a type of recurrent neural network (RNN) designed to handle sequential data with long-term dependencies, which is particularly useful for detecting temporal patterns in UAV network traffic and behavioral logs. The model receives as input a time-series of optimized feature vectors generated through the ROBOTa module and transformed via temporal windowing.
An LSTM unit maintains two types of state at each time step $t$: a hidden state $h_t$ and a memory cell $c_t$. These are updated through three key gates: the forget gate, the input gate, and the output gate. Their operations are controlled by the following set of equations, summarized in Table 6.
| Description | Equation | Eq. no. |
|---|---|---|
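| Forget gate activation | $f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$ | (4) |
| Input gate activation | $i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$ | (5) |
| Candidate cell state | $\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)$ | (6) |
| Cell state update | $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$ | (7) |
| Output gate and hidden state | $o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$, $h_t = o_t \odot \tanh(c_t)$ | (8) |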
In Eqs. (4) to (8), $x_t$ represents the input vector at time step $t$, $\sigma$ is the sigmoid activation function, and $\tanh$ is the hyperbolic tangent function. The forget gate determines which information from the previous cell state $c_{t-1}$ to retain. The input gate controls which parts of the new candidate state are added to the memory. The output gate governs what part of the memory cell is exposed as the hidden state $h_t$, which is then passed to subsequent layers or time steps. For classification, the final hidden state is passed through a dense softmax layer that maps it to the intrusion classes. In the context of UAV-based anomaly detection, this allows the model to make predictions based not only on instantaneous patterns but also on the temporal dynamics leading to an event, which is especially useful in detecting gradual or stealthy intrusions.

The model receives input sequences of shape $(T, \lvert F^{*} \rvert)$, that is, window length by number of selected features. These are processed by an LSTM layer with 64 units (returning sequences), followed by a dropout layer (rate 0.3). This is succeeded by a second LSTM layer with 32 units, another dropout layer (rate 0.3), and two dense layers: one with 64 units and ReLU activation, and the final output layer with $C$ units and softmax activation (sigmoid for binary classification).

Gradient clipping is applied at 1.0 to stabilize training. For optimization, we employ Adam with a batch size of 128 and train for up to 100 epochs with early stopping (patience of 10, restoring the best weights). The loss function is a weighted cross-entropy, where class weights are derived from the training labels, and macro-F1 is monitored as the early-stopping criterion due to its robustness under class imbalance. Regularization is achieved via dropout (as described above), L2 weight decay on dense layers, and layer-normalized LSTMs (when available) in ablation studies. To analyze window size sensitivity, we evaluate several values of $T$, select the best configuration on the validation set, and report the chosen $T$ for each dataset.
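To make the gate updates concrete, the following sketch runs the standard LSTM recurrence over one window in NumPy. It illustrates the textbook cell corresponding to the rows of Table 6 and is not the trained model: the weights are random, the stacked-weight layout `W` is a convenience chosen for this example, and the sizes (window of 20, 12 features, 64 units) are borrowed from Tables 7 and 13.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4*H, H+D) and maps the concatenation [h_prev, x_t] to the
    stacked pre-activations of the forget, input, candidate, and output blocks.
    """
    H = h_prev.size
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0:H])              # forget gate, Eq. (4)
    i_t = sigmoid(z[H:2 * H])          # input gate, Eq. (5)
    c_tilde = np.tanh(z[2 * H:3 * H])  # candidate cell state, Eq. (6)
    o_t = sigmoid(z[3 * H:4 * H])      # output gate, Eq. (8)
    c_t = f_t * c_prev + i_t * c_tilde  # cell state update, Eq. (7)
    h_t = o_t * np.tanh(c_t)            # hidden state, Eq. (8)
    return h_t, c_t

# Illustrative sizes: window length T, feature count D, hidden units H.
T, D, H = 20, 12, 64
rng = np.random.default_rng(13)
W = rng.normal(scale=0.1, size=(4 * H, H + D))  # random, untrained weights
b = np.zeros(4 * H)
window = rng.random((T, D))

h, c = np.zeros(H), np.zeros(H)
for x_t in window:  # unroll the cell over the window
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape)
```

The final `h` plays the role of the last hidden state that a dense softmax layer would map to class probabilities.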
Table 7 summarizes the LSTM architecture and training configuration used for each dataset. The temporal window size $T$ was set to 20 for all datasets except UNSW-NB15, where $T = 30$ yielded better temporal context modeling. The architecture consists of a stacked LSTM with two recurrent layers (64 and 32 units), followed by a dense hidden layer of 64 neurons. Dropout regularization (0.3, applied to both recurrent and dense layers) was employed to mitigate overfitting. The Adam optimizer was used for training. Each model was trained with a batch size of 128 for up to 100 epochs, ensuring stability across both real and synthetic datasets.
| Dataset | T | Hidden layers | Dropout | Optimizer | Batch/Epochs |
|---|---|---|---|---|---|
| T-ITS | 20 | LSTM(64) → LSTM(32) → Dense(64) | 0.3/0.3 | Adam | 128/100 |
| CICIDS-2017 | 20 | Same as above | 0.3/0.3 | Adam | 128/100 |
| UNSW-NB15 | 30 | Same as above | 0.3/0.3 | Adam | 128/100 |
| CTGAN-CICIDS | 20 | Same as above | 0.3/0.3 | Adam | 128/100 |
| CTGAN-UNSW | 20 | Same as above | 0.3/0.3 | Adam | 128/100 |
For each dataset, we run a nested search on the training split with a held-out validation fold. We tune ROBOTa parameters and LSTM parameters (hidden sizes, dropout, learning rate, batch size, window T) using random search (30 trials) bounded by the ranges in Tables 5 and 7. Final models are retrained on train+validation with the selected configuration and evaluated on the held-out test split. We repeat all experiments with three seeds and report mean performance.
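The random-search step can be sketched as below. Only the ROBOTa ranges from Table 5 are included, since the LSTM ranges are not fully listed in the text. The dictionary keys, the `random_search` helper, and the `placeholder_score` function are hypothetical; in practice the score would be the validation macro-F1 of a model trained with the sampled configuration.

```python
import random

# ROBOTa ranges as listed in Table 5 (key names are illustrative).
SEARCH_SPACE = {
    "population": [20, 40, 60],
    "generations": [40, 60, 80],
    "exploration": [0.2, 0.5, 0.8],
    "attraction": [0.5, 0.8],
    "accuracy_weight": [1.0, 0.5],
    "sparsity_weight": [0.05, 0.10, 0.20],
}

def sample_config(rng):
    """Draw one value per hyperparameter, uniformly from its range."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def random_search(evaluate, n_trials=30, seed=13):
    """Keep the configuration with the highest validation score."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

def placeholder_score(cfg):
    """Hypothetical stand-in for training a model and measuring validation macro-F1."""
    return -(abs(cfg["sparsity_weight"] - 0.10) + abs(cfg["exploration"] - 0.5))

best_cfg, best_score = random_search(placeholder_score)
print(best_cfg, best_score)
```

Fixing the seed of the sampler makes the 30 sampled configurations reproducible across runs.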
Hence, the use of LSTM in RoboLSTM-IDS enables robust detection of both abrupt and evolving attack behaviors, making it well-suited for real-time UAV surveillance environments.
Evaluation metrics
The effectiveness of RoboLSTM-IDS is assessed with a set of measures that together capture classification quality, detection specificity, robustness to class imbalance, and predictive consistency. The model is trained on the optimized feature set, and its predictions on the test set are evaluated using standard statistical measures.
The following performance measures are computed: Accuracy, Precision, Recall, F1-score, Matthews Correlation Coefficient (MCC), Cohen’s Kappa, and the Area Under the Receiver Operating Characteristic Curve (AUC-ROC). Together, these metrics give a complete performance profile, which is essential when analyzing security systems deployed in class-imbalanced UAV settings.
A brief description and the mathematical formulation for each metric are presented in Table 8. The equation numbers provided are used for in-text referencing throughout the results section.
| Metric | Formula | Eq. no. |
|---|---|---|
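| Accuracy | $\frac{TP + TN}{TP + TN + FP + FN}$ | (9) |
| Precision | $\frac{TP}{TP + FP}$ | (10) |
| Recall (Sensitivity) | $\frac{TP}{TP + FN}$ | (11) |
| F1-score | $2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$ | (12) |
| Matthews Correlation Coefficient (MCC) | $\frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$ | (13) |
| Cohen’s Kappa | $\frac{p_o - p_e}{1 - p_e}$ | (14) |
| AUC (Area Under Curve) | $\int_0^1 \mathrm{TPR}\; d(\mathrm{FPR})$ | (15) |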
Here TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, while $p_o$ and $p_e$ denote the observed and chance-expected agreement. Accuracy (Eq. (9)) measures the overall correctness of predictions across all classes. Precision (Eq. (10)) indicates how many of the positive predictions are correct, and therefore reflects the model's ability to avoid false alarms. Recall (Eq. (11)), or sensitivity, measures the ability to detect actual attack instances without missing them. The F1-score (Eq. (12)) balances Precision and Recall, which is especially important under class imbalance. MCC (Eq. (13)) measures the quality of binary classification as a correlation between predicted and true labels, giving a reliable score even on skewed data. Cohen’s Kappa (Eq. (14)) adjusts accuracy for the agreement expected by chance. Finally, AUC (Eq. (15)) measures the discriminative power of the model across decision thresholds, that is, how well it separates attack traffic from benign traffic.
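For reference, the threshold-based metrics can be computed directly from the four confusion-matrix counts. The sketch below uses the standard binary definitions. The function name `binary_metrics` and the example counts are hypothetical, and the code assumes every denominator is non-zero.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Standard binary metrics from confusion-matrix counts (non-zero denominators assumed)."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's Kappa: observed agreement corrected for chance agreement.
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }

# Hypothetical counts for a 200-sample test set.
scores = binary_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

AUC is omitted here because it requires ranked prediction scores rather than a single confusion matrix.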
Simulation environment
This section presents the complete simulation setup for the proposed RoboLSTM-IDS framework, including the tools, libraries, and hardware used to develop, deploy, and test the deep anomaly-based IDS on five separate datasets. The modular pipeline was implemented in Python using Jupyter Notebooks, with machine learning and deep learning packages such as TensorFlow and Scikit-learn. Random seeds were fixed to ensure reproducibility, and no overlap between training and test data was allowed. All results reported in this study are the mean of three independent runs, which reduces the variance caused by stochastic factors in the optimization and in LSTM training.
All experiments were conducted on a high-performance computing system using the specifications and software libraries summarized in Table 9. The setup ensured fast training, real-time monitoring, and robust reproducibility for all experimental trials.
| Category | Specifications/Details |
|---|---|
| Hardware | |
| Processor | Intel Core i7 (12th Gen, multi-thread) |
| RAM | 32 GB DDR4 |
| GPU | NVIDIA RTX 3080 (16 GB) |
| Storage | 500 GB SSD |
| Software and tools | |
| Operating system | Ubuntu 22.04 LTS (64-bit) |
| Machine learning libraries | TensorFlow 2.12, PyTorch 2.1, Scikit-learn |
| Programming language | Python 3.9 |
| Dataset preprocessing | Pandas, NumPy, SciPy |
| Evaluation metrics | Accuracy, Precision, Recall, F1-score, AUC-ROC, MCC, Kappa |
| Hyperparameter tuning | Optuna, Grid Search, Random Search |
| Visualization tools | Matplotlib, Seaborn, TensorBoard |
| Network traffic analysis tool | Wireshark |
| Dataset type | Mixed Cyber + Cyber-Physical (T-ITS, CTGAN, CICIDS-2017, UNSW-NB15) |
| Version control | Git |
Results and discussions
This section provides a thorough analysis of the proposed framework on a variety of benchmark datasets.
As described in the Evaluation metrics section, the framework is assessed using measures of classification accuracy, detection precision, robustness to class imbalance, and predictive reliability, all computed from the model's predictions on the test set after training on the optimized feature set.
Table 8 gives the mathematical formulation of each metric, and the equation numbers listed there are used for in-text referencing throughout this section.
Table 10 reports the performance of RoboLSTM-IDS on all five datasets. The framework achieves its best classification results on T-ITS, the only dataset that combines physical telemetry features (e.g., altitude, pitch, battery voltage, GPS) with network-layer features (e.g., packet size, port flow, and frequency), which makes it well suited to deep anomaly detection models that rely on contextual temporal correlations. On T-ITS, RoboLSTM-IDS attains 99.62% accuracy, the highest value across all datasets, which we attribute to the richer cyber-physical feature set. The model also shows high classification stability, with accuracy of about 99% on the GAN-augmented datasets (99.02% on CTGAN-CICIDS and 98.91% on CTGAN-UNSW).
| Metric | T-ITS | CICIDS-2017 | UNSW-NB15 | CTGAN-CICIDS | CTGAN-UNSW |
|---|---|---|---|---|---|
| Accuracy | 99.62% | 98.97% | 98.85% | 99.02% | 98.91% |
| Precision | 0.996 | 0.989 | 0.985 | 0.991 | 0.988 |
| Recall | 0.994 | 0.987 | 0.981 | 0.988 | 0.983 |
| F1-score | 0.995 | 0.988 | 0.983 | 0.989 | 0.985 |
| Matthews correlation coefficient | 0.981 | 0.964 | 0.959 | 0.972 | 0.961 |
| Cohen’s Kappa | 0.975 | 0.961 | 0.952 | 0.967 | 0.958 |
| AUC-ROC | 0.997 | 0.985 | 0.981 | 0.989 | 0.983 |
Precision, Recall, and F1-score are all above 0.98 on every dataset, which indicates that RoboLSTM-IDS not only predicts correctly but also balances false positives against false negatives. This is particularly important in UAV intrusion detection systems, where either kind of error can cause a mission failure or a security compromise. The MCC and Cohen’s Kappa values support this conclusion. Both statistics account for class imbalance and for agreement beyond chance, and all MCC and Kappa values exceed 0.95, indicating consistently high classification quality. Finally, AUC-ROC scores range from 0.981 to 0.997, which shows that the model separates normal and attack classes well over a wide range of decision thresholds. These high AUC values indicate that the model is suitable for dynamic UAV settings, where attack patterns can be non-stationary or sparse.
During training, RoboLSTM-IDS exhibited rapid and stable convergence across all datasets. On the T-ITS dataset, the model converged within 9–11 epochs, achieving 99.3% training and 99.62% validation accuracy, with smooth, exponentially decaying loss curves and minimal overfitting. For CICIDS-2017 and UNSW-NB15, convergence occurred around 10–14 epochs, with validation accuracies of 98.95% and 98.85%, respectively. The CTGAN-augmented datasets also demonstrated consistent learning stability, converging within 11–13 epochs and reaching over 99% validation accuracy.
Table 11 provides a quantitative breakdown of the True Positive Rate (TPR), False Positive Rate (FPR), and False Negative Rate (FNR) for the RoboLSTM-IDS framework across all five benchmark datasets. The model attained a TPR above 94% on every dataset, which implies that the large majority of attack cases were correctly identified. The T-ITS dataset yielded the best TPR, 96.4%, owing to its strong combination of cyber and physical features. On all datasets, FPR stayed at or below 3.5% and FNR at or below 1.8%, indicating that the model keeps both false alarms and missed detections low. These findings support the usefulness of RoboLSTM-IDS in distinguishing legitimate UAV activity from different types of intrusion, and therefore its practical potential for real-time UAV settings.
| Dataset | True positive rate (TPR) | False positive rate (FPR) | False negative rate (FNR) |
|---|---|---|---|
| T-ITS | 96.4% | 2.5% | 1.1% |
| CICIDS-2017 | 95.2% | 3.1% | 1.7% |
| UNSW-NB15 | 94.7% | 3.5% | 1.8% |
| CTGAN-CICIDS | 95.9% | 2.8% | 1.3% |
| CTGAN-UNSW | 94.8% | 3.4% | 1.8% |
Figure 4 shows that all five datasets exhibit smooth convergence and consistent learning behavior in terms of training and validation accuracy and loss. The validation accuracy on each dataset is equal to or slightly higher than the training accuracy, which suggests good generalization without overfitting. In particular, T-ITS reaches the highest validation accuracy at 99.6%, followed by 99.2% on CTGAN-CICIDS, 99.0% on CTGAN-UNSW, and 98.9% on UNSW-NB15. In all cases the loss curves decrease gradually and level off below 0.05, which indicates efficient optimization.
Figure 4: Accuracy and loss over epochs on all datasets.
Figure 5 presents the confusion matrices for the five evaluated datasets, which demonstrate the classification precision of RoboLSTM-IDS across diverse UAV-relevant intrusion scenarios. The T-ITS matrix shows near-perfect classification across all five classes, with minimal confusion observed only between Replay and FDI attacks. Notably, the model achieved flawless detection of Evil Twin attacks and over 99% accuracy for the Normal, DoS, and FDI classes, highlighting its effectiveness on cyber-physical UAV data. On CICIDS-2017, the model effectively distinguishes Brute Force, Web, and Infiltration attacks, with only minor misclassifications between the Brute Force and Web classes, achieving high overall diagonal dominance. Similarly, the UNSW-NB15 matrix reveals strong detection rates for the Exploits and Reconnaissance classes, while minor confusion is seen between Shellcode and Worms, likely due to feature overlap in synthetic data. The CTGAN-augmented datasets reflect excellent generalization, with sharply defined diagonals and minimal false positives. CTGAN-CICIDS shows precise recognition of low-frequency attacks like Botnet and Heartbleed, while CTGAN-UNSW achieves high clarity in separating Shellcode, Worms, and Backdoor. Overall, the confusion matrices across all datasets confirm RoboLSTM-IDS’s robust per-class discrimination capabilities and strong adaptability to varied attack structures, class distributions, and domain representations.
Our ablation analysis also confirmed the importance of ROBOTa: removing the feature optimization module reduced macro-F1 by 4–6% across all datasets, demonstrating that the gains of RoboLSTM-IDS are not solely due to the LSTM architecture.
Figure 5: Confusion matrices of RoboLSTM-IDS on benchmark datasets.
To further assess the predictive robustness of the RoboLSTM-IDS framework, we evaluated the Root Mean Squared Error (RMSE) across all five benchmark datasets. RMSE serves as an important measure of the deviation between predicted and actual values, particularly useful in understanding the residual error in probabilistic and sequence-based classifications. As shown in Fig. 6, the proposed model maintains exceptionally low RMSE values across all datasets, ranging from 0.021 on T-ITS to 0.031 on UNSW-NB15. The lowest RMSE on the T-ITS dataset reflects the model’s ability to make highly precise predictions when fed with rich cyber-physical telemetry and network flow data. Even on complex synthetic and augmented datasets such as CTGAN-CICIDS and CTGAN-UNSW, the RMSE remains below 0.030, demonstrating consistent generalization and low error variance.
Figure 6: Root mean squared error across datasets.
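As a small worked example of the RMSE calculation, the sketch below compares predicted attack probabilities with 0/1 ground-truth labels. The text does not state exactly which predicted and actual values enter the RMSE, so applying it to probabilities is an assumption, and the four sample values are invented for illustration.

```python
import math

def rmse(y_true, y_prob):
    """Root mean squared error between labels and predicted probabilities."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_prob)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical labels (1 = attack) and predicted attack probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.98, 0.03, 0.97, 0.01]
score = rmse(y_true, y_prob)
print(f"RMSE = {score:.4f}")
```

Confident, correct predictions leave only small residuals, which is why values in the 0.02 to 0.03 range correspond to well-calibrated outputs.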
Execution time is another critical parameter for UAV networks. Figure 7 presents the execution time in milliseconds for RoboLSTM-IDS when tested on the T-ITS, CICIDS-2017, UNSW-NB15, CTGAN-CICIDS, and CTGAN-UNSW datasets. The T-ITS dataset recorded the lowest execution time at 120 ms, while the UNSW-NB15 dataset required the highest at 145 ms. These slight variations are attributed to differences in dataset size, feature dimensionality, and class distribution. Overall, the model consistently achieves near real-time execution performance, supporting its suitability for deployment in time-sensitive UAV applications.
Figure 7: Execution time of proposed model across datasets.
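One common way to obtain such timings is to wrap the prediction call in a high-resolution timer and report the median over repeated runs. The sketch below illustrates that approach under stated assumptions: the measurement protocol used in the study is not described, `median_latency_ms` is a hypothetical helper, and `dummy_predict` merely stands in for a trained model's inference call.

```python
import statistics
import time

def median_latency_ms(predict, sample, warmup=5, repeats=50):
    """Median wall-clock time of predict(sample) in milliseconds."""
    for _ in range(warmup):  # warm-up runs are excluded from the measurement
        predict(sample)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)

def dummy_predict(window):
    """Hypothetical stand-in for model inference on one window."""
    return sum(sum(row) for row in window) > 0

# One window of 20 time steps with 12 features each (illustrative sizes).
window = [[0.1] * 12 for _ in range(20)]
latency_ms = median_latency_ms(dummy_predict, window)
print(f"median latency: {latency_ms:.4f} ms")
```

The median is less sensitive than the mean to occasional scheduling spikes, which matters when comparing against a fixed real-time budget.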
As shown in Table 12, CNN-LSTM achieves slightly higher recall but requires three convolutional layers, two LSTM layers, and extended training (150 epochs), resulting in roughly 8× higher latency (95 ms vs. 12 ms). In contrast, RoboLSTM-IDS achieves the highest accuracy (99.1%) and F1-score (99.0%) while incurring the lowest inference latency (12 ms) and the smallest model size (0.5 M parameters) among all compared deep learning-based IDS models, which supports its real-time feasibility for UAVs.
| Model | Acc (%) | F1 (%) | Latency (ms) | Params (M) | Configuration |
|---|---|---|---|---|---|
| CNN-LSTM | 98.5 | 98.2 | 95 | 3.2 | 3 conv + 2 LSTM layers, 150 epochs |
| GRU | 97.8 | 97.4 | 72 | 2.1 | 2 GRU layers, 120 epochs |
| Autoencoder | 96.9 | 96.0 | 65 | 1.5 | 4 hidden layers, 100 epochs |
| RoboLSTM-IDS | 99.1 | 99.0 | 12 | 0.5 | 2 LSTM layers + ROBOTa, 100 epochs |
Furthermore, in terms of lightweight deployment, Table 13 shows that RoboLSTM-IDS requires only 12 features, compared with 40 for the baseline model. Its model size is reduced by nearly 6× (from 20 MB to 3.5 MB), while inference latency decreases from 80 to 12 ms per sample, well within UAV real-time processing constraints. RAM consumption drops from 1,200 MB to 400 MB, and the model includes support for energy-efficient operation. Unlike the baseline IDS, RoboLSTM-IDS is deployment-ready, making it a practical candidate for onboard UAV intrusion detection.
| Metric | Baseline model | Proposed model |
|---|---|---|
| Number of features | 40 | 12 |
| Model size (MB) | 20 | 3.5 |
| Inference time (ms/sample) | 80 | 12 |
| RAM usage (MB) | 1,200 | 400 |
| Energy mode support | No | Yes |
| Deployment readiness | Not suitable | Yes |
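The reduction factors implied by Table 13 follow from simple ratios of the baseline and proposed values, as the short calculation below shows. The dictionary keys are illustrative names and the numbers are taken directly from the table.

```python
# Values from Table 13 (baseline model vs. proposed model).
baseline = {"features": 40, "model_size_mb": 20, "latency_ms": 80, "ram_mb": 1200}
proposed = {"features": 12, "model_size_mb": 3.5, "latency_ms": 12, "ram_mb": 400}

# Reduction factor = baseline value divided by proposed value.
reduction = {name: baseline[name] / proposed[name] for name in baseline}

for name, factor in reduction.items():
    print(f"{name}: {factor:.2f}x smaller")
```

This gives about 5.7× for model size, 3× for RAM, about 6.7× for per-sample latency, and about 3.3× for the number of features.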
Although certain deep learning baselines such as CNN-LSTM (in our case) achieve marginally higher recall, they are not practically suitable for UAV environments. For example, CNN-LSTM incurs an inference latency of 95 ms per sample, which exceeds the UAV’s operational threshold of 50 ms. In contrast, RoboLSTM-IDS achieves a significantly lower latency of 12 ms per sample, well within real-time onboard processing limits. This highlights that while baseline models may appear competitive in terms of detection performance, their computational overhead renders them unsuitable, whereas RoboLSTM-IDS strikes an effective balance between accuracy and efficiency.
To further validate the effectiveness of the proposed framework, its performance was benchmarked against a comprehensive set of baseline ML and DL classifiers. Table 14 presents the comparative evaluation of RoboLSTM-IDS against several top baseline models, covering both classical machine learning algorithms and deep learning architectures across five UAV-relevant datasets. Among traditional models, Random Forest and Gradient Boosting showed relatively stronger accuracy, benefiting from ensemble learning’s ability to reduce overfitting and variance. Decision Tree and SVM followed closely, offering decent interpretability but limited performance in capturing the non-linear temporal dependencies typical of UAV intrusion patterns. Meanwhile, simpler classifiers like Naïve Bayes and K-Nearest Neighbors, though computationally efficient, consistently lagged in accuracy due to their static nature and sensitivity to noisy, high-dimensional data. Deep learning models, including DNN and CNN, performed better than classical approaches, reaching above 97% accuracy across datasets. Their advantage stemmed from deeper abstraction and improved generalization. However, these models lacked sequential memory and could not capture long-term dependencies within UAV telemetry and communication flow. This limitation resulted in slightly lower recall and F1-scores compared to RoboLSTM-IDS, especially on complex attack patterns and minority classes.
| Model (accuracy) | T-ITS | CICIDS-2017 | UNSW-NB15 | CTGAN-CICIDS | CTGAN-UNSW |
|---|---|---|---|---|---|
| Naïve Bayes | 95.10% | 95.00% | 95.20% | 95.50% | 95.30% |
| K-nearest neighbors | 96.20% | 95.60% | 95.80% | 96.00% | 95.90% |
| Support vector machine | 96.40% | 95.90% | 96.10% | 96.30% | 96.20% |
| Decision tree | 96.90% | 96.30% | 96.40% | 96.60% | 96.50% |
| Random forest | 97.10% | 96.70% | 96.80% | 97.00% | 96.90% |
| Gradient boosting | 97.40% | 96.90% | 97.00% | 97.30% | 97.10% |
| Deep neural network | 97.90% | 97.30% | 97.50% | 97.70% | 97.60% |
| Convolutional neural network | 98.20% | 97.60% | 97.70% | 98.00% | 97.80% |
| Proposed model | 99.62% | 98.97% | 98.85% | 99.02% | 98.91% |
RoboLSTM-IDS, on the other hand, achieved the highest accuracy on every dataset in Table 14. By combining optimized feature selection (via ROBOTa) with temporal modeling (LSTM), it maintained superior accuracy (approximately 99%), high recall, and balanced precision. Its ability to model evolving patterns in cyber-physical data enabled better detection of low-frequency or stealthy attacks, while minimizing both false positives and false negatives. The AUC-ROC score of 0.997 on the T-ITS dataset and 0.98 on all other datasets further supports its discriminative power across thresholds, making it well suited to real-time, edge-based UAV intrusion detection scenarios. In addition, paired t-tests across datasets confirmed that the performance improvements of RoboLSTM-IDS over the baselines are statistically significant (p < 0.05). We also report the mean and standard deviation (e.g., Accuracy = 98.7 ± 0.3%).
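To illustrate how a paired t-test of this kind works, the sketch below applies one to the Table 14 accuracies of the proposed model and the CNN baseline, treating the five datasets as paired observations. This pairing is an assumption made for illustration; the text does not specify which samples (datasets, folds, or repeated runs) the reported tests used. The helper name `paired_t` is hypothetical, and 2.776 is the standard two-sided critical value of Student's t at α = 0.05 with 4 degrees of freedom.

```python
import math
import statistics

# Table 14 accuracies (%), in the order:
# T-ITS, CICIDS-2017, UNSW-NB15, CTGAN-CICIDS, CTGAN-UNSW
PROPOSED = [99.62, 98.97, 98.85, 99.02, 98.91]
CNN = [98.20, 97.60, 97.70, 98.00, 97.80]

# Two-sided critical value of Student's t for alpha = 0.05, df = 4.
T_CRIT_DF4 = 2.776


def paired_t(a, b):
    """Return (mean difference, sample std of differences, t statistic)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, sd_diff, t_stat


if __name__ == "__main__":
    mean_diff, sd_diff, t_stat = paired_t(PROPOSED, CNN)
    print(f"mean gain = {mean_diff:.3f} points, sd = {sd_diff:.3f}")
    print(f"t = {t_stat:.2f}, significant at 0.05: {abs(t_stat) > T_CRIT_DF4}")
    # Mean and standard deviation of the proposed model's Table 14 accuracies.
    print(f"accuracy = {statistics.mean(PROPOSED):.2f} "
          f"± {statistics.stdev(PROPOSED):.2f}%")
```

With only five pairs the test has few degrees of freedom, so a per-fold or per-run pairing would give a more informative result where such measurements are available.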
The results in this section show that the proposed framework delivers consistently superior performance across all evaluated datasets. By leveraging a robust feature selection pipeline and sequence-aware LSTM classification, the model achieves high accuracy, low false alarm rates, and reliable detection of both common and rare attack types. The framework generalizes well across data distributions, covering real UAV telemetry, synthetically generated data, and combined datasets. It outperforms the classical and deep learning benchmarks in both detection performance and execution speed. Taken together, these tests indicate that RoboLSTM-IDS is ready for large-scale UAV intrusion detection in operational settings.
Despite the promising results, this work has certain limitations. First, while RoboLSTM-IDS is lightweight at inference, the ROBOTa training phase can be computationally intensive on very large datasets. Second, robustness against adversarial threats and jamming remains unexplored. Third, further energy benchmarking on embedded UAV processors is required before real deployment.
Although the proposed model is designed for UAV network intrusion detection, its methodological insights extend to other mission-critical domains. For instance, deep learning models have been widely applied in medical diagnostics and treatment (Ogab et al., 2025), while machine learning techniques have been used for drug discovery and pandemic response (Chilakalapudi & Jayachandran, 2025). In these domains, as in UAV security, the need for reliable, lightweight, and explainable models is paramount. Similarly, large language models have demonstrated powerful reasoning capabilities but also face challenges related to interpretability, bias, and resource demands in healthcare applications (El-Shorbagy et al., 2025). Recent work on generative AI for diagnostic support, such as Iftikhar, Rashid & Attaullah (2025) and Ogab et al. (2025), also highlights the importance of balancing predictive accuracy with computational feasibility. By comparison, RoboLSTM-IDS emphasizes robustness, efficiency, and cross-dataset generalization, qualities that are equally critical for the safe deployment of AI in both cybersecurity and healthcare settings.
Conclusion and future work
In this study, we introduced RoboLSTM-IDS, a robust and efficient anomaly-based intrusion detection system tailored for UAV networks. The proposed framework integrates a novel feature engineering strategy, ROBOTa, with an LSTM-based deep learning classifier to leverage both spatial and temporal correlations in UAV telemetry and network flow data. The methodology was comprehensively evaluated across five benchmark datasets: the real-world UAV dataset T-ITS, the public benchmarks CICIDS-2017 and UNSW-NB15, and the CTGAN-augmented variants CTGAN-CICIDS and CTGAN-UNSW.
RoboLSTM-IDS outperformed traditional ML and DL baselines across the board in accuracy, recall, F1-score, and AUC. It reached a peak accuracy of 99.62% on the T-ITS dataset, demonstrating strong threat detection capability for UAV cyber-physical systems. MCC and Cohen’s Kappa confirmed high classification reliability, and RMSE measurements indicated balanced predictions with low residual error. In terms of lightweight deployment, the proposed model reduces model size by nearly 6×, lowers inference latency from 80 to 12 ms, and cuts RAM usage by two-thirds compared to the baseline. With energy-efficient operation and deployment readiness, it offers a practical solution for advanced UAV intrusion detection. Its execution latency is low enough for deployment on resource-constrained UAV platforms.
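As a worked restatement of the efficiency figures above, taking the 80 ms baseline latency and the two-thirds RAM reduction at face value:

```latex
\[
\frac{80\ \text{ms}}{12\ \text{ms}} \approx 6.7\times \ \text{faster inference},
\qquad
\frac{80 - 12}{80} = 0.85 \ \ (\text{an } 85\% \text{ latency reduction}),
\qquad
1 - \frac{2}{3} = \frac{1}{3} \ \Rightarrow\ 3\times \ \text{lower RAM usage}.
\]
```

The last ratio matches the 3× lower memory footprint stated in the abstract.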
Although the findings are promising, several directions remain open. The model is currently centralized, and integrating Federated Learning would improve security and scalability for UAV fleets. Energy benchmarking on embedded UAV processors is also required. Future work should examine compressed variants of RoboLSTM for platforms with limited computational capabilities. Evaluation on real multi-UAV datasets that include adversarial elements would give a better assessment of defense capabilities against orchestrated, stealthy threats. Finally, safety-critical applications would benefit from explainability methods such as SHAP or LIME to make model decisions more transparent.
RoboLSTM-IDS provides a foundation for developing secure, intelligent IDS for modern UAV networks and shows strong promise for real-time defense deployment in future aerial networks.






