Abstract
With the development of the low-altitude economy, Unmanned Aerial Vehicles (UAVs) have been widely used in traffic monitoring, industrial inspection, and other fields. However, images acquired by UAVs often suffer from small object scale, large attitude variations, complex backgrounds, and severe occlusion, all of which degrade the accuracy of existing object detection algorithms. To address these issues, this paper proposes SFC-DETR, a small object detection model for UAVs based on collaborative enhancement in the spatial and frequency domains. The model introduces a frequency-augmented polarity-aware attention module (FAPAM) that combines polarity-aware attention with adaptive frequency-domain enhancement. The polarity-aware attention mechanism captures global contextual relationships while maintaining linear computational complexity, and an adaptive window frequency-domain modulation module then refines the features, enhancing the representation of key details. This paper also designs a dual-domain adaptive feature fusion module (DD-AFFM) that adaptively fuses cross-layer features in the spatial and frequency domains, preserving both fine-grained structural information and high-level semantic features. Experimental results on the public VisDrone and TinyPerson datasets demonstrate that the proposed method outperforms state-of-the-art models in small object detection tasks. Project code: https://github.com/wxb1234/SFC-DETR.