CLIP-guided anomaly detection for power line inspection with multi-scale attention
Abstract
The inspection of power line infrastructure is essential for maintaining the safety and reliability of electrical grids, yet traditional manual ground patrols and helicopter-based surveys remain labor-intensive, costly, and often ineffective at detecting subtle defects. Recent advances in unmanned aerial vehicles and artificial intelligence have improved automation, but many existing approaches are limited to identifying asset types rather than detecting defects across multiple components. This study introduces a novel anomaly detection framework that enhances power line inspection by integrating a pre-trained CLIP image encoder with efficient channel attention modules and a normalizing flow-based density estimator. The CLIP image encoder extracts robust visual features without fine-tuning, while attention modules recalibrate multi-scale features to emphasize semantically salient regions. The normalizing flow module then models the distribution of normal features to detect anomalies via likelihood estimation. Evaluated on two challenging real-world datasets, InsPLAD and PTL-AI, the proposed method achieved state-of-the-art performance, with average AUC scores of 0.9566 and 0.8554, respectively, outperforming existing methods across various asset categories and defect types. The results demonstrate the efficacy and generalizability of the approach in complex inspection scenarios. The framework offers a scalable, annotation-efficient solution for real-world applications, with potential for extension to video-based monitoring and other infrastructure inspection domains.
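As a rough illustration of the pipeline described above, the following is a minimal sketch, not the authors' implementation. It assumes the frozen CLIP encoder has already produced a feature map, represented here as plain nested lists. It applies a simplified efficient channel attention step with uniform weights standing in for learned ones. It then replaces the paper's normalizing flow with a single element-wise affine flow fitted by maximum likelihood on normal samples. All function names (`eca`, `channel_descriptor`, `fit_affine_flow`, `anomaly_score`) and the toy values are hypothetical.

```python
import math

def eca(feature_map, kernel_size=3):
    """Simplified efficient channel attention.
    feature_map: list of C channels, each a flat list of H*W floats.
    Assumption: uniform 1D-conv weights stand in for learned ones."""
    num_channels = len(feature_map)
    # Global average pooling: one descriptor per channel.
    pooled = [sum(ch) / len(ch) for ch in feature_map]
    pad = kernel_size // 2
    weights = [1.0 / kernel_size] * kernel_size
    gates = []
    for c in range(num_channels):
        # 1D convolution across neighbouring channels (zero padding).
        acc = 0.0
        for j in range(kernel_size):
            idx = c + j - pad
            if 0 <= idx < num_channels:
                acc += weights[j] * pooled[idx]
        gates.append(1.0 / (1.0 + math.exp(-acc)))  # sigmoid gate
    # Rescale every channel by its gate.
    return [[g * v for v in ch] for g, ch in zip(gates, feature_map)]

def channel_descriptor(feature_map):
    """Collapse an attended feature map into one vector (mean per channel)."""
    return [sum(ch) / len(ch) for ch in feature_map]

def fit_affine_flow(normal_vectors, eps=1e-6):
    """Fit z = (x - shift) * exp(-log_scale) on normal samples only.
    Assumption: a one-layer affine flow replaces the paper's flow model."""
    n, dim = len(normal_vectors), len(normal_vectors[0])
    shift = [sum(v[d] for v in normal_vectors) / n for d in range(dim)]
    log_scale = []
    for d in range(dim):
        var = sum((v[d] - shift[d]) ** 2 for v in normal_vectors) / n
        log_scale.append(0.5 * math.log(var + eps))
    return shift, log_scale

def anomaly_score(x, shift, log_scale):
    """Negative log-likelihood under the flow (change-of-variables formula)."""
    log_p = 0.0
    for xd, m, s in zip(x, shift, log_scale):
        z = (xd - m) * math.exp(-s)
        # Standard normal log-density of z plus log|det Jacobian| = -s.
        log_p += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - s
    return -log_p

# Toy feature maps (2 channels, 2 spatial positions) from "normal" images.
normal_maps = [
    [[1.0, 1.2], [2.0, 2.1]],
    [[0.9, 1.1], [2.2, 1.9]],
    [[1.1, 1.0], [1.8, 2.0]],
]
normal_vecs = [channel_descriptor(eca(fm)) for fm in normal_maps]
shift, log_scale = fit_affine_flow(normal_vecs)

score_normal = anomaly_score(normal_vecs[0], shift, log_scale)
score_defect = anomaly_score(
    channel_descriptor(eca([[4.0, 4.5], [0.1, 0.2]])), shift, log_scale
)
print(score_normal < score_defect)
```

In this sketch a higher score means lower likelihood under the distribution of normal features, so thresholding the score separates defective from normal samples. Only normal samples are needed for fitting, which is what makes the approach annotation-efficient.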