Abstract
Accurate, real-time surgical tool detection is essential in Minimally Invasive Surgery (MIS) to ensure safe and effective Computer-Assisted Interventions (CAI) and Robot-Assisted Surgery (RAS). However, achieving high accuracy under the strict computational limits of embedded surgical platforms remains challenging. This paper introduces a set of enhanced lightweight YOLOv8 variants tailored for real-time surgical tool localization. The proposed architectures integrate several key innovations: Ghost Convolution for efficient feature extraction, a C2f-Ghost module for compact representation, the SC3T module, which combines Transformer blocks with Spatial Pyramid Pooling, and attention mechanisms including the Context Augmentation Module (CAM) and the Convolutional Block Attention Module (CBAM). Furthermore, the Scylla-IoU (SIoU) loss is employed to improve bounding box regression for elongated instruments. Evaluations on the public m2cai16-tool-locations dataset show that the best variant attains 95.7% mAP@0.5 while reducing parameters and GFLOPs by up to 3× and 1.8×, respectively, compared to the YOLOv8 baseline. These results demonstrate that our design achieves state-of-the-art accuracy with substantially lower complexity, enabling practical deployment in resource-constrained, real-time surgical systems.