Modeling the impact of dataset size and class imbalance on YOLOv10-based PPE detection systems
Abstract
This paper investigates how dataset size and class imbalance affect the performance of YOLOv10-based personal protective equipment (PPE) detection systems. Six YOLOv10 variants (n, s, m, b, l, x) were evaluated on domain-specific datasets covering construction, industrial, and medical settings. A novel mathematical model is introduced to describe the nonlinear relationship between dataset size and detection performance (mAP@50). The model fits the experimental results with R² = 0.968 and reveals a saturation threshold beyond which additional data yield diminishing returns. The analysis also shows that the accuracy advantage of the most complex variant, YOLOv10-x, declines significantly under pronounced class imbalance. These findings underscore the importance of balanced and sufficiently large datasets for optimizing detection accuracy in real-world safety applications.