Hybrid regression–classification framework for minimizing misclassification in green tea quality assessment using an electronic nose
Abstract
Accurate quality prediction is vital in tea assessment for keeping production standards consistent with market expectations. Even minor misclassifications can cause economic losses, over- or under-grading of tea products, and reduced consumer confidence. Traditional classification models often struggle to capture subtle differences in the aroma profiles measured by electronic nose (e-nose) sensors, so their prediction accuracy varies across datasets. This study proposes a hybrid regression–classification framework that minimizes misclassification and improves balanced accuracy in green tea quality prediction. The approach combines ensemble tree-based classifiers with regression-based classification through probability-guided decision integration. The e-nose system captures volatile compound signals, which a regression model processes to produce continuous prediction scores; these scores are then converted into categorical outputs by threshold-based post-processing. In parallel, the probabilistic outputs of the classification model are analyzed to determine a reference probability threshold, p_Ref, which guides the integration of the regression-based predictions. Experiments on two datasets (2024 and 2025) show that the proposed method reduces incorrect predictions and improves model stability. The hybrid approach achieved a balanced accuracy of 99% on the 2024 dataset and 95% on the 2025 dataset, outperforming the individual classification and regression models. These findings indicate that combining regression-based and probabilistic classification improves prediction reliability for e-nose-based tea quality assessment. The proposed framework offers a robust and interpretable basis for intelligent decision-support systems in the tea industry and in other sensor-based quality monitoring applications.
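
The decision flow summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not spell out the integration rule, so the one used here is an assumption: the classifier's label is kept when its highest class probability is at least p_Ref, and otherwise the grade obtained by thresholding the regression score is used. The function names, cut points, and toy values are hypothetical and are not taken from the study; balanced accuracy is computed in its usual form, as the mean of per-class recall.

```python
from typing import List, Sequence


def score_to_grade(score: float, cut_points: Sequence[float]) -> int:
    """Threshold post-processing: the grade is the number of ascending
    cut points that the continuous score reaches."""
    return sum(1 for c in cut_points if score >= c)


def hybrid_predict(
    class_probs: Sequence[Sequence[float]],
    reg_scores: Sequence[float],
    cut_points: Sequence[float],
    p_ref: float,
) -> List[int]:
    """Combine classifier and regression outputs sample by sample.

    Assumed rule (not confirmed by the source): trust the classifier when
    its highest class probability is at least p_ref; otherwise fall back
    to the grade derived from the regression score.
    """
    labels = []
    for probs, score in zip(class_probs, reg_scores):
        top_class = max(range(len(probs)), key=lambda k: probs[k])
        if probs[top_class] >= p_ref:
            labels.append(top_class)
        else:
            labels.append(score_to_grade(score, cut_points))
    return labels


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy values for three quality grades (0, 1, 2); purely illustrative.
    probs = [[0.9, 0.1, 0.0], [0.4, 0.5, 0.1], [0.1, 0.3, 0.6], [0.45, 0.35, 0.2]]
    scores = [0.2, 0.3, 1.8, 1.1]
    y_true = [0, 0, 2, 1]
    y_hat = hybrid_predict(probs, scores, cut_points=[0.5, 1.5], p_ref=0.7)
    print(y_hat, balanced_accuracy(y_true, y_hat))
```

In this toy case only the first sample clears the assumed p_Ref of 0.7, so the remaining three labels come from the regression branch. In practice p_Ref and the cut points would be tuned on validation data rather than fixed by hand.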