A study on feature selection method integrating technological network features and stability scoring for high-value patent identification
Abstract
High-value patents represent critical assets for assessing technological competitiveness and steering industrial advancement, as they embody both fundamental innovations and market potential. However, existing approaches to high-value patent identification largely rely on structured or textual features, which present notable limitations in feature construction and text representation. On the one hand, feature redundancy and weak correlations undermine the interpretability of selected features. On the other hand, most textual features remain at a superficial level, lacking in-depth exploration of inter-patent technological relationships and network structures, thereby constraining identification performance. To address these issues, we propose a feature selection method that integrates technological network features with stability scoring for high-value patent identification. First, technical phrases are extracted from patent texts to construct semantic–co-occurrence networks and topic-clustered networks. Diverse edge-weighting strategies are designed for these networks to quantify inter-patent associations and capture domain-specific structural characteristics. Second, to tackle feature redundancy, we introduce a feature selection procedure based on the stability scoring of random forest feature importance rankings. The data are repeatedly resampled, and the fluctuations of the feature rankings across resamples are statistically analyzed and transformed into stability scores. Building upon these scores, the Sequential Forward Floating Selection (SFFS) algorithm is employed to identify key features that effectively characterize high-value patents, thereby enhancing interpretability. Experiments conducted on UCI datasets demonstrate that, compared with traditional stability scoring, random forest ranking, and related methods, the proposed approach achieves superior performance in classification tasks. Finally, the proposed method is applied to an empirical study on high-value patent identification.
The results demonstrate that combining network features with structured features and applying stability-score-based feature selection enhances the performance of high-value patent identification, and the interpretability analysis of the selected features further validates the importance of network features. In conclusion, the proposed method enhances the accuracy of high-value patent identification while providing new perspectives for understanding the technological core and innovative contributions of patents, thereby offering strong support for technology innovation management and industrial decision-making.
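The resampling-based stability scoring described above can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything specific here is an assumption made for illustration. The score `1 / (mean rank + penalty * rank standard deviation)` is hypothetical. The function names and the `penalty`, `n_rounds`, and `seed` parameters are also hypothetical. Absolute Pearson correlation stands in for random forest feature importance so that the example needs only the standard library. The subsequent SFFS step is not shown.

```python
import random
import statistics


def abs_correlation(xs, ys):
    """Absolute Pearson correlation (stand-in for random forest importance)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # degenerate resample: treat the feature as uninformative
    return abs(cov) / (var_x * var_y) ** 0.5


def importance_ranks(rows, labels):
    """Rank features by importance on one sample (1 = most important)."""
    n_features = len(rows[0])
    importances = [
        abs_correlation([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: -importances[j])
    ranks = [0] * n_features
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def stability_scores(rows, labels, n_rounds=30, penalty=1.0, seed=0):
    """Score each feature from how its rank fluctuates across bootstrap resamples.

    Hypothetical formula: a feature scores high when its mean rank is good
    (small) and its rank varies little between resamples.
    """
    rng = random.Random(seed)
    n = len(rows)
    rank_runs = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        rank_runs.append(
            importance_ranks([rows[i] for i in idx], [labels[i] for i in idx])
        )
    scores = []
    for j in range(len(rows[0])):
        ranks_j = [run[j] for run in rank_runs]
        mean_rank = statistics.fmean(ranks_j)
        spread = statistics.pstdev(ranks_j)
        scores.append(1.0 / (mean_rank + penalty * spread))
    return scores


# Toy data: feature 0 tracks the label closely, features 1 and 2 are noise.
data_rng = random.Random(42)
labels = [i % 2 for i in range(200)]
rows = [
    [y + data_rng.gauss(0, 0.1), data_rng.gauss(0, 1), data_rng.gauss(0, 1)]
    for y in labels
]
scores = stability_scores(rows, labels)
best_feature = max(range(len(scores)), key=lambda j: scores[j])
```

In this toy setup the informative feature keeps the top rank in every resample, while the two noise features swap places, so their scores are lower. In a setting closer to the one the abstract describes, `importance_ranks` would instead rank features by the importances of a random forest fitted on each resample, and the resulting scores would guide the SFFS search.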