An interpretable heterogeneous graph attention network for high-value patent identification
Abstract
High-value patent identification is critical to technological innovation and industrial competitiveness, yet traditional approaches struggle with the complex semantics of patent texts and with multi-source heterogeneous data. This paper proposes an interpretable method for high-value patent identification based on a heterogeneous graph attention network (HGATv2). The model combines a relation-aware multi-head attention mechanism, trainable delta word embeddings, and a multi-scale convolutional encoder to strengthen the representation of deep semantic and structural information in patent documents. By constructing a heterogeneous graph with multiple node types and semantic relations, HGATv2 jointly models local textual context and cross-document global dependencies. In addition, a node feature fusion strategy and neighbour-sampling-based normalisation are introduced to alleviate semantic sparsity and structural heterogeneity. Experiments on a public patent dataset and a private lithography patent dataset show that the proposed method significantly outperforms strong baselines, achieving an F1 score of up to 94.90%. Furthermore, a word-level interpretability analysis combines TF–TF-IDF-based pre-filtering with an in-graph knock-out procedure, providing clear technical-term explanations for high-value patent decisions and improving model transparency. The results indicate that HGATv2 performs strongly in the lithography domain and exhibits promising cross-domain adaptability. It can therefore serve as a core component of intelligent patent screening and innovation resource optimisation systems in real-world applications.
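The knock-out idea behind the word-level interpretability analysis can be illustrated with a minimal sketch. It is not the authors' in-graph procedure: the scoring function `toy_score`, the weights in `TOY_WEIGHTS`, and the helper `knockout_importance` are hypothetical stand-ins, and the candidate list is assumed to be the output of a pre-filtering step. The sketch only shows the general principle, which is to remove one candidate term at a time and record how much the model score drops.

```python
from typing import Callable, Dict, List, Sequence

def knockout_importance(
    tokens: Sequence[str],
    score_fn: Callable[[Sequence[str]], float],
    candidates: Sequence[str],
) -> Dict[str, float]:
    """Return the score drop caused by removing each candidate word."""
    base = score_fn(tokens)
    importance: Dict[str, float] = {}
    for word in candidates:
        # Knock the word out of the document and re-score what remains.
        reduced = [t for t in tokens if t != word]
        importance[word] = base - score_fn(reduced)
    return importance

# Hypothetical stand-in for a trained classifier: the score is the sum of
# fixed weights for the terms present in the document.
TOY_WEIGHTS = {"lithography": 0.5, "mask": 0.25, "exposure": 0.125}

def toy_score(tokens: Sequence[str]) -> float:
    present = set(tokens)
    return sum(w for term, w in TOY_WEIGHTS.items() if term in present)

doc: List[str] = ["a", "lithography", "mask", "for", "exposure", "of", "a", "wafer"]
# Assumed to come from a pre-filtering step such as a TF-IDF ranking.
candidates = ["lithography", "mask", "wafer"]

scores = knockout_importance(doc, toy_score, candidates)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy setting, terms whose removal lowers the score most are ranked first, and a term with no influence on the score receives an importance of zero. In the method described above, the score would instead come from the trained graph model, with the knock-out applied inside the heterogeneous graph.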