TurkSentGraphExp: an inherent graph aware explainability framework from pre-trained LLM for Turkish sentiment analysis

PeerJ Computer Science

Introduction

  • We propose TurkSentGraphExp, a novel framework for explainable Turkish sentiment analysis that captures both root and suffix-level information using context-aware embeddings from a pre-trained language model.

  • TurkSentGraphExp constructs a novel graph representation using a pre-trained language model, avoiding traditional token-based methods and enabling structured, relational text representations.

  • By leveraging attention-based graph representation learning (GRL) architectures, TurkSentGraphExp inherently provides phrase-level explainability, marking the first study to achieve this for Turkish texts.

  • Experiments on real-world datasets show that TurkSentGraphExp outperforms state-of-the-art methods, achieving up to 40% higher fidelity in explainability and improving classification accuracy by a margin of 0.53 over the second-best counterpart architecture.
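As a rough illustration of how a graph can be derived from a language model's attention weights, the sketch below turns a token-to-token attention matrix into weighted edges. The symmetrization step, the `threshold` value, the function name `attention_to_edges`, and the toy matrix are all assumptions made for this example; they are not taken from the framework's actual edge construction rules.

```python
from typing import List, Tuple


def attention_to_edges(
    attn: List[List[float]], threshold: float = 0.2
) -> List[Tuple[int, int, float]]:
    """Return undirected edges (i, j, weight) with i < j.

    The weight is the average of attn[i][j] and attn[j][i]; an edge is kept
    only when that average reaches the (assumed) threshold.
    """
    n = len(attn)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            weight = 0.5 * (attn[i][j] + attn[j][i])
            if weight >= threshold:
                edges.append((i, j, weight))
    return edges


tokens = ["otel", "konforlu", "ancak", "hizmet", "yetersiz"]
# Toy attention matrix; the values are invented for illustration only.
toy_attn = [
    [0.50, 0.40, 0.05, 0.03, 0.02],
    [0.45, 0.45, 0.05, 0.03, 0.02],
    [0.10, 0.10, 0.60, 0.10, 0.10],
    [0.02, 0.03, 0.05, 0.50, 0.40],
    [0.02, 0.03, 0.05, 0.45, 0.45],
]
edges = attention_to_edges(toy_attn, threshold=0.2)
for i, j, weight in edges:
    print(f"{tokens[i]} -- {tokens[j]} (weight={weight:.3f})")
```

In practice the attention matrix would come from a chosen layer and head of the pre-trained model, and the choice of layer, head aggregation, and threshold would need to be validated empirically.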

Literature review

Explainability approaches for sentiment classification task in Turkish NLP domain

Graph-based explainability approaches in NLP domain

Problem statement

where $i = 1, 2, \ldots, n$. Each $G_i$ has a sentiment label $y_i$, where $y_i \in \{1, 2, \ldots, c\}$. The goal is to learn a function $f(G_i) \rightarrow y_i$ that maps the graph representation $G_i$ to its sentiment class $y_i$. This can be done by using a graph-based ML model which can extract both the contextual and structural information from the graphs to perform sentiment classification in an inductive way. In short, the inductive approach for the task consists of creating a graph for each text and learning a function to classify these graphs into $c$ sentiment classes, enabling sentiment analysis on new texts by using graph-based models.

where $G_i$ is the graph representation of a text, $w(E(G_i))$ represents the importance scores of the edges in $G_i$, and $y_i \in \{1, 2, \ldots, c\}$ is the sentiment label assigned to $G_i$. This approach enhances explainability by identifying and leveraging the most significant phrases (represented by connected nodes and their associated edges) in the sentiment classification process.
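To make the link between edge importance scores and phrase-level explanations concrete, here is a minimal sketch that ranks edges by score and reports the token pairs they connect. The helper `top_k_phrases`, the choice of `k`, and the scores themselves are hypothetical; in the framework the scores would come from the trained model rather than being hand-written.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def top_k_phrases(
    tokens: List[str], edge_scores: Dict[Edge, float], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring edges as (phrase, score) pairs."""
    ranked = sorted(edge_scores.items(), key=lambda item: item[1], reverse=True)
    return [(f"{tokens[i]} {tokens[j]}", score) for (i, j), score in ranked[:k]]


tokens = ["Oda", "geniş", "fakat", "temizlik", "sorunlu"]
# Invented importance scores for the edges of one text graph.
edge_scores = {(0, 1): 0.35, (1, 2): 0.05, (2, 3): 0.10, (3, 4): 0.50}
phrases = top_k_phrases(tokens, edge_scores, k=2)
for phrase, score in phrases:
    print(f"{phrase}: {score:.2f}")
```

Each selected edge joins two nodes, so the explanation is a short phrase rather than a single highlighted token.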

Proposed approach

Graph construction module

Context-aware word embeddings

Handling of suffixes

Edge construction

Attentive classification module

Formulation of problem: inductive vs transductive

where $i = 1, 2, \ldots, n$. Each $G_i$ is associated with a sentiment label $y_i$, where $y_i \in C$.

Attention-based GNN design for sentiment classification

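 $$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in N(i)} \exp(e_{ik})}$$
 $$h_i = \sigma\Big(\sum_{j \in N(i)} \alpha_{ij} h_j\Big)$$
 $$h_G = \mathrm{Pooling}(h_1, \ldots, h_{|V|}).$$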
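The three steps above (softmax-normalized attention over a node's neighbors, attention-weighted aggregation, and graph-level pooling) can be sketched in plain Python. This is only an illustration under stated assumptions: the raw scores $e_{ij}$ are given directly instead of being computed by a learned scoring function, ReLU stands in for $\sigma$, mean pooling stands in for the pooling operator, and all function names are hypothetical.

```python
import math
from typing import Dict, List


def attention_coefficients(scores: Dict[int, float]) -> Dict[int, float]:
    """Softmax over neighbors: alpha_ij = exp(e_ij) / sum_k exp(e_ik)."""
    max_score = max(scores.values())  # subtract the max for numerical stability
    exps = {j: math.exp(e - max_score) for j, e in scores.items()}
    total = sum(exps.values())
    return {j: value / total for j, value in exps.items()}


def aggregate(alpha: Dict[int, float], features: Dict[int, List[float]]) -> List[float]:
    """h_i = sigma(sum_j alpha_ij * h_j), with ReLU assumed for sigma."""
    dim = len(next(iter(features.values())))
    summed = [sum(alpha[j] * features[j][d] for j in alpha) for d in range(dim)]
    return [max(0.0, value) for value in summed]


def mean_pool(node_vectors: List[List[float]]) -> List[float]:
    """h_G as the element-wise mean of node vectors (pooling choice is assumed)."""
    count = len(node_vectors)
    return [sum(vec[d] for vec in node_vectors) / count for d in range(len(node_vectors[0]))]


# Node 0 has neighbors 1, 2 and 3; raw scores e_0j are invented.
raw_scores = {1: 2.0, 2: 0.0, 3: 0.0}
features = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.0, -1.0]}
alpha = attention_coefficients(raw_scores)
h_0 = aggregate(alpha, features)
h_G = mean_pool([h_0, features[1], features[2], features[3]])
print(alpha, h_0, h_G)
```

Because the coefficients $\alpha_{ij}$ are attached to edges, they can be read off after training, which is what makes attention-based models a natural fit for edge-level explanations.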

Explainability module

Experimental studies

  • RQ1: What is the capacity of a pre-trained language model to generate embedding vectors at a layer-by-layer level, and which layers exhibit higher levels of semantic discrimination?

  • RQ2: What is the dependency coverage rate for graph representation in Turkish texts using attention weights of a pre-trained language model, specifically targeting the challenges highlighted in “Handling of suffixes”?

  • RQ3: How does the representation model derived from the language model affect classification performance, and what are the performance disparities between attention-based and non-attention-based GNN models?

  • RQ4: To what extent is the most successful model explainable, considering both qualitative and quantitative aspects of its explainabilities?

  • (i) Layer-wise embedding evaluation: This experiment involved visualizing the embedding vectors generated at a layer-by-layer level by a pre-trained Turkish language model. It aimed to investigate the capacity of the model to capture semantic information across different layers, in line with RQ1.

  • (ii) Dependency graph accuracy: We measured the accuracy of dependency relationships using labeled data, as mentioned in RQ2. The goal was to assess how well the model can identify and represent dependencies within the text.

  • (iii) Classification performance comparison: This experiment focused on comparing the performance of common graph-based learning models using graph representations derived from Turkish sentiment benchmark datasets. It aimed to address RQ3 by evaluating the effectiveness of attention mechanisms in improving classification performance.

  • (iv) Evaluation of explainability: In response to RQ4, we evaluated the explainability of the model using two common metrics, fidelity and sparsity. The objective was to assess how well the model’s predictions can be interpreted and explained in both qualitative and quantitative terms.
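For readers unfamiliar with the two explainability metrics named in (iv), the following sketch uses one widely used formulation: fidelity as the average drop in the predicted-class probability after the edges marked important are removed, and sparsity as the average fraction of edges left out of the explanation. The exact definitions used in the experiments are not reproduced in this outline, so these formulas, the function names, and the numbers are assumptions.

```python
from typing import List


def fidelity(original_probs: List[float], masked_probs: List[float]) -> float:
    """Mean drop in predicted-class probability after removing important edges."""
    drops = [p - q for p, q in zip(original_probs, masked_probs)]
    return sum(drops) / len(drops)


def sparsity(num_selected: List[int], num_total: List[int]) -> float:
    """Mean fraction of edges that are NOT part of the explanation."""
    ratios = [1.0 - s / t for s, t in zip(num_selected, num_total)]
    return sum(ratios) / len(ratios)


# Invented values for two graphs.
fid = fidelity(original_probs=[0.9, 0.8], masked_probs=[0.5, 0.6])
spa = sparsity(num_selected=[2, 1], num_total=[10, 5])
print(f"fidelity={fid:.2f}, sparsity={spa:.2f}")
```

Under this formulation, a good explanation scores high on both: removing a small set of edges (high sparsity) causes a large change in the prediction (high fidelity).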

Benchmark datasets

  • TripAdvisor: The dataset comprises 42,000 hotel reviews from the TripAdvisor web page (Büyükeke, Sökmen & Gencer, 2020), with sentiment labels in three categories. However, it is an unbalanced dataset, with 80% of the reviews being positive. To address this issue, we randomly selected 1,250 samples with an equal number of reviews in each class for this study.

  • ImdbFilmReview: Film reviews from IMDB (Amasyalı et al., 2012), which contain three traditional sentiment categories: positive, negative and neutral.

  • BlogPosts: Users’ sentiments from blog pages (Amasyalı et al., 2012).
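The class balancing described for the TripAdvisor dataset amounts to random undersampling with an equal quota per class. A minimal sketch is given below; the function name `balanced_subsample`, the `per_class` quota, the seed, and the toy data are assumptions for illustration and do not reproduce the study's actual sampling procedure.

```python
import random
from collections import defaultdict
from typing import List, Tuple


def balanced_subsample(
    samples: List[Tuple[str, str]], per_class: int, seed: int = 0
) -> List[Tuple[str, str]]:
    """Randomly draw the same number of (text, label) pairs from every class."""
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    subset = []
    for label in sorted(by_label):
        items = by_label[label]
        if len(items) < per_class:
            raise ValueError(f"class {label!r} has only {len(items)} samples")
        subset.extend(rng.sample(items, per_class))
    rng.shuffle(subset)
    return subset


# Toy data with an 80% positive skew.
toy = (
    [(f"pos review {i}", "positive") for i in range(80)]
    + [(f"neg review {i}", "negative") for i in range(10)]
    + [(f"neu review {i}", "neutral") for i in range(10)]
)
balanced = balanced_subsample(toy, per_class=5)
print(len(balanced))
```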

Comparison models

  • GCN: Graph convolutional networks (Kipf & Welling, 2016) are designed for tasks involving graph data, such as node classification and graph-level prediction. Unlike traditional neural networks, GCNs operate directly on graph structures, leveraging node features and their relationships. This approach allows GCNs to capture complex dependencies and patterns within graphs, making them effective for tasks like social network analysis, molecular structure prediction, and recommendation systems.

  • GIN: Graph isomorphism networks (Xu et al., 2018) aggregate neighbor features with an injective, sum-based update followed by a multilayer perceptron, which makes them as discriminative as the Weisfeiler-Lehman graph isomorphism test. This expressive power suits GINs to tasks that require distinguishing fine-grained structural differences, such as molecular structure comparison, subgraph detection, and graph similarity assessment.
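The difference between the two comparison models is easiest to see in their neighborhood aggregation step. The sketch below applies each rule to scalar node features on a three-node graph. It is deliberately simplified: the learnable weight matrix and nonlinearity of a GCN layer and the multilayer perceptron of a GIN layer are omitted, and the function names are hypothetical.

```python
import math
from typing import Dict, List


def gcn_aggregate(adj: Dict[int, List[int]], x: List[float]) -> List[float]:
    """Symmetric-normalized aggregation with self-loops, as in a GCN layer."""
    deg = [len(adj[v]) + 1 for v in range(len(x))]  # +1 for the self-loop
    out = []
    for v in range(len(x)):
        total = x[v] / deg[v]  # self-loop term
        for u in adj[v]:
            total += x[u] / math.sqrt(deg[v] * deg[u])
        out.append(total)
    return out


def gin_aggregate(adj: Dict[int, List[int]], x: List[float], eps: float = 0.0) -> List[float]:
    """Sum aggregation (1 + eps) * x_v + sum of neighbors, as in a GIN layer before its MLP."""
    return [(1.0 + eps) * x[v] + sum(x[u] for u in adj[v]) for v in range(len(x))]


# Node 0 is connected to nodes 1 and 2; all features are 1.
adj = {0: [1, 2], 1: [0], 2: [0]}
x = [1.0, 1.0, 1.0]
gcn_out = gcn_aggregate(adj, x)
gin_out = gin_aggregate(adj, x)
print(gcn_out, gin_out)
```

The sum in the GIN rule preserves neighbor counts exactly, whereas the degree normalization in the GCN rule dampens them. Neither rule assigns learned weights to individual edges, which is the property the attention-based models add.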

Evaluation metrics

where n is the number of classes.

Implementation details

Results

Layer-wise embedding evaluation

Dependency graph accuracy

Classification performance comparison

Evaluation of explainability

  • Otel (The hotel) konforlu (comfortable), ancak (but) hizmet (service) yetersiz (insufficient).

  • Oda (The room) geniş (spacious), fakat (yet) temizlik (cleanliness) sorunlu (problematic).

  • Personel (Staff) yardımsever (helpful), lakin (however) yemekler (meals) lezzetsiz (tasteless).

Conclusion

Additional Information and Declarations

Competing Interests

Author Contributions

Data Availability

Funding

The authors received no funding for this work.