Explaining Explanations: Interpretability Methods for Discourse Analysis of Transformer Attention Maps
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
While large language models (LLMs) have achieved state-of-the-art performance across NLP tasks, their opacity hinders human understanding of their predictions. Standard explainability techniques often prioritize technical faithfulness over linguistic plausibility. This paper argues for an interdisciplinary approach that integrates discourse analysis to interpret model explanations critically. We conduct a case study using CamemBERT, fine-tuned to classify French journalistic texts as news or opinion. We employ Layer-wise Relevance Propagation (LRP) to generate relevance maps for 1,000 test articles and analyze the token-level relevance scores through both in-depth qualitative analysis and a quantitative ranking of high-relevance tokens. Our findings reveal that CamemBERT captures genre-specific linguistic markers: it attends to cues of reported speech and temporal anchors in news articles, and to expressive punctuation, evaluative adjectives, and first-person pronouns in opinion pieces. The discourse-analytic lens moves the analysis beyond surface observations, showing how the model treats features such as punctuation as structural or stylistic conventions. We argue that integrating linguistic expertise into the explainability pipeline yields more nuanced, human-readable explanations.
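To make the attribution step concrete, the sketch below computes token-level relevance scores for a CamemBERT sequence classifier. It is a minimal illustration under stated assumptions, not the paper's pipeline: it loads the public camembert-base checkpoint as a stand-in for the paper's fine-tuned model (so its classification head is untrained), uses an invented example sentence, and approximates LRP with gradient × input on the input embeddings, which coincides with the simplest LRP rule (LRP-0) only for linear/ReLU layers, not for attention or LayerNorm.

```python
# Minimal sketch: token-level relevance for a CamemBERT classifier.
# Assumptions: "camembert-base" stands in for the paper's fine-tuned
# checkpoint; gradient x input approximates full LRP (exact only under
# the LRP-0 rule for linear/ReLU layers); label order is assumed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "camembert-base"  # stand-in for the paper's fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
model.eval()

text = "Selon le ministre, la réforme entrera en vigueur dès demain."  # invented example
enc = tokenizer(text, return_tensors="pt")

# Embed the tokens ourselves so we can take gradients w.r.t. the embeddings.
embeds = model.get_input_embeddings()(enc["input_ids"]).detach().requires_grad_(True)
logits = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"]).logits
pred = logits.argmax(dim=-1).item()  # e.g., 0 = news, 1 = opinion (assumed)
logits[0, pred].backward()

# Gradient x input, summed over the hidden dimension -> one score per token.
relevance = (embeds.grad * embeds).sum(dim=-1).squeeze(0)
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())

# Rank tokens by relevance magnitude, mirroring a quantitative ranking
# of high-relevance tokens like the one described in the abstract.
for tok, score in sorted(zip(tokens, relevance.tolist()), key=lambda p: -abs(p[1])):
    print(f"{tok:>12s}  {score:+.4f}")
```

Note that the scores above are per SentencePiece subword; in practice they would need to be aggregated back to word level before any qualitative discourse analysis of the kind the paper performs.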