Where Is Politeness in Japanese BERT? A Layerwise Probing and CLS Activation Patching Study

Proceedings of the 1st Workshop on Social Context (SoCon) and the 2nd Workshop on Integrating NLP and Psychology to Study Social Interactions (NLPSI) @ LREC 2026

DOI: 10.63317/4vex4yooz83k

Abstract

Politeness is a central pragmatic dimension of language use, and Japanese honorifics offer a well-defined testbed for studying whether pretrained encoders represent socially meaningful distinctions. Prior BERT-based work has applied supervised models to Japanese honorific data, but we are not aware of analyses that localize honorific-level information across layers or test its causal influence via activation patching in Japanese BERT-style encoders. We study these questions in LineDistilBERT using the KeiCO corpus, which labels sentences with four honorific levels. To isolate the pretrained representations while still defining a task predictor, we freeze all encoder parameters and train only a lightweight [CLS] classification head as a minimal readout. We then run layerwise linear probing: we train multinomial L2-regularized logistic-regression probes on the [CLS] vectors from each layer to quantify linear decodability across depth and to select the best layer on development data. Finally, we test causal influence with [CLS] activation patching, transplanting donor activations into receiver sentences at selected layers and measuring prediction transitions, logit shifts, and flip rates under standard controls. Overall, honorific level is broadly decodable across layers, and [CLS] interventions can systematically steer the frozen-encoder classifier, with strong dependence on depth. Probing and causal intervention thus provide complementary evidence about how Japanese politeness is represented.
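The [CLS] activation patching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy encoder below (random residual layers with uniform token mixing), the names `run` and `patch_stats`, and the specific definitions of flip rate and logit shift are all assumptions made for illustration. In a real experiment the toy encoder would be replaced by the frozen LineDistilBERT encoder and its trained [CLS] head, and the inputs would be KeiCO sentences rather than random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
N_LAYERS, SEQ_LEN, DIM, N_CLASSES = 4, 5, 8, 4  # four classes, like the four honorific levels

# Hypothetical stand-in for a frozen encoder and a frozen [CLS] head.
LAYERS = [rng.normal(scale=0.5, size=(DIM, DIM)) for _ in range(N_LAYERS)]
HEAD = rng.normal(size=(DIM, N_CLASSES))
MIX = np.full((SEQ_LEN, SEQ_LEN), 1.0 / SEQ_LEN)  # crude uniform "attention"


def run(tokens, patch_layer=None, donor_cls=None):
    """Forward pass; optionally overwrite the [CLS] row (index 0) after one layer."""
    h = tokens.copy()
    cls_per_layer = []
    for i, w in enumerate(LAYERS):
        h = h + np.tanh(MIX @ h @ w)  # residual update that mixes all tokens
        if i == patch_layer:
            h[0] = donor_cls  # transplant the donor's [CLS] activation
        cls_per_layer.append(h[0].copy())
    return h[0] @ HEAD, cls_per_layer


def patch_stats(receivers, donors, layer):
    """Flip rate and mean shift of the donor-class logit (assumed definitions)."""
    flips, shifts = [], []
    for rec, don in zip(receivers, donors):
        base_logits, _ = run(rec)
        donor_logits, donor_cache = run(don)
        patched_logits, _ = run(rec, patch_layer=layer, donor_cls=donor_cache[layer])
        target = int(donor_logits.argmax())  # class the donor is predicted as
        flips.append(int(patched_logits.argmax()) != int(base_logits.argmax()))
        shifts.append(patched_logits[target] - base_logits[target])
    return float(np.mean(flips)), float(np.mean(shifts))


# Random "sentences" stand in for receiver and donor inputs.
receivers = rng.normal(size=(20, SEQ_LEN, DIM))
donors = rng.normal(size=(20, SEQ_LEN, DIM))
for layer in range(N_LAYERS):
    flip_rate, mean_shift = patch_stats(receivers, donors, layer)
    print(f"layer {layer}: flip rate {flip_rate:.2f}, donor-class logit shift {mean_shift:+.2f}")
```

Two sanity checks follow from this setup. Patching a sentence with its own [CLS] activation should leave the logits unchanged, which serves as a simple control. Patching at the final layer should reproduce the donor's logits exactly, because the head reads only the final [CLS] vector. Patching at earlier layers lets the receiver's other tokens continue to influence [CLS], which is one way depth dependence can arise.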

Details

Paper ID

lrec2026-ws-soconnlpsi-07

Pages

pp. 67-75

DOI

10.63317/4vex4yooz83k

BibKey

hashimoto-etal-2026-where

Editors

Marco Antonio Stranisci, Neele Falk, Sofie Labat, Soda Marem Lo, Aswathy Velutharambath, Sabine Weber, Rossana Damiano, Simona Frenda, Veronique Hoste, Bennett Kleinberg, Roman Klinger, Viviana Patti, Flor Miriam Plaza-del-Arco, Maarten Sap, Seid Muhie Yimam

Publisher

European Language Resources Association (ELRA)

ISSN

N/A

ISBN

N/A

Workshop

Proceedings of the 1st Workshop on Social Context (SoCon) and the 2nd Workshop on Integrating NLP and Psychology to Study Social Interactions (NLPSI) @ LREC 2026

Location

Palma, Mallorca, Spain

Date

11 - 16 May 2026

Authors

Shusuke Hashimoto
Wenchen Shi
