HOTATE: A Japanese Dialogue Corpus Annotated with Responses of Private Thoughts and Public Statements
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
This study examines how accurately Large Language Models (LLMs) can handle both what a speaker actually says and the true feelings behind it in Japanese dialogue. Speakers use not only private thoughts, which express their true feelings and intentions, but also public statements, which convey their intentions while taking the interlocutor’s feelings and social status into account. While public statements help maintain interpersonal relationships, they can obscure the speaker’s true intention, potentially leading to misunderstandings. We extended existing Japanese dialogue corpora by annotating each dialogue with public statement and private thought responses, and then evaluated LLMs’ ability to classify and to generate these two types of expressions. The results of the classification task showed that current LLMs fail to distinguish between these two types of expressions, and that training on our corpus significantly improves recognition performance. Furthermore, the results of the generation task showed that generating private thoughts is more difficult than generating public statements, according to both automatic and human evaluations. We release our corpus, which contains 7,964 human-annotated dialogues.