A Corpus for Personalized Dialogue Breakdown Repair in Japanese Open-Domain Conversations

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

Abstract

Despite remarkable recent advances in dialogue systems, conversational breakdowns still occur, which makes appropriate repair strategies essential. However, it remains unclear how a repair should be performed once a system breakdown has actually occurred, and no corpus has been available for investigating this issue. To address this gap, we presented typical examples of system-induced dialogue breakdowns to crowd workers and collected their expected repair utterances directed at the system that broke down. We annotated each repair utterance with dialogue act tags and constructed a breakdown-repair corpus of 3,990 utterances covering ten representative types of breakdowns. Because the corpus includes breakdowns in diverse situations, it allows various repair patterns to be examined. We also administered a questionnaire on participants’ personal traits, yielding a dataset that enables the investigation of repair strategies tailored to individual user characteristics. In this paper, we give an overview of the dataset and report preliminary analysis results.