MaiChat: A Text-based Dialogue Corpus Rich in Conversational Features
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
We present a new English corpus of typed instant-messaging dialogues that includes detailed timing information. The dialogues take place between pairs of people who know each other well, and the corpus is rich in typed features that go beyond purely lexical content, including hesitations, self-corrections, expressive respellings, and other markers of spontaneous interaction. Data are collected using a custom-built chat platform that logs not only message content but also keystroke dynamics, screen activity, and demographic metadata. Built on a transparent and reproducible protocol, the collection process scales to large numbers of participants while ensuring privacy and consent. We expect this rich set of features to support future research in areas such as cognitive modelling, human–computer interaction, and conversational AI.