
MaiChat: A Text-based Dialogue Corpus Rich in Conversational Features

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/3kpp3zj47d6d

Abstract

We present a new English corpus of typed instant-messaging dialogues that includes detailed timing information. The dialogues come from interactions between pairs of participants who know each other well, and the corpus is rich in typed features that augment the purely lexical content, including hesitations, self-corrections, expressive respellings, and other markers of spontaneous interaction. Messages are collected using a custom-built chat platform that logs not only message content but also keystroke dynamics, screen activity, and demographic metadata. Designed with a transparent and reproducible protocol, the corpus enables scalable data collection while ensuring privacy and consent. We intend this rich set of features to facilitate future research in areas such as cognitive modelling, human–computer interaction, and conversational AI.

Details

Paper ID
lrec2026-main-123
Pages
pp. 1585-1594
BibKey
dao-etal-2026-maichat
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 - 16 May 2026

Authors

  • Mai Hoang Dao

  • Catherine Lai

  • Peter Bell

Links