LREC 2026 (main conference)

CoachLah: A Singlish–English Parallel Corpus of Health Coaching Conversations with Behavior Goal Annotations

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/54des2aenesc

Abstract

Health coaching (HC) aims to promote sustainable behavior change through goal-oriented dialogue, but research in this area is limited by the scarcity of authentic, transcript-based corpora. Existing datasets are small, English-only, and Western-centric, overlooking cultural and linguistic factors that shape real-world HC interactions. We introduce CoachLah, the first Singlish–English parallel corpus of HC conversations collected from a randomized controlled trial in Singapore. The dataset comprises 36,852 utterances transcribed from almost 160 hours of recorded HC sessions with 51 clients and 4 professional health coaches. Each dialogue is speaker-labeled, transcribed in Singlish, and aligned with high-quality English translations to preserve linguistic and cultural nuances. Each session is accompanied by a summary written by the health coach after the session, from which behavioral goals were manually annotated. To demonstrate the dataset's utility, we benchmark two downstream tasks: (i) Singlish-to-English translation using open-weight models (e.g., Gemma-2-9B-it) fine-tuned with Low-Rank Adaptation, and (ii) behavioral goal extraction from unstructured HC summaries using span-based modeling (e.g., DeBERTa-v3-base). Together, these contributions establish the first culturally grounded benchmark for low-resource, goal-oriented dialogue research in HC. Both the code and the dataset are available at: https://github.com/IvaBojic/CoachLah.
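Span-based extractors (for instance, an encoder such as DeBERTa-v3-base with a token-level start/end head) typically turn per-token scores into an extracted span by choosing the start/end pair with the highest combined score. The minimal sketch below assumes that generic setup; it is not taken from the CoachLah code, and the function name `best_span`, the `max_len` cap, and the toy tokens and scores are all hypothetical illustrations.

```python
from typing import List, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              max_len: int = 30) -> Tuple[int, int, float]:
    """Pick the highest-scoring span from per-token start/end scores.

    Returns (start, end, score) with inclusive token indices, subject to
    start <= end and a span length of at most max_len tokens.
    (Hypothetical helper for illustration only.)
    """
    if not start_logits or len(start_logits) != len(end_logits):
        raise ValueError("start/end logits must be non-empty and equal length")
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    best = (0, 0, float("-inf"))
    for i, s in enumerate(start_logits):
        # Only end positions at or after the start, and within max_len tokens.
        for j in range(i, min(i + max_len, len(end_logits))):
            score = s + end_logits[j]  # span score = start score + end score
            if score > best[2]:
                best = (i, j, score)
    return best


# Toy example: a hypothetical coach-summary sentence with made-up scores.
tokens = ["Client", "will", "walk", "30", "minutes", "daily", "after", "dinner", "."]
start_logits = [0.1, 0.2, 3.0, 0.5, 0.1, 0.0, 0.2, 0.1, 0.0]
end_logits = [0.0, 0.1, 0.2, 0.3, 0.5, 2.5, 0.4, 1.0, 0.1]

start, end, score = best_span(start_logits, end_logits)
print(" ".join(tokens[start:end + 1]))  # walk 30 minutes daily
```

Capping the span length with `max_len` limits the search to O(n · max_len) candidate pairs and rules out implausibly long spans; a practical system would also map token indices back to character offsets in the original summary text.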

Details

Paper ID: lrec2026-main-003
Pages: pp. 35–49
BibKey: bojic-etal-2026-coachlah
Editor: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: 2522-2686
ISBN: 978-2-493814-49-4
Conference: The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location: Palma, Mallorca, Spain
Date: 11–16 May 2026

Authors

  • Iva Bojic
  • Mathieu Ravaut
  • Stephanie Hilary Xinyi Ma
  • Doreen Tan
  • Andy Hau Yan Ho
  • Andy Khong
