LREC 2026, main conference

Introducing MELI: The Mandarin-English Language Interview Corpus

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/3umiyc4sxwhk

Abstract

We introduce the Mandarin–English Language Interview (MELI) Corpus, an open-source resource of 29.8 hours of speech from 51 Mandarin–English bilingual speakers. MELI combines matched sessions in Mandarin and English with two speaking styles: read sentences and spontaneous interviews about language varieties, standardness, and learning experiences. Audio was recorded at 44.1 kHz (16-bit, stereo). Interviews were fully transcribed, force-aligned at the word and phone levels, and anonymized. Descriptively, the Mandarin component totals 14.7 hours (mean duration 17.3 minutes) and the English component 15.1 hours (mean duration 17.8 minutes). We report token/type statistics for each language and document code-switching patterns, which are frequent in the Mandarin sessions and more limited in the English sessions. The corpus design supports within- and cross-speaker as well as within- and cross-language acoustic comparisons, and it links speech content to speakers’ stated language attitudes, enabling both quantitative and qualitative analyses. The MELI Corpus will be released with transcriptions, alignments, metadata, scans of labelled maps, and documentation under a CC BY-NC 4.0 license.
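The duration figures in the abstract can be cross-checked with simple arithmetic. The sketch below assumes, as the phrase "matched sessions" suggests but the abstract does not state outright, that each of the 51 speakers contributed exactly one session per language, so a mean duration is the component total divided by 51. The function name `mean_session_minutes` is hypothetical and used only for illustration.

```python
# Figures taken from the abstract.
SPEAKERS = 51
TOTAL_HOURS = {"Mandarin": 14.7, "English": 15.1}


def mean_session_minutes(total_hours: float, n_sessions: int) -> float:
    """Mean session length in minutes.

    Assumption (not stated in the abstract): one session per speaker
    per language, so n_sessions equals the number of speakers.
    """
    return total_hours * 60 / n_sessions


for lang, hours in TOTAL_HOURS.items():
    print(f"{lang}: {mean_session_minutes(hours, SPEAKERS):.1f} min")

# The two components should add up to the corpus total of 29.8 hours.
print(f"Total: {sum(TOTAL_HOURS.values()):.1f} h")
```

Under that assumption the results (about 17.3 and 17.8 minutes, 29.8 hours in total) agree with the reported values.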

Details

Paper ID
lrec2026-main-468
Pages
pp. 5896-5904
BibKey
liu-etal-2026-introducing
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Suyuan Liu

  • Molly Babel
