The Construction of a Mixe Variant Parallel Corpus
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
We present the progress and challenges of constructing a Mixe-Spanish parallel corpus for machine translation. Mixe is a Mexican Indigenous language spoken by more than 100,000 people. In particular, we focus on the San Juan Guivicovic Mixe variant (mir). The corpus was created following a previous state-of-the-art methodology for Mexican Indigenous languages, in this case using paid translators from the variant's region. The resulting resource is available under an open research license (CC BY-NC-SA). We also present a baseline system.