MesoTree: Annotated Linguistic Resources for Quantitative Comparative Linguistic Analysis and NLP in Mesoamerica

Proceedings of the Ninth Workshop on Universal Dependencies (UDW 2026)

Abstract

An increasingly important requirement for descriptive and documentary linguistic materials in the information age is that they be searchable, quantifiable, and comparable. In this paper, we describe an effort to create morphosyntactically annotated corpora for a number of under-served Mesoamerican languages using Universal Dependencies (UD). We describe the Mesoamerican linguistic area and the languages involved in the project, outline the training and annotation process, and report on the current state of the corpora. Finally, we present a comparative syntax experiment and train UD parsing models on the data, demonstrating the usefulness of UD for quantitative, comparative linguistic research.