Back to Main Conference 2024
LREC-COLING 2024main
A Multilingual Parallel Corpus for Aromanian
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Abstract
We report the creation of the first high-quality corpus of Aromanian - an endangered Romance language spoken in the Balkans - and the equivalent sentence-aligned translations into Romanian, English, and French. The corpus is released publicly using several orthographic standards and consists in short stories collected in the ‘70s in Romania. Additionally, we provide an corpus-based analysis of Aromanian linguistic particularities and the overall demographic and political context which impacts the contemporary development of the language.