OpenCor: Latin American and Iberian Languages Open Corpora Forum

Proceedings of LANLP: Bridging Ibero and Latin American NLP Communities

Abstract

The availability of open resources and corpora is a fundamental requirement for research in Natural Language Processing (NLP) and Computational Linguistics. However, languages spoken in Latin America and the Iberian Peninsula, particularly Indigenous, minority, and regional varieties, remain structurally under-resourced and under-represented. This paper presents a historical account of OpenCor (Latin American and Iberian Languages Open Corpora Forum), a community-driven initiative created to promote, document, and discuss open linguistic corpora and lexical resources for these languages. Conceived as a collaborative forum rather than a competitive evaluation venue, OpenCor focuses on data creation, licensing practices, sustainability, and community building. Between 2018 and 2024, OpenCor was organized as a recurring workshop co-located with major conferences, fostering dialogue across countries, institutions, and linguistic traditions. By documenting the initiative’s motivations, organizational trajectory, submission trends, and the diversity of resources presented, this paper aims to preserve institutional memory, highlight the often-invisible labor of corpus development, and provide a reference for future initiatives dedicated to openness and linguistic diversity.