AmazoniaNLP: A Survey of Extreme Low-Resource Languages in the Peruvian-Brazilian Amazon

Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"

DOI:10.63317/3sbz29bn779p

Abstract

The Amazon basin along the Peru–Brazil border hosts extraordinary linguistic diversity, including many Indigenous languages whose speaker communities span national frontiers. Despite sustained documentation work, most remain extremely low-resource languages (ELRLs) for Natural Language Processing (NLP): reusable corpora are scarce, orthographies vary across countries and institutions, and basic tools such as tokenizers, taggers, and morphological analyzers are largely unavailable. We present a resource-oriented survey of five Indigenous languages of the Western Amazon—Matsés, Amahuaca, Kashinawa, Ticuna, and Kukama-Kukamiria—aimed at supporting more realistic NLP and speech work in extreme low-resource settings. Using a systematic search across academic venues, language archives, and public code/model repositories, we identify and cross-check available materials spanning lexical resources, text corpora, linguistic annotation, and speech collections. For each item we record practical reuse information, including the relevant task or modality, source location, and any stated access, licensing, or usage conditions. Our findings show strong cross-language asymmetries and fragmentation: most materials concentrate in documentation artifacts and lexicons, while standardized datasets with clear access and reuse conditions suitable for training and evaluation remain rare. We conclude with concrete recommendations to improve discoverability, normalize orthographic variation, and prioritize resource creation that maximizes interoperability across tools and benchmarks.

Details

Paper ID
lrec2026-ws-sigul-28
Pages
pp. 280-287
BibKey
zevallos-etal-2026-amazonianlp
Editors
Atul Kr. Ojha, Sakriani Sakti, Claudia Soria, Maite Melero, John P. McCrae, Constantine Lignos, Chao-Hong Liu, German Rigau Claramunt, Georg Rehm
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Rodolfo Joel Zevallos

  • Fabrício Carraro

  • John E. Ortega
