AmazoniaNLP: A Survey of Extreme Low-Resource Languages in the Peruvian-Brazilian Amazon

Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"

DOI: 10.63317/3sbz29bn779p

Abstract

The Amazon basin along the Peru–Brazil border hosts extraordinary linguistic diversity, including many Indigenous languages whose speaker communities span national frontiers. Despite sustained documentation work, most remain extremely low-resource languages (ELRLs) for Natural Language Processing (NLP): reusable corpora are scarce, orthographies vary across countries and institutions, and basic tools such as tokenizers, taggers, and morphological analyzers are largely unavailable. We present a resource-oriented survey of five Indigenous languages of the Western Amazon—Matsés, Amahuaca, Kashinawa, Ticuna, and Kukama-Kukamiria—aimed at supporting more realistic NLP and speech work in extreme low-resource settings. Using a systematic search across academic venues, language archives, and public code/model repositories, we identify and cross-check available materials spanning lexical resources, text corpora, linguistic annotation, and speech collections. For each item we record practical reuse information, including the relevant task or modality, source location, and any stated access, licensing, or usage conditions. Our findings show strong cross-language asymmetries and fragmentation: most materials concentrate in documentation artifacts and lexicons, while standardized datasets with clear access and reuse conditions suitable for training and evaluation remain rare. We conclude with concrete recommendations to improve discoverability, normalize orthographic variation, and prioritize resource creation that maximizes interoperability across tools and benchmarks.

Details

Paper ID

lrec2026-ws-sigul-28

Pages

pp. 280-287

DOI

10.63317/3sbz29bn779p

BibKey

zevallos-etal-2026-amazonianlp

Editors

Atul Kr. Ojha, Sakriani Sakti, Claudia Soria, Maite Melero, John P. McCrae, Constantine Lignos, Chao-Hong Liu, German Rigau Claramunt, Georg Rehm

Publisher

European Language Resources Association (ELRA)

ISSN

N/A

ISBN

N/A

Workshop

Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"

Location

Palma, Mallorca, Spain

Date

11–16 May 2026

Authors

Rodolfo Joel Zevallos
Fabrício Carraro
John E. Ortega
