
Sentiment Analysis and Language Models for Kwanyama

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4whctbu5acfp

Abstract

Kwanyama is related to Swahili, Zulu, and the more than 300 other languages in the Bantu family. Yet, unlike its better-known relatives, it remains almost entirely absent from modern Natural Language Processing (NLP). We bring Kwanyama into the LLM era of NLP through two key contributions. First, we introduce OkaSentiment, the first sentiment-labeled dataset for Kwanyama. Unlike prior African sentiment corpora that rely primarily on social media, OkaSentiment is grounded in an offline, culturally relevant domain: reviews of domestic labor relationships. The dataset is annotated by over 40 native speakers under expert supervision, with careful quality control. Second, we present OkaLM, the first language models for Kwanyama (1B, 3B, and 8B parameters), obtained by continued pretraining of LLaMA-3 checkpoints on a curated Kwanyama corpus. Together, OkaSentiment and OkaLM bring a left-behind language into the landscape of modern NLP, providing its first benchmark and language models.

Details

Paper ID
lrec2026-main-237
Pages
pp. 3031-3043
BibKey
nakashole-2026-sentiment
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 - 16 May 2026

Authors

  • Ndapa Nakashole
