Evaluating Nepali NER and POS Tagging Models on the Achhami Dialect

Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"

DOI:10.63317/5hmcq734e4yd

Abstract

Nepali Natural Language Processing (NLP) models are typically trained and evaluated on Standard Nepali, which can introduce bias against regional dialects. This study investigates the performance of Named Entity Recognition (NER) and Part-of-Speech (POS) tagging models on Achhami, a Far-Western dialect of Nepal. A parallel corpus of 300 sentence pairs was created, covering news, cultural topics, and everyday conversations. Achhami translations were produced by native speakers to preserve linguistic authenticity. The evaluation compared fine-tuned Transformer models with large language models using zero-shot prompting. Across both tasks, all models showed consistent performance degradation on the Achhami dialect. For NER, F1 scores decreased by 2.12 to 3.97 percent. Claude 3.5 Haiku achieved the best NER performance, while the monolingual NepBertA model unexpectedly outperformed multilingual alternatives, challenging assumptions about multilingual advantages. POS tagging results showed a similar pattern, with accuracy dropping notably on Achhami data. Large language models also showed comparable weaknesses, with accuracy reductions ranging from 2.9 to 7.0 percent. These findings quantify Kathmandu-centered bias in Nepali NLP and highlight the importance of dialectally diverse training data for building more inclusive and equitable language technologies.
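The comparison described in the abstract (scoring the same model on parallel Standard Nepali and Achhami sentences, then reporting the drop) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `tagging_accuracy` and `degradation_points` are hypothetical, the tag sequences are invented toy data rather than the paper's corpus, and the drop is computed as an absolute difference in percentage points, which the abstract does not specify.

```python
from typing import Sequence


def tagging_accuracy(
    gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]
) -> float:
    """Token-level accuracy over a list of tagged sentences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences must have equal length")
        correct += sum(g == p for g, p in zip(gold_sent, pred_sent))
        total += len(gold_sent)
    if total == 0:
        raise ValueError("no tokens to score")
    return correct / total


def degradation_points(standard_score: float, dialect_score: float) -> float:
    """Absolute drop from standard to dialect, in percentage points (assumed convention)."""
    return (standard_score - dialect_score) * 100.0


# Invented toy POS tags for two parallel sentences (not data from the paper).
gold_standard = [["NOUN", "VERB"], ["PRON", "NOUN", "VERB"]]
pred_standard = [["NOUN", "VERB"], ["PRON", "NOUN", "VERB"]]
gold_dialect = [["NOUN", "VERB"], ["PRON", "NOUN", "VERB"]]
pred_dialect = [["NOUN", "VERB"], ["PRON", "ADJ", "VERB"]]

acc_standard = tagging_accuracy(gold_standard, pred_standard)
acc_dialect = tagging_accuracy(gold_dialect, pred_dialect)
drop = degradation_points(acc_standard, acc_dialect)

print(f"standard accuracy: {acc_standard:.3f}")
print(f"dialect accuracy:  {acc_dialect:.3f}")
print(f"drop (points):     {drop:.1f}")
```

Because the corpus is parallel, both scores come from the same underlying sentences, so the difference reflects the dialect shift and not a change in content. The NER figures would need an entity-level F1 in place of token accuracy, which this sketch does not implement.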

Details

Paper ID

lrec2026-ws-sigul-21

Pages

pp. 210-221

DOI

10.63317/5hmcq734e4yd

BibKey

dhamala-etal-2026-evaluating

Editors

Atul Kr. Ojha, Sakriani Sakti, Claudia Soria, Maite Melero, John P. McCrae, Constantine Lignos, Chao-Hong Liu, German Rigau Claramunt, Georg Rehm

Publisher

European Language Resources Association (ELRA)

ISSN

N/A

ISBN

N/A

Workshop

Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"

Location

Palma, Mallorca, Spain

Date

11 - 16 May 2026

Authors

Samikshya Dhamala
Rishav Beejukchhen
Subresh Thakulla
Bikash Kadayat
Supriya Khadka