Evaluating Nepali NER and POS Tagging Models on the Achhami Dialect
Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"
Abstract
Nepali Natural Language Processing (NLP) models are typically trained and evaluated on Standard Nepali, which can introduce bias against regional dialects. This study investigates the performance of Named Entity Recognition (NER) and Part-of-Speech (POS) tagging models on Achhami, a dialect of Nepali spoken in Far-Western Nepal. We created a parallel corpus of 300 Standard Nepali and Achhami sentence pairs covering news, cultural topics, and everyday conversations. The Achhami translations were produced by native speakers to preserve linguistic authenticity. The evaluation compared fine-tuned Transformer models with large language models prompted zero-shot. Across both tasks, all models performed consistently worse on Achhami than on Standard Nepali. For NER, F1 scores decreased by 2.12 to 3.97 percent. Claude 3.5 Haiku achieved the best NER performance, while the monolingual NepBertA model unexpectedly outperformed multilingual alternatives, challenging the assumed advantage of multilingual models. POS tagging results showed a similar pattern, with accuracy dropping notably on Achhami data. Large language models showed comparable weaknesses, with accuracy reductions ranging from 2.9 to 7.0 percent. These findings quantify Kathmandu-centered bias in Nepali NLP and highlight the importance of dialectally diverse training data for building more inclusive and equitable language technologies.
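To make the reported degradation figures concrete, the following is a minimal sketch of how a drop in entity-level F1 between the two sides of a parallel corpus could be computed. It is an illustration under stated assumptions, not the evaluation code used in this study: the BIO tag scheme, the function names `bio_spans` and `entity_f1`, and the toy tag sequences are all hypothetical, and the drop is taken as an absolute difference in percentage points, which the abstract does not specify.

```python
from typing import List, Set, Tuple

# (sentence index, start token, end token (exclusive), entity type)
Span = Tuple[int, int, int, str]


def bio_spans(sentences: List[List[str]]) -> Set[Span]:
    """Collect entity spans from BIO-tagged sentences (assumed tag scheme)."""
    spans: Set[Span] = set()
    for s_idx, tags in enumerate(sentences):
        start, label = None, None
        # The trailing "O" sentinel closes a span that ends the sentence.
        for i, tag in enumerate(tags + ["O"]):
            continues = tag.startswith("I-") and label == tag[2:]
            if start is not None and not continues:
                spans.add((s_idx, start, i, label))
                start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Exact-match entity-level F1 between gold and predicted tag sequences."""
    gold_spans, pred_spans = bio_spans(gold), bio_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_spans)
    recall = true_positives / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up tags for one parallel sentence pair (not data from the study).
gold_tags = [["B-PER", "I-PER", "O", "B-LOC"]]
pred_standard = [["B-PER", "I-PER", "O", "B-LOC"]]  # hypothetical output on Standard Nepali
pred_achhami = [["B-PER", "O", "O", "B-LOC"]]       # hypothetical output on Achhami

f1_standard = entity_f1(gold_tags, pred_standard)
f1_achhami = entity_f1(gold_tags, pred_achhami)

# Degradation as an absolute difference in percentage points (an assumption).
drop_points = 100 * (f1_standard - f1_achhami)
print(f"F1 standard={f1_standard:.2f}, Achhami={f1_achhami:.2f}, drop={drop_points:.1f} points")
```

Because the corpus is parallel, the same gold entity annotations can in principle be projected onto both sides of each sentence pair, so a difference in scores reflects the dialectal shift rather than a change in content. The same comparison applies to POS tagging by replacing `entity_f1` with token-level accuracy.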