LREC 2026 Workshop

POS Tagging in Low-Resource Maithili Language: Specific Challenges and Nuances

Proceedings of the 8th Workshop on Indian Language Data: Resources and Evaluation

DOI:10.63317/27ugx7nj4vvs

Abstract

Part-of-Speech (POS) tagging is a key step in Natural Language Processing (NLP), laying the groundwork for more advanced syntactic and semantic tasks. Although Maithili is an Indo-Aryan language with a rich literary tradition and official recognition in India, computational resources for it remain very limited. This paper describes the creation of a corpus of 25,000 sentences drawn from the health, tourism, and administration domains, annotated with the hierarchical tagset currently used for Maithili. The paper also indicates that standard tagsets, typically adapted from English or Hindi, fail to capture the linguistic nuances of Maithili. This underscores the need for a dedicated tagging framework that accounts for features such as vocative particles, verbal nuances, and honorific complexities.

Keywords: Parts of Speech, Natural Language Processing, Maithili, annotation

Details

Paper ID
lrec2026-ws-wildre-09
Pages
pp. 67-74
BibKey
priya-etal-2026-pos
Editors
Girish Nath Jha, Kalika Bali, Sobha L, Devendr Kumar
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 8th Workshop on Indian Language Data: Resources and Evaluation
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Shivani Priya
  • Shruti Jha
  • Urmila Jha
  • Girish Nath Jha
  • Deepali Tiwari
  • Jyoti Raj
