LREC 2026 (main)

FiNERVINER: Fine-grained Named Entity Recognition for Vulnerable Languages of India's North Eastern Region

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/3rs5mcedzvss

Abstract

Named entity recognition (NER), particularly fine-grained NER (FgNER), extracts domain-specific entity information for Natural Language Processing (NLP) applications such as knowledge base construction and relation extraction. While manual annotation for creating relevant data is expensive, distant supervision often produces noisy data. Moreover, resources for coarse-grained and fine-grained NER in Indian languages, particularly in the vulnerable languages of India’s North Eastern Region, remain scarce. This work aims to create such a resource for three vulnerable languages: Bodo/Boro (brx), Manipuri/Meitei (mni), and Mizo/Lushai (lus), which are regarded as official languages in three Indian states and spoken by more than six million people across five countries in South and Southeast Asia. We use annotation projection from high-resource FgNER datasets using source-to-target parallel corpora and a projection tool built on a multilingual encoder. The dataset comprises over 198k sentences, 282k entities, and 2.8M tokens in each low-resource language. Our thorough analyses validate the dataset’s high quality. We further explore zero-shot and cross-lingual settings, examining the impact of script similarity and multilingualism on cross-lingual FgNER performance. The dataset, expert detector models, the agentic tool, and the interactive web application are available as open-source resources at: https://hf.co/collections/prachuryyaIITG/finerviner.
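
As a rough illustration of the annotation projection idea mentioned in the abstract, the sketch below copies BIO entity labels from a source sentence onto a target sentence through word alignments. It is not the authors' tool: the function names `extract_spans` and `project_labels`, the label strings, and the hand-written alignment are all hypothetical. In the paper the alignments come from a projection tool built on a multilingual encoder; here they are simply assumed to be given.

```python
from collections import defaultdict


def extract_spans(labels):
    """Return (start, end_exclusive, type) for each entity in a BIO sequence."""
    spans, start, etype = [], None, None
    for i, lab in enumerate(labels + ["O"]):  # sentinel closes a trailing span
        if lab.startswith("I-") and start is not None and lab[2:] == etype:
            continue  # still inside the current entity
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if lab.startswith("B-") or lab.startswith("I-"):
            start, etype = i, lab[2:]  # a stray I- tag is treated as a span start
    return spans


def project_labels(src_labels, alignment, tgt_len):
    """Project source BIO labels onto a target sentence.

    alignment is a list of (source_index, target_index) pairs (assumed given).
    Each source entity is mapped to the smallest contiguous target span that
    covers all of its aligned target tokens.
    """
    src_to_tgt = defaultdict(list)
    for s, t in alignment:
        src_to_tgt[s].append(t)

    tgt_labels = ["O"] * tgt_len
    for start, end, etype in extract_spans(src_labels):
        tgt_idx = [t for s in range(start, end) for t in src_to_tgt[s]]
        if not tgt_idx:
            continue  # no aligned target token: the entity is dropped
        lo, hi = min(tgt_idx), max(tgt_idx)
        if any(lab != "O" for lab in tgt_labels[lo:hi + 1]):
            continue  # would overlap an already projected entity: skip it
        tgt_labels[lo] = "B-" + etype
        for j in range(lo + 1, hi + 1):
            tgt_labels[j] = "I-" + etype
    return tgt_labels


# Hypothetical example: a 4-token source sentence and a 4-token target
# sentence whose last two tokens appear in swapped order.
src_labels = ["B-Person.Scientist", "I-Person.Scientist", "O", "B-Location.City"]
alignment = [(0, 0), (1, 1), (2, 3), (3, 2)]
projected = project_labels(src_labels, alignment, tgt_len=4)
print(projected)
```

Taking the smallest covering span keeps each projected entity contiguous, at the cost of occasionally absorbing an unaligned token in the middle. Dropping unaligned or overlapping entities is one simple way to limit noise; a real pipeline would likely apply further filtering.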

Details

Paper ID
lrec2026-main-607
Pages
pp. 7655-7667
BibKey
kaushik-etal-2026-finerviner
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Prachuryya Kaushik
  • Ashish Anand
