LREC 2026 workshop

NE-LID: A Fast and Accurate Language Identification System for Northeast Indian Languages

Proceedings of the 8th Workshop on Indian Language Data: Resources and Evaluation

DOI:10.63317/2q69fiyg73p6

Abstract

Language identification (LID) is crucial for natural language processing systems, yet Northeast Indian languages remain severely underserved by existing multilingual LID models. We present NE-LID, a fast and accurate language identification system designed specifically for eleven languages of Northeast India. Built on character n-gram features with fastText, NE-LID achieves 99.09% accuracy on a balanced test set, significantly outperforming existing multilingual systems including GlotLID (73.12%), OpenLID (42.03%), IndicLID (39.30%), and LangDetect (24.33%). The model takes 0.084 milliseconds per prediction on average, fast enough for real-time applications. We demonstrate that character-level modeling outperforms transformer-based approaches for script-diverse, low-resource languages.
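To make the phrase "character n-gram features" concrete, the following is a minimal sketch of how fastText-style character n-grams can be extracted from a token. It is an illustration only, not the NE-LID implementation: the function name `char_ngrams` is hypothetical, and the n-gram range used here is an assumption, since the abstract does not state the range the authors chose. The `<` and `>` boundary markers follow the convention fastText uses for subword units.

```python
def char_ngrams(token, minn=2, maxn=4):
    """Return the character n-grams of `token`, fastText style.

    The token is wrapped in '<' and '>' so that prefixes and suffixes
    produce n-grams distinct from word-internal ones. The defaults for
    `minn` and `maxn` are illustrative assumptions, not values from the paper.
    """
    wrapped = f"<{token}>"
    grams = []
    # Python strings are sequences of code points, so this also works
    # for non-Latin scripts without any byte-level handling.
    for n in range(minn, maxn + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


print(char_ngrams("where", 3, 3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

Because these features depend only on short character sequences, they capture script and orthographic cues directly, which is plausibly why such a model suits script-diverse languages with little training data.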

Details

Paper ID
lrec2026-ws-wildre-14
Pages
pp. 104-108
BibKey
nyalang-2026-ne
Editors
Girish Nath Jha, Kalika Bali, Sobha L, Devendr Kumar
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 8th Workshop on Indian Language Data: Resources and Evaluation
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Badal Nyalang
