LREC 2026 Workshop

BNLI: A Linguistically-Refined Bengali Dataset for Natural Language Inference

Proceedings of the Second workshop on Challenges in Processing South Asian Languages (CHiPSAL2026)

DOI: 10.63317/2cwaxs8a7csg

Abstract

Despite growing progress in Natural Language Inference (NLI) research, resources for Bengali remain extremely limited. Existing Bengali NLI datasets suffer from several inconsistencies, including annotation errors, ambiguous sentence pairs, and limited linguistic diversity, which hinder effective model training and evaluation. To address these limitations, we introduce BNLI, a refined and linguistically curated Bengali NLI dataset designed to support robust language understanding and inference modeling. The dataset was constructed through a rigorous annotation pipeline that emphasizes semantic clarity and a balanced distribution across the entailment, contradiction, and neutral classes. We benchmarked BNLI with a suite of state-of-the-art transformer-based architectures, including multilingual and Bengali-specific models, to assess how well they capture complex semantic relations in Bengali text. The experimental results highlight the improved reliability and interpretability achieved with BNLI, establishing it as a strong foundation for research on inference tasks in Bengali and other low-resource languages. The BNLI dataset is available at https://github.com/FarahHaque/BNLI-Dataset.git

Details

Paper ID
lrec2026-ws-chipsal-09
Pages
pp. 85-91
BibKey
haque-etal-2026-bnli
Editors
Kengatharaiyer Sarveswaran, Ashwini Vaidya
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Second workshop on Challenges in Processing South Asian Languages (CHiPSAL2026)
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Farah Binta Haque
  • Md Yasin
  • Shishir Saha
  • Md Shoaib Akhter Rafi
  • Farig Sadeque
