LREC 2026 main

Bangla Key2Text: Text Generation from Keywords for a Low Resource Language

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4wkpwaxktwfn

Abstract

This paper introduces Bangla Key2Text, a large-scale dataset of 2.6 million Bangla keyword-text pairs designed for keyword-driven text generation in a low-resource language. The dataset is constructed using a BERT-based keyword extraction pipeline applied to millions of Bangla news texts, transforming raw articles into structured keyword-text pairs suitable for supervised learning. To establish baseline performance on this new benchmark, we fine-tune two sequence-to-sequence models, mT5 and BanglaT5, and evaluate them using multiple automatic metrics and human judgments. Experimental results show that task-specific fine-tuning substantially improves keyword-conditioned text generation in Bangla compared to zero-shot large language models. The dataset, trained models, and code are publicly released to support future research in Bangla natural language generation and keyword-to-text generation tasks.
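The abstract describes the extraction step only as "BERT-based". The sketch below shows how one article could become a keyword-text pair. It assumes a KeyBERT-style approach: embed the article and each candidate n-gram, then keep the candidates whose embeddings are most cosine-similar to the article's embedding. The `embed` function is a hypothetical stand-in that counts character trigrams, so the example runs without a model; a real pipeline would call a BERT encoder instead. The names `extract_keywords` and `make_pair`, and the comma-joined source format, are illustrative assumptions. They are not the authors' released code.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """HYPOTHETICAL stand-in for a BERT embedding: sparse character-trigram counts."""
    padded = f"  {text}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def candidate_phrases(text: str, max_ngram: int = 2) -> list[str]:
    """Unique 1..max_ngram word n-grams, never crossing punctuation (incl. the Bangla danda)."""
    phrases: list[str] = []
    seen: set[str] = set()
    for clause in re.split(r"[।,.!?;:]+", text):
        tokens = clause.split()  # whitespace split keeps Bangla vowel signs attached
        for n in range(1, max_ngram + 1):
            for i in range(len(tokens) - n + 1):
                phrase = " ".join(tokens[i:i + n])
                if phrase not in seen:
                    seen.add(phrase)
                    phrases.append(phrase)
    return phrases


def extract_keywords(text: str, top_k: int = 5, max_ngram: int = 2) -> list[str]:
    """Rank candidates by similarity to the whole document and keep the top_k."""
    doc_vec = embed(text)
    cands = candidate_phrases(text, max_ngram)
    ranked = sorted(cands, key=lambda c: cosine(embed(c), doc_vec), reverse=True)
    return ranked[:top_k]


def make_pair(text: str, top_k: int = 5) -> dict[str, str]:
    """HYPOTHETICAL training-pair format: comma-joined keywords -> original text."""
    keywords = extract_keywords(text, top_k=top_k)
    return {"source": ", ".join(keywords), "target": text}


if __name__ == "__main__":
    article = "ঢাকায় ভারী বৃষ্টিতে জলাবদ্ধতা দেখা দিয়েছে। নগরবাসী দুর্ভোগে পড়েছেন।"
    print(make_pair(article, top_k=3))
```

The text is split on the danda (।) and other punctuation before n-grams are formed, so every candidate phrase stays inside a single clause. Tokens come from splitting on whitespace rather than from a `\w` regex, because Python's `\w` does not match Bangla dependent vowel signs, which are combining marks. A production pipeline would likely also drop stopwords and diversify the selected keywords. This sketch leaves both out.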

Details

Paper ID
lrec2026-main-303
Pages
pp. 3805-3822
BibKey
talukder-etal-2026-bangla
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Tonmoy Talukder

  • G M Shahariar