LREC 2026 main

Vrittanta-AS: Dataset Development and Benchmarking for Event Trigger Detection and Classification in Assamese

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/5ieyiqjkvgxt

Abstract

Event trigger detection and classification aim to identify and categorize events within unstructured text. While prior research has primarily focused on news or biomedical corpora, the literary domain, especially short stories, remains largely underexplored. This gap is particularly pronounced for low-resource languages such as Assamese, where limited annotated data and complex narrative structures hinder progress. To address this challenge, we introduce Vrittanta-AS, a manually curated Assamese event trigger detection and classification dataset comprising 13,171 annotated events extracted from short stories. The dataset is designed to advance research in information extraction and narrative understanding for low-resource Indian languages. We conduct a comprehensive evaluation using classical machine learning methods, neural sequential architectures, pre-trained transformer models, and large language models (LLMs) on the proposed dataset. Experimental results demonstrate that IndicBERT v2 achieves the highest performance for both event trigger detection (85.86% micro-F1) and classification (65.21% macro-F1). Vrittanta-AS serves as an important step toward developing benchmark resources for event trigger detection and classification in Assamese literary text.

Details

Paper ID
lrec2026-main-601
Pages
pp. 7581-7591
BibKey
kirti-etal-2026-vrittanta
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Chaitanya Kirti
  • Dhrubajyoti Pathak
  • Ashish Anand
  • Prithwijit Guha

Links