Benchmarking LLMs for ARR Area Assignment: Evidence and Implications for Assignment Strategies

Proceedings of Natural Scientific Language Processing (NSLP) @ LREC 2026

DOI:10.63317/4jbm8o3us4c8

Abstract

We study how well large language models (LLMs) assign ACL Rolling Review (ARR) areas from paper titles and abstracts. Using 558 papers (ACL/EACL/NAACL, 2020 to 2025), we compare multiple LLMs and prompting schemes (zero-shot and few-shot; with and without ARR keywords; each-category variants) and analyze per-area scores, error overlap, and confusion matrices. One-shot prompting (with OpenAI-gpt-oss-20b) tends to perform best, while injecting ARR keywords often lowers accuracy. Task-bounded areas (e.g., MT, IE, QA, Summarization) are predicted more reliably, whereas broad, cross-cutting labels (e.g., Resources and Evaluation, NLP Applications) are frequently conflated, which points to taxonomy ambiguity rather than solely model limitations. We recommend hierarchical or primary-plus-secondary labels to reduce ambiguity and improve reviewer matching. Our dataset, methods, and findings offer a reproducible baseline for area selection support in ACL workflows.

Details

Paper ID: lrec2026-ws-nslp-02
Pages: pp. 13-24
BibKey: bingert-etal-2026-benchmarking
Editors: Georg Rehm, Stefan Dietze, Danilo Dessi, Diana Maynard, Sonja Schimmler
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of Natural Scientific Language Processing (NSLP) @ LREC 2026
Location: Palma, Mallorca, Spain
Date: 11-16 May 2026

Authors

  • Eileen Bingert
  • Diego Alves
  • Stefania Degaetano-Ortlieb

Links