
Evaluating Large Language Model-based Natural Language Generation for Modular Dialog systems

The Fourth Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL 2026)

DOI:10.63317/54wtdkxi95oe

Abstract

While many dialogue systems currently use end-to-end solutions, modular systems offer greater control, sustainability, and more human-like dialogue. This makes them especially relevant when studying human behavior patterns in interaction or when applying them to sensitive domains. In this paper, we develop an automated metric that measures the quality of an LLM-based NLG component in a modular system based on its hallucination tendency and linguistic quality. We apply the metric to various language models and usage techniques and, based on the results, discuss the conditions a model must meet to be a good candidate for an NLG component in a real-time-capable dialogue system. Although such automated metrics cannot replace a real interaction study, they help to compare candidate approaches for individual modules and are therefore indispensable when developing and testing modules in isolation. One contribution of the introduced metric is that it was developed and tested on a German dataset, which reveals challenges that arise when working with languages other than English as well as discrepancies with the abilities of generative AI assumed in the current state-of-the-art literature.

Details

Paper ID
lrec2026-ws-resourceful-14
Pages
pp. 142-160
BibKey
emmerling-etal-2026-evaluating
Editors
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
The Fourth Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL 2026)
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Vincent Emmerling
  • Christoph Kowalski
  • Amelie Sophie Robrecht-Hilbig
  • Stefan Kopp
