
The Speech-LLM Takes It All: A Truly Fully End-to-End Spoken Dialog State Tracking Approach

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/5dwmfoqycu6w

Abstract

This paper presents a comparative study of context management strategies for end-to-end Spoken Dialog State Tracking using Speech-LLMs. We systematically evaluate traditional multimodal context (combining text history and spoken current turn), full spoken history, and compressed spoken history approaches. Our experiments on the SpokenWOZ corpus demonstrate that providing the full spoken conversation as input yields the highest performance among models of similar size, significantly surpassing prior methods. Furthermore, we show that attention-pooling-based compression of the spoken history offers a strong trade-off, maintaining competitive accuracy with reduced context size. Detailed analysis confirms that improvements stem from more effective context utilization.
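As a rough illustration of what attention-pooling-based compression of a spoken history can look like, the sketch below collapses each fixed-size window of frame vectors into a single vector using softmax attention weights. It is a generic sketch under stated assumptions, not the authors' implementation: the function name `attention_pool`, the `query` vector (which would normally be learned), and the `window` size are all hypothetical and do not come from the paper.

```python
import math


def attention_pool(frames, query, window):
    """Compress a list of frame vectors by attention-pooling each window.

    frames: list of equal-length float lists (e.g. speech encoder outputs).
    query:  hypothetical scoring vector; in a real model it would be learned.
    window: number of consecutive frames merged into one pooled vector.
    """
    pooled = []
    for start in range(0, len(frames), window):
        chunk = frames[start:start + window]
        # Score each frame in the window by its dot product with the query.
        scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in chunk]
        # Numerically stable softmax over the window's scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # The attention-weighted sum of the frames is the pooled vector.
        dim = len(chunk[0])
        pooled.append(
            [sum(w * f[d] for w, f in zip(weights, chunk)) for d in range(dim)]
        )
    return pooled


# Toy input: five 2-dimensional frames, pooled in windows of two.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, 2.0], [3.0, 3.0]]
pooled = attention_pool(frames, query=[0.0, 0.0], window=2)
```

With a window of two, five input frames become three pooled vectors, so the context passed on is shorter by roughly the window factor. The zero query used here makes every weight in a window equal, so the pooling reduces to a plain average; a learned query would instead emphasize the frames it scores highest.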

Details

Paper ID
lrec2026-main-206
Pages
pp. 2629-2637
BibKey
ghazal-etal-2026-speech
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Nizar El Ghazal

  • Antoine Caubrière

  • Valentin Vielzeuf

Links