LREC 2026 Main

Probing Discrete Speech Tokens of Spoken Language Models

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4d4kttgdydde

Abstract

This paper presents a framework for systematically probing discrete speech token representations in spoken language models (SLMs). We propose three complementary components: a distributional divergence analysis that tests whether an attribute is reflected in token usage, token-based classifiers that quantify how recoverable the attribute is, and an attribute-conditioned representation analysis that reveals how phonetic attributes are realized. As a demonstration, we apply these probes to tokenizer outputs and model generations from CosyVoice2 and SparkTTS on LibriTTS-R and VCTK. We find that gender is encoded in the tokens of both systems, but in different forms: the signal is more stable across stages and datasets in CosyVoice2, whereas SparkTTS shows weaker cross-stage consistency and stronger pause- and prosody-related effects. Exploratory probes of valence, arousal, and dominance yield weaker and less consistent signals. These results show that discrete speech tokens retain speaker-related information in different ways across architectures, and that the proposed framework provides an interpretable basis for comparing token representations across spoken language modeling pipelines.
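
To make the first component concrete, a distributional divergence probe can be sketched as a comparison of token-usage frequencies between two attribute groups. The abstract does not say which divergence measure the authors use, so the Jensen-Shannon divergence below is an assumption, and the function names, the vocabulary size, and the toy token sequences are all hypothetical.

```python
import math
from collections import Counter


def token_distribution(sequences, vocab_size):
    """Relative frequency of each token ID over a group of token sequences."""
    counts = Counter(tok for seq in sequences for tok in seq)
    total = sum(counts.values())
    return [counts.get(i, 0) / total for i in range(vocab_size)]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions,
    1 for distributions with disjoint support."""
    def kl(a, b):
        # Terms with a == 0 contribute nothing; b is the mixture, so b > 0
        # wherever a > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy token sequences standing in for two attribute groups (assumed data,
# not taken from the paper).
group_a = [[0, 1, 1, 2], [1, 1, 0]]
group_b = [[2, 3, 3, 2], [3, 2, 2]]

p = token_distribution(group_a, vocab_size=4)
q = token_distribution(group_b, vocab_size=4)
divergence = js_divergence(p, q)
print(f"JS divergence between groups: {divergence:.3f} bits")
```

A divergence near zero would suggest that the attribute leaves little trace in token usage, while larger values indicate that the two groups draw on the codebook differently. In practice the observed value would need a baseline, for example a permutation of the group labels, before it could be read as evidence.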

Details

Paper ID
lrec2026-main-184
Pages
pp. 2344-2354
BibKey
naber-etal-2026-probing
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Sven Naber

  • Julia Koch

  • Pranav Singh

  • Alberto Saponaro

  • Ioanna Karagianni

  • Ngoc Thang Vu

Links