TTSVowelViz: A Tool for Visualising Text-to-Speech Model Training via Vowel Spaces

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/57peripccxqx

Abstract

In text-to-speech (TTS) model training, the saturation of the loss curve indicates how well a model has learned the characteristics of the training dataset, but it does not reveal which linguistic properties the model has learned. Existing TTS approaches miss the opportunity to incorporate linguistic insights into model training. We introduce TTSVowelViz, a novel tool that visualises static and dynamic vowel spaces during model training, bridging linguistic knowledge and TTS model development. It helps identify which vowel sounds are accurately learned and how the vowel spaces evolve during training. To assess TTSVowelViz, we fine-tuned a TTS model from General American English to New Zealand English and conducted a perception test. Our results show that the formants of specific vowels in the vowel spaces generated by TTSVowelViz align with human perception, effectively visualising the perceived accent shift. This work highlights vowel space visualisation as a valuable interpretability tool for TTS training.

Details

Paper ID
lrec2026-main-375
Pages
pp. 4778-4786
BibKey
udawatta-etal-2026-ttsvowelviz
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Pasindu Udawatta
  • Jesin James
  • Balamurali B T
  • Catherine Inez Watson
  • Ake Nicholas
  • Binu Nisal Abeysinghe

Links