Back to Main Conference 2006
LREC 2006main

A joint prosody evaluation of French text-to-speech synthesis systems

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

DOI:10.63317/54gksgjyssos

Abstract

This paper reports on prosodic evaluation in the framework of the EVALDA/EvaSy project for text-to-speech (TTS) evaluation for the French language. Prosody is evaluated using a prosodic transplantation paradigm. Intonation contours generated by the synthesis systems are transplanted on a common segmental content. Both diphone based synthesis and natural speech are used. Five TTS systems are tested along with natural voice. The test is a paired preference test (with 19 subjects), using 7 sentences. The results indicate that natural speech obtains consistently the first rank (with an average preference rate of 80%), followed by a selection based system (72%) and a diphone based system (58%). However, rather large variations in judgements are observed among subjects and sentences, and in some cases synthetic speech is preferred to natural speech. These results show the remarkable improvement achieved by the best selection based synthesis systems in terms of prosody. In this way; a new paradigm for evaluation of the prosodic component of TTS systems has been successfully demonstrated.

Details

Paper ID
lrec2006-main-513
Pages
N/A
BibKey
garcia-etal-2006-joint
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-2-4
Conference
Fifth International Conference on Language Resources and Evaluation
Location
Genoa, Italy
Date
24 May 2006 26 May 2006

Authors

  • MG

    Marie-Neige Garcia

  • Cd

    Christophe d’Alessandro

  • GB

    Gérard Bailly

  • PB

    Philippe Boula de Mareüil

  • MM

    Michel Morel

Links