Back to Main Conference 2016
LREC 2016main

Combining Manual and Automatic Prosodic Annotation for Expressive Speech Synthesis

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/28gsnwf4hy43

Abstract

Text-to-speech has long been centered on the production of an intelligible message of good quality. More recently, interest has shifted to the generation of more natural and expressive speech. A major issue of existing approaches is that they usually rely on a manual annotation in expressive styles, which tends to be rather subjective. A typical related issue is that the annotation is strongly influenced ― and possibly biased ― by the semantic content of the text (e.g. a shot or a fault may incite the annotator to tag that sequence as expressing a high degree of excitation, independently of its acoustic realization). This paper investigates the assumption that human annotation of basketball commentaries in excitation levels can be automatically improved on the basis of acoustic features. It presents two techniques for label correction exploiting a Gaussian mixture and a proportional-odds logistic regression. The automatically re-annotated corpus is then used to train HMM-based expressive speech synthesizers, the performance of which is assessed through subjective evaluations. The results indicate that the automatic correction of the annotation with Gaussian mixture helps to synthesize more contrasted excitation levels, while preserving naturalness.

Details

Paper ID
lrec2016-main-613
Pages
pp. 3872-3879
BibKey
brognaux-etal-2016-combining
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 May 2016 28 May 2016

Authors

  • SB

    Sandrine Brognaux

  • TF

    Thomas François

  • MS

    Marco Saerens

Links