Automatic Detection of Acoustic Centres of Reliability for Tagging Paralinguistic Information in Expressive Speech

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

Abstract

Preparing a unit database for concatenative speech synthesis requires robust, unsupervised algorithms that can process the typically huge corpora involved. The demands are even more stringent for a corpus large enough to capture a wide variety of speaking styles and emotions, even from a single speaker. This paper describes a method that combines robust acoustic-prosodic and cepstral analyses to locate centres of acoustic-phonetic reliability in the speech stream, where physiologically meaningful parameters related to voice quality can be estimated more reliably. These parameters, which describe the state of glottal phonation and of supralaryngeal articulation, can then provide a paralinguistic annotation of the unit database, thereby enabling speech synthesis with a greater variety of expressions and speaking styles.