Reducing Segmental Duration Variation by Local Speech Rate Normalization of Large Spoken Language Resources

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

DOI:10.63317/34gdttfc2zru

Abstract

We developed a time-domain normalization procedure that takes a speech signal and its corresponding speech rate contour as input and produces the normalized speech signal. We then normalized the speech rate of a large spoken language resource of German read speech. We compared the resulting segment durations with the original durations using several three-way ANOVAs with phone type and speaker as independent variables, since we assume that segment duration variation is determined by segment type (intrinsic duration), by the speaker (speech rate, sociolect, idiolect, dialect, speech production variation), and by linguistic effects (context, syllable structure, accent, and stress). One important result of the statistical analysis was that the influence of the speaker on segment duration variation decreased dramatically (factor 0.54 for vowels, factor 0.29 for consonants) when speech rate was normalized, even though sociolect, idiolect, and dialect remained almost unchanged. Since the interaction between the independent variables speaker and phone type remained constant, the hypothesis arises that this interaction contains most of the speaker-specific information.
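The effect of local speech rate normalization on segment durations can be illustrated with a minimal sketch. It is not the procedure from the paper, which operates on the speech signal itself; it only warps segment boundary times. It assumes that the speech rate contour is given as one positive value per fixed-length frame and that the normalization target is the mean rate of the utterance. The function names `warp_function`, `warp_time`, and `normalize_durations` are hypothetical.

```python
def warp_function(rate, frame_s, target_rate=None):
    """Return normalized time at each frame boundary (len(rate) + 1 values).

    rate: local speech rate per frame (assumed positive, e.g. phones/s).
    frame_s: frame length in seconds.
    target_rate: rate to normalize to; assumed here to be the mean of `rate`.
    """
    if target_rate is None:
        target_rate = sum(rate) / len(rate)
    warped = [0.0]
    for r in rate:
        # A frame spoken faster than the target is stretched,
        # a frame spoken slower than the target is compressed.
        warped.append(warped[-1] + frame_s * r / target_rate)
    return warped


def warp_time(t, warped, frame_s):
    """Map an original time t (seconds) to normalized time.

    Uses linear interpolation between the warped frame boundaries.
    """
    pos = t / frame_s
    i = min(int(pos), len(warped) - 2)  # clamp so that i + 1 is a valid index
    frac = pos - i
    return warped[i] + frac * (warped[i + 1] - warped[i])


def normalize_durations(boundaries, rate, frame_s):
    """Return segment durations after warping the segment boundaries."""
    warped = warp_function(rate, frame_s)
    new_b = [warp_time(b, warped, frame_s) for b in boundaries]
    return [end - start for start, end in zip(new_b, new_b[1:])]


if __name__ == "__main__":
    # Two segments of 0.2 s each: the first spoken slowly, the second fast.
    rate = [10.0, 10.0, 20.0, 20.0]  # hypothetical contour, 0.1 s frames
    print(normalize_durations([0.0, 0.2, 0.4], rate, frame_s=0.1))
```

In this toy example the slowly spoken segment becomes shorter and the fast one longer, while the total duration is preserved because the target is the mean rate. Under these assumptions, normalization removes the part of the duration variation that is explained by local speech rate.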

Details

Paper ID
lrec2002-main-249
Pages
N/A
BibKey
pfitzinger-2002-reducing
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Third International Conference on Language Resources and Evaluation
Location
Las Palmas, Spain
Date
29 May 2002 to 31 May 2002

Authors

  • Hartmut R. Pfitzinger
