LREC 2006 main

SmartWeb UMTS Speech Data Collection: The SmartWeb Handheld Corpus

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

DOI:10.63317/32p96bngdta9

Abstract

In this paper we outline the German speech data collection for the SmartWeb project, which is funded by the German Ministry of Science and Education. We focus on the SmartWeb Handheld Corpus (SHC), which has been collected by the Bavarian Archive for Speech Signals (BAS) at the Phonetic Institute (IPSK) of Munich University. SHC signals are being recorded in real-life environments (indoor and outdoor) with real background noise as well as real transmission line errors. We developed a new elicitation method and recording technique, called situational prompting, which facilitates collecting realistic dialogue speech data in a cost-efficient way. We show that nearly realistic speech queries to a dialogue system, issued over a mobile PDA or smartphone, can be collected very efficiently using an automatic speech server. We describe the technical and linguistic features of the resulting speech corpus, which will be publicly available at BAS or ELDA.

Details

Paper ID
lrec2006-main-151
Pages
N/A
BibKey
mogele-etal-2006-smartweb
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-2-4
Conference
Fifth International Conference on Language Resources and Evaluation
Location
Genoa, Italy
Date
24-26 May 2006

Authors

  • Hannes Mögele
  • Moritz Kaiser
  • Florian Schiel
