LREC 2012 main

A Parameterized and Annotated Spoken Dialog Corpus of the CMU Let’s Go Bus Information System

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/396obpjvu77x

Abstract

Standardized corpora are the foundation for spoken language research. In this work, we introduce an annotated and standardized corpus in the Spoken Dialog Systems (SDS) domain. Data from the Let's Go Bus Information System of Carnegie Mellon University in Pittsburgh has been formatted, parameterized, and annotated with quality, emotion, and task success labels; the corpus contains 347 dialogs with 9,083 system-user exchanges. A total of 46 parameters have been derived automatically and semi-automatically from Automatic Speech Recognition (ASR), Spoken Language Understanding (SLU), and Dialog Manager (DM) properties. Each spoken user utterance has been assigned an emotion label from the set {garbage, non-angry, slightly angry, very angry}. In addition, Interaction Quality (IQ) has been annotated manually on the exchange level by three raters, achieving a Kappa value of 0.54. The IQ score expresses the quality of the interaction up to each system-user exchange on a scale from 1 to 5. The presented corpus is intended as a standardized basis for classification and evaluation tasks regarding task success prediction, dialog quality estimation, or emotion recognition, to foster comparability between different approaches in these fields.
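
As an illustration of the inter-rater agreement figure mentioned in the abstract, the following minimal sketch shows one common way to obtain a single Kappa value from three raters: unweighted Cohen's kappa averaged over all rater pairs. The abstract does not state which Kappa variant was used, so this choice is an assumption, and the function names and the example ratings are hypothetical and not taken from the corpus.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Fraction of items on which both raters chose the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over all pairs of raters (assumed aggregation)."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical IQ scores (1 to 5) from three raters for six exchanges.
rater_1 = [5, 5, 4, 3, 2, 1]
rater_2 = [5, 4, 4, 3, 2, 2]
rater_3 = [5, 5, 4, 2, 2, 1]
print(round(mean_pairwise_kappa([rater_1, rater_2, rater_3]), 2))
```

Because IQ scores are ordinal, a weighted kappa that penalizes large disagreements more than small ones may be a better fit in practice; the unweighted form is used here only to keep the sketch short.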

Details

Paper ID
lrec2012-main-157
Pages
pp. 3369-3373
BibKey
schmitt-etal-2012-parameterized
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21 May 2012 to 27 May 2012

Authors

  • Alexander Schmitt

  • Stefan Ultes

  • Wolfgang Minker

Links