Back to Main Conference 2004
LREC 2004main

Evaluation of Consensus on the Annotation of Prosodic Breaks in the Romance Corpus of Spontaneous Speech “C-ORAL-ROM”

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/3gczboyr6q4x

Abstract

C-ORAL-ROM, Integrated Reference Corpora For Spoken Romance Languages, is a multilingual corpus of spontaneous speech delivered within the IST Program. Corpora are tagged with respect to terminal and non terminal prosodic breaks. Terminal breaks are considered the most perceptively relevant cues to determine the utterance boundaries in spontaneous speech resources. The paper presents the evaluation of the inter-annotator agreement accomplished by an institution external to the consortium and shows the level of reliability of the tagging delivered and the annotation scheme adopted. The data show, at cross-linguistic level, a very high K coefficient (between 7.7 and 9.2, according to the language resource). A strong level of agreement specifically for terminal breaks has also been recorded. The data thus show that the annotation of the utterances identified in terms of their prosodic breaks is able to capture relevant perceptual facts, and it appears that the proposed coding scheme can be applied in a highly replicable way.

Details

Paper ID
lrec2004-main-210
Pages
N/A
BibKey
danieli-etal-2004-evaluation
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-1-6
Conference
Fourth International Conference on Language Resources and Evaluation
Location
Lisbon, Portugal
Date
26 May 2004 28 May 2004

Authors

  • MD

    Morena Danieli

  • JG

    Juan María Garrido

  • MM

    Massimo Moneglia

  • AP

    Andrea Panizza

  • SQ

    Silvia Quazza

  • MS

    Marc Swerts

Links