LREC 2000 main

Orthographic Transcription of the Spoken Dutch Corpus

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/3csmffzxyidy

Abstract

This paper focuses on the specification of the orthographic transcription task in the Spoken Dutch Corpus, the problems encountered in drawing up that specification, and the evaluation experiments carried out to assess transcription efficiency and inter-transcriber consistency. The orthographic transcriptions play a twofold role in the Spoken Dutch Corpus: on the one hand, they are important for future database users; on the other hand, they are indispensable to the development of the corpus itself. The main objectives of the transcription task are: (1) to obtain a verbatim transcription that can be made with a minimum level of interpretation of the utterances; (2) to obtain an alignment of the transcription to the speech signal at the level of relatively short chunks; (3) to obtain a transcription that is useful to researchers working in several research areas; and (4) to adhere to international standards for existing large speech corpora. The transcription protocol and procedure were designed to strike the best compromise between the consistency, accuracy and usability of the output and the efficiency of the transcription task. For example, the transcription procedure always consists of a first transcription cycle and a verification cycle. Some efficiency and consistency statistics, derived from pilot experiments in which several students transcribed the same material, are presented at the end of the paper. In these experiments the transcribers were also asked to record the amount of time they spent on the different audio files and to report any difficulties they encountered in performing their task.

Details

Paper ID
lrec2000-main-066
Pages
N/A
BibKey
goedertier-etal-2000-orthographic
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 – 2 June 2000

Authors

  • Wim Goedertier
  • Simo Goddijn
  • Jean-Pierre Martens
