LREC 2016 main

FOLK-Gold ― A Gold Standard for Part-of-Speech-Tagging of Spoken German

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/5dnbpsjz4jz3

Abstract

In this paper, we present a GOLD standard of part-of-speech tagged transcripts of spoken German. The GOLD standard data consists of four annotation layers ― transcription (modified orthography), normalization (standard orthography), lemmatization and POS tags ― all of which have undergone careful manual quality control. It comes with guidelines for the manual POS annotation of transcripts of German spoken data and an extended version of the STTS (Stuttgart Tübingen Tagset) which accounts for phenomena typically found in spontaneous spoken German. The GOLD standard was developed on the basis of the Research and Teaching Corpus of Spoken German, FOLK, and is, to our knowledge, the first such dataset based on a wide variety of spontaneous and authentic interaction types. It can be used as a basis for further development of language technology and corpus linguistic applications for German spoken language.

Details

Paper ID
lrec2016-main-237
Pages
pp. 1493-1499
BibKey
westpfahl-schmidt-2016-folk
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23-28 May 2016

Authors

  • Swantje Westpfahl

  • Thomas Schmidt

Links