Back to Main Conference 2010
LREC 2010main

The German Reference Corpus DeReKo: A Primordial Sample for Linguistic Research

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/4zmnqqc4vrsk

Abstract

This paper describes DeReKo (Deutsches Referenzkorpus), the Archive of General Reference Corpora of Contemporary Written German at the Institut für Deutsche Sprache (IDS) in Mannheim, and the rationale behind its development. We discuss its design, its legal background, how to access it, available metadata, linguistic annotation layers, underlying standards, ongoing developments, and aspects of using the archive for empirical linguistic research. The focus of the paper is on the advantages of DeReKo's design as a primordial sample from which virtual corpora can be drawn for the specific purposes of individual studies. Both concepts, primordial sample and virtual corpus are explained and illustrated in detail. Furthermore, we describe in more detail how DeReKo deals with the fact that all its texts are subject to third parties' intellectual property rights, and how it deals with the issue of replicability, which is particularly challenging given DeReKo's dynamic growth and the possibility to construct from it an open number of virtual corpora.

Details

Paper ID
lrec2010-main-285
Pages
N/A
BibKey
kupietz-etal-2010-german
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17 May 2010 23 May 2010

Authors

  • MK

    Marc Kupietz

  • CB

    Cyril Belica

  • HK

    Holger Keibel

  • AW

    Andreas Witt

Links