Back to Main Conference 2000
LREC 2000main

The Spoken Dutch Corpus. Overview and First Evaluation

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/4qmwewfsnboc

Abstract

In this paper the Spoken Dutch Corpus project is presented, a joint Flemish-Dutch undertaking aimed at the compilation and annotation of a 10-million-word corpus of spoken Dutch. Upon completion, the corpus will constitute a valuable resource for research in the fields of computational linguistics and language and speech technology. The paper first gives an overall description of the project, its aims, structure and organization. It then goes on to discuss the considerations - both methodological and practical - that have played a role in the design of the corpus as well as in its compilation and annotation. The paper concludes with an account of the data that are available in the first release of the first part of the corpus that came out on March 1st, 2000.

Details

Paper ID
lrec2000-main-083
Pages
N/A
BibKey
oostdijk-2000-spoken
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 2 June 2000

Authors

  • NO

    Nelleke Oostdijk

Links