Experiences from the Spoken Dutch Corpus Project
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)
Abstract
This paper provides an overview of the ongoing development of a large corpus of spoken Dutch in Flanders and the Netherlands. We outline the design of this corpus and the various layers of annotation with which the speech signal is enriched. Special attention is paid to the problems we have encountered, and to the tools and protocols developed for obtaining consistent and reliable annotations. We also discuss the outcome of a recent external evaluation of our project by an international committee of experts.