Title

Experiences from the Spoken Dutch Corpus Project

Authors

Nelleke Oostdijk (Dept. of Language and Speech, University of Nijmegen, P.O. Box 9103, 6500 HD Nijmegen, The Netherlands)

Wim Goedertier (Electronics and Information Systems (ELIS), University of Ghent, Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium)

Frank van Eynde (Center for Computational Linguistics, University of Leuven, Maria-Theresiastraat 21, 3000 Leuven, Belgium)

Lou Boves (Dept. of Language and Speech, University of Nijmegen, P.O. Box 9103, 6500 HD Nijmegen, The Netherlands)

Jean-Pierre Martens (Electronics and Information Systems (ELIS), University of Ghent, Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium)

Michael Moortgat (University of Utrecht, OTS, Trans 10, 3512 JK Utrecht, The Netherlands)

Harald Baayen (Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 XD Nijmegen, The Netherlands)

Session

SP1: Speech Resources

Abstract

This paper provides an overview of the ongoing development of a large corpus of Dutch as spoken in Flanders and the Netherlands. We outline the design of this corpus and the various layers of annotation with which the speech signal is enriched. Special attention is paid to the problems we have encountered and to the tools and protocols developed for obtaining consistent and reliable annotations. We also discuss the outcome of a recent external evaluation of the project by an international committee of experts.

Keywords

Corpus, Dutch corpus

Full Paper

98.pdf