Title	Experiences from the Spoken Dutch Corpus Project
Authors	Nelleke Oostdijk (Dept. of Language and Speech, University of Nijmegen, P.O. Box 9103, 6500 HD Nijmegen, The Netherlands); Wim Goedertier (Electronics and Information Systems (ELIS), University of Ghent, Sint-Pietersnieuwstraat 41, 9000 Belgium); Frank van Eynde (Center for Computational Linguistics, University of Leuven, Maria-Theresiastraat 21, 3000 Leuven, Belgium); Lous Boves (Dept. of Language and Speech, University of Nijmegen, P.O. Box 9103, 6500 HD Nijmegen, The Netherlands); Jean-Pierre Martens (Electronics and Information Systems (ELIS), University of Ghent, Sint-Pietersnieuwstraat 41, 9000 Belgium); Michael Moortgat (University of Utrecht, OTS, Trans 10, 3512 JK Utrecht, The Netherlands); Harald Baayen (Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 XD Nijmegen, The Netherlands)
Session	SP1: Speech Resources
Abstract	This paper provides an overview of the ongoing development of a large corpus of spoken Dutch in Flanders and the Netherlands. We outline the design of this corpus and the various layers of annotation with which the speech signal is enriched. Special attention is paid to the problems we have encountered, and to the tools and protocols developed for obtaining consistent and reliable annotations. We also discuss the outcome of a recent external evaluation of our project by an international committee of experts.
Keywords	Corpus, Dutch corpus
Full Paper	98.pdf