LREC-COLING 2024, main conference

Corpus Creation and Automatic Alignment of Historical Dutch Dialect Speech

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/2cxbaetszkze

Abstract

The Dutch Dialect Database (also known as the ‘Nederlandse Dialectenbank’) contains dialectal variations of Dutch that were recorded all over the Netherlands in the second half of the twentieth century. A subset of about 300 hours of these recordings was enriched with manual orthographic transcriptions that use non-standard approximations of dialectal speech. In this paper, we describe the creation of a corpus containing both the audio recordings and their corresponding transcriptions, focusing on our method for aligning the recordings with the transcriptions and the metadata.

Details

Paper ID: lrec2024-main-0357
Pages: pp. 4021-4029
BibKey: bentum-etal-2024-corpus
Editor: N/A
Publisher: European Language Resources Association (ELRA) and ICCL
ISSN: 2522-2686
ISBN: 979-10-95546-34-4
Conference: Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location: Turin, Italy
Date: 20 May 2024 to 25 May 2024

Authors

  • Martijn Bentum
  • Eric Sanders
  • Antal P.J. van den Bosch
  • Douwe Zeldenrust
  • Henk van den Heuvel
