LREC 2016 (Main Conference)

A Corpus of Read and Spontaneous Upper Saxon German Speech for ASR Evaluation

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/33rkiouk8a4f

Abstract

In this paper, we present SXUCorpus, a corpus of read and spontaneous speech in the Upper Saxon German dialect. The data were collected from eight archives of local television stations in the Free State of Saxony. The recordings cover broadcast topics including news, economy, weather, sport, and documentaries from the years 1992 to 1996, and have been manually transcribed and labeled. We report the methodology for collecting and processing the analog audiovisual material and for constructing the corpus, and we describe the properties of the data. In its current version, the corpus is available to the scientific community and is designed for automatic speech recognition (ASR) evaluation, with a development set and a test set. We performed ASR experiments on the dataset with the open-source framework sphinx-4, including a configuration for Standard German. Additionally, we show the influence of acoustic model and language model adaptation using the development set.

Details

Paper ID
lrec2016-main-736
Pages
pp. 4648-4651
BibKey
herms-etal-2016-corpus
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 May 2016 to 28 May 2016

Authors

  • Robert Herms
  • Laura Seelig
  • Stefanie Münch
  • Maximilian Eibl

Links