Back to Main Conference 2010
LREC 2010main

Multi-Channel Database of Spontaneous Czech with Synchronization of Channels Recorded by Independent Devices

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/3hx4jpm6sdzi

Abstract

This paper describes Czech spontaneous speech database of lectures on digital signal processing topic collected at Czech Technical University in Prague, commonly with the procedure of its recording and annotation. The database contains 21.7 hours of speech material from 22 speakers recorded in 4 channels with 3 principally different microphones. The annotation of the database is composed from basic time segmentation, orthographic transcription including marks for speaker and environmental non-speech events, pronunciation lexicon in SAMPA alphabet, session and speaker information describing recording conditions, and the documentation. The orthographic transcription with time segmentation is saved in XML format supported by frequently used annotation tool Transcriber. In this article, special attention is also paid to the description of time synchronization of signals recorded by two independent devices: computer based recording platform using two external sound cards and commercial audio recorder Edirol R09. This synchronization is based on cross-correlation analysis with simple automated selection of suitable short signal subparts. The collection and annotation of this database is now complete and its availability via ELRA is currently under preparation.

Details

Paper ID
lrec2010-main-356
Pages
N/A
BibKey
pollak-rajnoha-2010-multi
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17 May 2010 23 May 2010

Authors

  • PP

    Petr Pollák

  • JR

    Josef Rajnoha

Links