Development of Slovenian Broadcast News Speech Database
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)
Abstract
The paper reviews the development of a new Slovenian broadcast news speech database. The database consists of audio, video and annotation transcripts of about 34 hours of daily television news programs captured from the public TV station RTVSLO. The paper addresses issues concerning the transcription and annotation of the collected data, provides a content analysis and basic statistics of the collected material, and reports on a preliminary evaluation of automatic segmentation.