Acquisition and Annotation of Slovenian Broadcast News Database

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/252dprucwskf

Abstract

This paper presents the Slovenian Broadcast News Database project, which was started in 2002 as a cooperation between the University of Maribor and the Slovenian national broadcaster RTV Slovenia. The resulting database will be used for large-vocabulary continuous speech recognition and for multimedia database retrieval or archive indexing. First, some organizational aspects that had to be addressed in the initial phase of the project are described. The raw audio and video material was acquired from the original analog Beta SP master tapes preserved in RTV Slovenia's archive. The raw material was copied to DAT and DVD media, and additional teletext material was also collected. The manual annotation of the speech material is performed with the Transcriber tool. The annotation rules were defined on the basis of general rules for Broadcast News databases, with some special language-dependent sections. Some statistics on part of the current material are also given.

Details

Paper ID
lrec2004-main-089
Pages
N/A
BibKey
zgank-etal-2004-acquisition
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-1-6
Conference
Fourth International Conference on Language Resources and Evaluation
Location
Lisbon, Portugal
Date
26-28 May 2004

Authors

  • Andrej Žgank

  • Tomaž Rotovnik

  • Mirjam Sepesy Maučec

  • Darinka Verdonik

  • Janez Kitak

  • Damjan Vlaj

  • Vladimir Hozjan

  • Zdravko Kačič

  • Bogomir Horvat

Links