Issues in Design and Collection of Large Telephone Speech Corpus for Slovenian Language

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/3yjbdpy9mzov

Abstract

This paper discusses issues in the design, collection and evaluation of a large-vocabulary telephone speech corpus for the Slovenian language. The database is built from three text corpora containing 1530 different sentences. It contains read speech from 82 speakers; each speaker read more than 200 sentences on average, and 21 speakers also read a text passage of 90 sentences. An initial manual segmentation and labeling of the speech material was performed, and automatic segmentation was then carried out on that basis. The database is intended to facilitate the development of speech recognition systems for dictation tasks over the telephone. So far it has been used mostly for isolated digit recognition and word spotting.

Details

Paper ID: lrec2000-main-185
Pages: N/A
BibKey: kacic-etal-2000-issues
Editor: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: 2522-2686
ISBN: N/A
Conference: Second International Conference on Language Resources and Evaluation
Location: Athens, Greece
Date: 31 May 2000 to 2 June 2000

Authors

  • Zdravko Kačič

  • Bogomir Horvat

  • Aleksandra Zögling
