Title

Creating Slovenian Language Resources for Development of Speech-to-Speech Translation Components

Author(s)

Darinka Verdonik, Matej Rojc, Zdravko Kačič

University of Maribor, Faculty of Electrical Engineering and Computer Science, Smetanova ul. 17, Maribor, Slovenia

Session

P18-S

Abstract

This article describes in detail the procedures used to build Slovenian lexica within the LC-STAR project and reports the size of those lexica. The University of Maribor joined the LC-STAR project to provide appropriate language resources for developing speech-to-speech translation technology for the Slovenian language. The lexica consist of three parts: 65,000 common words, 45,000 proper names, and 6,000 special application domain words. All lexica will be morpho-syntactically tagged and phonetically transcribed. The quality of the produced language resources is ensured by independent validation.

Keyword(s)

speech-to-speech translation, Slovenian, LC-STAR, POS, lexica, word list, proper names, common words

Language(s)

Slovenian

Full Paper

57.pdf