LREC 2014 Main Conference

A Toolkit for Efficient Learning of Lexical Units for Speech Recognition

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/4sjwotn8by7x

Abstract

String segmentation is an important and recurring problem in natural language processing and other domains. For morphologically rich languages, the number of distinct word forms produced by morphological processes such as agglutination, compounding, and inflection may be huge, which causes problems for the traditional word-based language modeling approach. Segmenting text into units that are easier to model is thus an important part of the modeling task. This work presents methods and a toolkit for learning segmentation models from text. The methods may be applied to lexical unit selection for speech recognition and to other segmentation tasks.

Details

Paper ID
lrec2014-main-561
Pages
pp. 3072-3075
BibKey
varjokallio-kurimo-2014-toolkit
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26-31 May 2014

Authors

  • Matti Varjokallio

  • Mikko Kurimo

Links