Building and Incorporating Language Models for Persian Continuous Speech Recognition Systems

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

DOI:10.63317/46tyu37v9m3n

Abstract

This paper describes building statistical language models for Persian from a corpus and incorporating them into a Persian continuous speech recognition (CSR) system. We used the Persian Text Corpus to build the language models. First, we preprocessed the corpus texts by normalizing the differing orthographic forms of words. We also reduced the number of POS tags by clustering them manually. We then extracted word-based monogram (unigram) and POS-based bigram and trigram language models from the corpus. We also present the procedure for incorporating the language models into a Persian CSR system. Using the language models, a 27.4% reduction in word error rate was achieved in the best case.
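
As an illustration of what a POS-based bigram model estimates, the following is a minimal sketch and not the authors' implementation. The tag names, the toy data, the sentence-boundary markers and the add-alpha smoothing are all assumptions made for the example; the abstract does not say which smoothing method, if any, was used.

```python
from collections import Counter


def train_pos_bigram(tagged_sentences, alpha=1.0):
    """Return a function prob(prev, cur) estimating P(cur | prev) over POS tags.

    Add-alpha smoothing is an assumption of this sketch, not taken from the paper.
    """
    history_counts = Counter()
    bigram_counts = Counter()
    tagset = set()
    for tags in tagged_sentences:
        # Hypothetical sentence-boundary markers.
        seq = ["<s>"] + list(tags) + ["</s>"]
        tagset.update(seq)
        for prev, cur in zip(seq, seq[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    num_tags = len(tagset)

    def prob(prev, cur):
        numerator = bigram_counts[(prev, cur)] + alpha
        denominator = history_counts[prev] + alpha * num_tags
        return numerator / denominator

    return prob


# Toy tag sequences with hypothetical tag names (not from the Persian Text Corpus).
toy_corpus = [["N", "V"], ["ADJ", "N", "V"], ["N", "ADJ", "V"]]
prob = train_pos_bigram(toy_corpus)
p_n_v = prob("N", "V")
```

In a recognizer, scores of this kind would typically be combined with acoustic scores when ranking word hypotheses; how the paper performs that combination is not described in the abstract.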

Details

Paper ID
lrec2006-main-014
Pages
N/A
BibKey
bahrani-etal-2006-building
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-2-4
Conference
Fifth International Conference on Language Resources and Evaluation
Location
Genoa, Italy
Date
24 May 2006 to 26 May 2006

Authors

  • M. Bahrani

  • H. Sameti

  • N. Hafezi

  • H. Movasagh

Links