Lexicon Optimization: Maximizing Lexical Coverage in Speech Recognition through Automated Compounding
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)
Abstract
In this report we show that, for Dutch speech recognition, lexical coverage can be maximized by combining a limited lexicon of word parts with real-time lexicon expansion. More specifically, we describe how the lexicon was designed and how the real-time expansion module was built and tested. Tests were performed using a lexicon of 36,000 entries. The test results show that out-of-vocabulary rates remain small, owing to automated rule-based compounding of the lexical building blocks. Statistical information was included to improve the accuracy of the rule-based compounding system, and this approach proved successful.
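To make the idea of rule-based compounding over a word-part lexicon concrete, the following is a minimal sketch and not the system described in this report. Everything in it is an assumption made for illustration: the toy lexicon `PARTS` and its counts, the linking elements in `LINKERS`, the function names, and the use of unigram log-probabilities as the "statistical information" that ranks competing analyses.

```python
from math import log

# Hypothetical word-part lexicon with invented counts (not data from the report).
PARTS = {"fiets": 120, "fietsen": 60, "band": 80, "pomp": 30, "stad": 90, "huis": 150}
# Assumed linking elements that may appear between two parts of a compound.
LINKERS = ("", "s", "en")


def segmentations(word, parts=PARTS, linkers=LINKERS):
    """Return every analysis of `word` as part (+ linker) part ... part."""
    results = []

    def walk(pos, acc):
        for end in range(pos + 1, len(word) + 1):
            piece = word[pos:end]
            if piece not in parts:
                continue
            if end == len(word):
                # The final part closes the word; no linker may follow it.
                results.append(tuple(acc + [piece]))
                continue
            for link in linkers:
                nxt = end + len(link)
                # A linker must match the text and leave room for another part.
                if nxt < len(word) and word.startswith(link, end):
                    walk(nxt, acc + [piece])

    walk(0, [])
    return results


def score(segmentation, parts=PARTS):
    """Sum of unigram log-probabilities; an assumed stand-in for the statistics."""
    total = sum(parts.values())
    return sum(log(parts[p] / total) for p in segmentation)


def best_split(word, parts=PARTS):
    """Return the highest-scoring analysis, or None if the word cannot be built."""
    candidates = segmentations(word, parts)
    return max(candidates, key=lambda s: score(s, parts)) if candidates else None


def oov_rate(tokens, parts=PARTS):
    """Fraction of tokens that cannot be derived from the word-part lexicon."""
    missing = [t for t in tokens if best_split(t, parts) is None]
    return len(missing) / len(tokens)
```

In this sketch, `best_split("stadsfiets")` yields `("stad", "fiets")` via the linker "s", while "fietsenband" has two rule-based analyses and the counts decide between them. A word such as "auto", which cannot be assembled from the toy parts, still counts as out of vocabulary.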