Applying a Dynamic Bayesian Network Framework to Transliteration Identification

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

Abstract

Transliteration identification aims to enrich multilingual lexicons and to improve performance in various Natural Language Processing (NLP) applications, including Cross Language Information Retrieval (CLIR) and Machine Translation (MT). This paper describes work on applying the widely used graphical models approach of Dynamic Bayesian Networks (DBNs) to transliteration identification. The task of estimating transliteration similarity is not very different from specific identification tasks where DBNs have been successfully applied; it is also possible to adapt DBN models from those other identification domains to the transliteration identification domain. In particular, we investigate the applicability of a DBN framework initially proposed by Filali and Bilmes (2005) to learn edit distance estimation parameters for use in pronunciation classification. The DBN framework enables the specification of a variety of models representing different factors that can affect string similarity estimation. Three DBN models associated with two of the DBN classes originally specified by Filali and Bilmes (2005) have been tested in an experimental setup for Russian-English transliteration identification. Two of the DBN models achieve high transliteration identification accuracy, and combining the models improves transliteration identification accuracy further.