Back to Main Conference 2010
LREC 2010main

English-Spanish Large Statistical Dictionary of Inflectional Forms

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/2uzfysp86u55

Abstract

The paper presents an approach for constructing a weighted bilingual dictionary of inflectional forms using as input data a traditional bilingual dictionary, and not parallel corpora. An algorithm is developed that generates all possible morphological (inflectional) forms and weights them using information on distribution of corresponding grammar sets (grammar information) in large corpora for each language. The algorithm also takes into account the compatibility of grammar sets in a language pair; for example, verb in past tense in language L normally is expected to be translated by verb in past tense in Language L'. We consider that the developed method is universal, i.e. can be applied to any pair of languages. The obtained dictionary is freely available. It can be used in several NLP tasks, for example, statistical machine translation.

Details

Paper ID
lrec2010-main-155
Pages
N/A
BibKey
sidorov-etal-2010-english
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17 May 2010 23 May 2010

Authors

  • GS

    Grigori Sidorov

  • AB

    Alberto Barrón-Cedeño

  • PR

    Paolo Rosso

Links