Automatic Bilingual Phrase Dictionary Construction from GIZA++ Output

Proceedings of the 18th Workshop on Multiword Expressions @LREC2022

DOI:10.63317/443p6d8kofjy

Abstract

Modern encoder-decoder neural machine translation (NMT) models are normally trained on parallel sentences, so they give the best results when translating full sentences rather than sentence fragments. As a result, the task of translating commonly used phrases, which often arises for language learners, is not addressed by NMT models. While human-built phrase dictionaries exist for high-resource language pairs, less-resourced pairs lack them. We propose an approach for building such a dictionary automatically from GIZA++ output and show that it works significantly better than translating phrases with an NMT system trained on sentences.

Details

Paper ID
lrec2022-ws-mwe-12
Pages
pp. 81-88
BibKey
khusainova-etal-2022-automatic
Publisher
European Language Resources Association (ELRA)
Workshop
Proceedings of the 18th Workshop on Multiword Expressions @LREC2022
Date
20-25 June 2022

Authors

  • Albina Khusainova
  • Vitaly Romanov
  • Adil Khan
