Back to Main Conference 2014
LREC 2014main

Bilingual dictionaries for all EU languages

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/3xybpcnr43oe

Abstract

Bilingual dictionaries can be automatically generated using the GIZA++ tool. However, these dictionaries contain a lot of noise, because of which the quality of outputs of tools relying on the dictionaries are negatively affected. In this work we present three different methods for cleaning noise from automatically generated bilingual dictionaries: LLR, pivot and translation based approach. We have applied these approaches on the GIZA++ dictionaries – dictionaries covering official EU languages – in order to remove noise. Our evaluation showed that all methods help to reduce noise. However, the best performance is achieved using the transliteration based approach. We provide all bilingual dictionaries (the original GIZA++ dictionaries and the cleaned ones) free for download. We also provide the cleaning tools and scripts for free download.

Details

Paper ID
lrec2014-main-623
Pages
pp. 2839-2845
BibKey
aker-etal-2014-bilingual
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26 May 2014 31 May 2014

Authors

  • AA

    Ahmet Aker

  • MP

    Monica Paramita

  • MP

    Mārcis Pinnis

  • RG

    Robert Gaizauskas

Links