Unsupervised Machine Translation in Real-World Scenarios

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/2az4mwi5sdyv

Abstract

In this paper, we present the work carried out in the MT4All CEF project and the resources it has generated by leveraging recent research in unsupervised learning. In the course of the project, 18 monolingual corpora for specific domains and languages have been collected, and 12 bilingual dictionaries and translation models have been generated. As part of the research, the unsupervised MT methodology based only on monolingual corpora (Artetxe et al., 2017) has been tested on a variety of languages and domains. Results show that in specialised domains, when there is enough monolingual in-domain data, unsupervised results are comparable to those of general-domain supervised translation, and that, in any case, unsupervised techniques can be used to boost results whenever very little data is available.

Details

Paper ID
lrec2022-main-325
Pages
pp. 3038-3047
BibKey
de-gibert-bonet-etal-2022-unsupervised
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20-25 June 2022

Authors

  • Ona de Gibert
  • Iakes Goenaga
  • Jordi Armengol-Estapé
  • Olatz Perez-de-Viñaspre
  • Carla Parra
  • Marina Sánchez-Torrón
  • Marcis Pinnis
  • Gorka Labaka
  • Maite Melero

Links