Back to Main Conference 2024
LREC-COLING 2024main

A Lifelong Multilingual Multi-granularity Semantic Alignment Approach via Maximum Co-occurrence Probability

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/53cf6d3vuz3q

Abstract

Cross-lingual pre-training methods mask and predict tokens in multilingual text to generalize diverse multilingual information. However, due to the lack of sufficient aligned multilingual resources in the pre-training process, these methods may not fully explore the multilingual correlation of masked tokens, resulting in the limitation of multilingual information interaction. In this paper, we propose a lifelong multilingual multi-granularity semantic alignment approach, which continuously extracts massive aligned linguistic units from noisy data via a maximum co-occurrence probability algorithm. Then, the approach releases a version of the multilingual multi-granularity semantic alignment resource, supporting seven languages, namely English, Czech, German, Russian, Romanian, Hindi and Turkish. Finally, we propose how to use this resource to improve the translation performance on WMT14 18 benchmarks in twelve directions. Experimental results show an average of 0.3 1.1 BLEU improvements in all translation benchmarks. The analysis and discussion also demonstrate the superiority and potential of the proposed approach. The resource used in this work will be publicly available.

Details

Paper ID
lrec2024-main-0060
Pages
pp. 684-694
BibKey
liu-etal-2024-lifelong
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20 May 2024 25 May 2024

Authors

  • XL

    Xin Liu

  • HS

    Hongwei Sun

  • SD

    Shaojie Dai

  • BL

    Bo Lv

  • YP

    Youcheng Pan

  • HW

    Hui Wang

  • YY

    Yue Yu

Links