Back to Main Conference 2024
LREC-COLING 2024main

SilverAlign: MT-Based Silver Data Algorithm for Evaluating Word Alignment

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/56g3idsq6ksv

Abstract

Word alignments are essential for a variety of NLP tasks. Therefore, choosing the best approaches for their creation is crucial. However, the scarce availability of gold evaluation data makes the choice difficult. We propose SilverAlign, a new method to automatically create silver data for the evaluation of word aligners by exploiting machine translation and minimal pairs. We show that performance on our silver data correlates well with gold benchmarks for 9 language pairs, making our approach a valid resource for evaluation of different languages and domains when gold data is not available. This addresses the important scenario of missing gold data alignments for low-resource languages.

Details

Paper ID
lrec2024-main-1290
Pages
pp. 14812-14825
BibKey
koksal-etal-2024-silveralign
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20 May 2024 25 May 2024

Authors

  • AK

    Abdullatif Koksal

  • SS

    Silvia Severini

  • HS

    Hinrich Schütze

Links