Novel elicitation and annotation schemes for sentential and sub-sentential alignments of bitexts

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/2mugjt5a8usa

Abstract

Resources for evaluating sentence-level and word-level alignment algorithms are unsatisfactory. For sentence alignments, existing data is too scarce, especially for difficult bitexts that contain non-literal translations. For word-level alignments, most available hand-aligned data provide a complete word-level annotation that is difficult to exploit, because alignment links lack a clear semantics. In this study, we propose new methodologies for collecting human judgements on alignment links, which have been used to annotate four new data sets, at the sentence level and at the word level. These will be released online, in the hope that they will prove useful for evaluating alignment software and quality estimation tools for automatic alignment.

Keywords: Parallel corpora, Sentence Alignments, Word Alignments, Confidence Estimation

Details

Paper ID
lrec2016-main-099
Pages
pp. 628-635
BibKey
xu-yvon-2016-novel
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 May 2016 to 28 May 2016

Authors

  • Yong Xu

  • François Yvon
