Simultaneous Sentence Boundary Detection and Alignment with Pivot-based Machine Translation Generated Lexicons

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/58vnvrmncno5

Abstract

Sentence alignment is the task of aligning the parallel sentences in a translated article pair. This paper describes a method that performs sentence boundary detection and alignment simultaneously, which significantly improves alignment accuracy on languages such as Chinese, where sentence boundaries are uncertain. It relies on the definition of hard (certain) and soft (uncertain) punctuation delimiters; the latter may be ignored to optimize the alignment result. The alignment method is used in combination with lexicons automatically generated from the input article pairs using pivot-based MT, which achieve better coverage of the input words with fewer entries than pre-existing dictionaries. Pivot-based MT makes it possible to build dictionaries for language pairs that have scarce parallel data. The alignment method is implemented in a tool that will be freely available in the near future.
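
The hard/soft delimiter idea can be illustrated with a minimal sketch. It is not the authors' implementation: the delimiter sets and the function names `split_units` and `candidate_segmentations` are assumptions made for illustration. The sketch only enumerates the segmentations obtained by keeping or ignoring each soft delimiter; an aligner would then score each candidate against the other side of the article pair.

```python
from itertools import product

# Assumed delimiter sets, for illustration only; the paper's actual sets may differ.
HARD_DELIMITERS = set("。！？")
SOFT_DELIMITERS = set("，；")


def split_units(text):
    """Split text after every delimiter; return (unit, is_hard_boundary) pairs."""
    units, start = [], 0
    for i, ch in enumerate(text):
        if ch in HARD_DELIMITERS or ch in SOFT_DELIMITERS:
            units.append((text[start:i + 1], ch in HARD_DELIMITERS))
            start = i + 1
    if start < len(text):
        units.append((text[start:], True))
    if units:
        # The end of the text is always a certain boundary.
        units[-1] = (units[-1][0], True)
    return units


def candidate_segmentations(text):
    """Yield every segmentation obtained by keeping or ignoring each soft delimiter."""
    units = split_units(text)
    soft_positions = [i for i, (_, hard) in enumerate(units) if not hard]
    for choices in product([True, False], repeat=len(soft_positions)):
        split_here = dict(zip(soft_positions, choices))
        sentences, current = [], ""
        for i, (unit, hard) in enumerate(units):
            current += unit
            # Hard boundaries always split; soft ones split only if chosen.
            if hard or split_here[i]:
                sentences.append(current)
                current = ""
        yield sentences
```

Exhaustive enumeration grows exponentially with the number of soft delimiters, so it is only practical for short inputs; it is used here purely to make the search space concrete.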

Details

Paper ID
lrec2016-main-348
Pages
pp. 2192-2198
BibKey
bourlon-etal-2016-simultaneous
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 May 2016 to 28 May 2016

Authors

  • Antoine Bourlon
  • Chenhui Chu
  • Toshiaki Nakazawa
  • Sadao Kurohashi
