Title

Translation memories enrichment by statistical bilingual segmentation

Author(s)

Francisco Nevado (1), Francisco Casacuberta (1), Josu Landa (2)

(1) Dept. de Sistemas Informáticos y Computación, Camino de Vera s/n, 46022 Valencia, Spain; (2) Ametzagaiña AIE, Zirkuitu Ibilbidea 2-1, 20160 Lasarte-Oria, Spain

Session

P3-W

Abstract

Most Machine Aided Translation systems are based on comparing a source sentence with reference sentences stored in Translation Memories (TMs): a translation is found by searching the database for sentences that are similar to the source sentence. TMs have two basic limitations: they depend on complete sentences being repeated, and building a TM is costly. Since human translators not only remember sentences from their previous translations but also decompose the sentence to be translated and work with smaller units, it would be desirable to enrich the TM database with smaller translation units. This enrichment should also be automatic, so as not to increase the cost of building a TM. We propose applying two automatic bilingual segmentation techniques, based on statistical translation methods, to create new, shorter bilingual segments to be included in a TM database. The two techniques are evaluated on a Basque-Spanish bilingual task.
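To illustrate why shorter units help, the sketch below models a toy TM with two lookups. The abstract does not say which similarity measure the TM uses; a word-level edit-distance "fuzzy match" score with a 0.7 threshold is assumed here as a common choice. The hypothetical `segment_matches` method covers a query greedily with the longest stored segments that occur verbatim in it. This is not the authors' segmentation method: the shorter segments are added by hand, whereas the paper derives them automatically with statistical translation techniques. The target strings are placeholders, not real Basque translations.

```python
"""Toy translation memory: sentence-level fuzzy lookup plus sub-sentential
segment matching. Hypothetical illustration only."""
from dataclasses import dataclass, field
from typing import Optional


def token_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance computed over word tokens instead of characters."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        cur = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def fuzzy_score(a: list[str], b: list[str]) -> float:
    """Similarity in [0, 1]: 1 minus edit distance divided by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - token_edit_distance(a, b) / longest


@dataclass
class TranslationMemory:
    # (source, target) pairs: full sentences or shorter bilingual segments.
    entries: list[tuple[str, str]] = field(default_factory=list)

    def add(self, source: str, target: str) -> None:
        self.entries.append((source, target))

    def fuzzy_lookup(self, query: str,
                     threshold: float = 0.7) -> Optional[tuple[float, str, str]]:
        """Best (score, source, target) at or above threshold, else None."""
        q = query.split()
        best: Optional[tuple[float, str, str]] = None
        for src, tgt in self.entries:
            score = fuzzy_score(q, src.split())
            if score >= threshold and (best is None or score > best[0]):
                best = (score, src, tgt)
        return best

    def segment_matches(self, query: str) -> list[tuple[str, str]]:
        """Greedy left-to-right cover of the query by the longest entries
        whose source occurs verbatim as a contiguous token span."""
        q = query.split()
        by_source = {tuple(src.split()): tgt for src, tgt in self.entries}
        max_len = max((len(k) for k in by_source), default=0)
        found: list[tuple[str, str]] = []
        i = 0
        while i < len(q):
            for n in range(min(max_len, len(q) - i), 0, -1):
                span = tuple(q[i:i + n])
                if span in by_source:
                    found.append((" ".join(span), by_source[span]))
                    i += n
                    break
            else:
                i += 1  # no stored segment starts at this token
        return found


if __name__ == "__main__":
    tm = TranslationMemory()
    tm.add("abra el fichero de configuración", "<eu: target 1>")
    query = "guarde el fichero de configuración y cierre la ventana"
    print(tm.fuzzy_lookup(query), tm.segment_matches(query))  # no help yet
    # Enrichment with shorter bilingual segments (added by hand here).
    tm.add("el fichero de configuración", "<eu: target 2>")
    tm.add("cierre la ventana", "<eu: target 3>")
    print(tm.segment_matches(query))
```

With only the full sentence stored, the new query falls below the fuzzy-match threshold and no part of it is reused; once the two shorter segments are added, both are found inside the query even though no stored sentence resembles it as a whole. This is the limitation (dependence on repeated complete sentences) that the proposed enrichment targets.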

Keyword(s)

Statistical Bilingual Segmentation, Translation Memories, Statistical Machine Translation

Language(s)

Basque, Spanish

Full Paper

443.pdf