Automatic Translation of Scientific Documents in the HAL Archive

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/225dhf6ukn6o

Abstract

This paper describes the development of a statistical machine translation system between French and English for scientific papers. This system will be closely integrated into the French HAL open archive, a collection of more than 100,000 scientific papers. We describe the creation of in-domain parallel and monolingual corpora, the development of a domain-specific translation system with the created resources, and its adaptation using monolingual resources only. These techniques allowed us to improve a generic system by more than 10 BLEU points.

Details

Paper ID
lrec2012-main-408
Pages
pp. 3933-3936
BibKey
lambert-etal-2012-automatic
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21-27 May 2012

Authors

  • Patrik Lambert

  • Holger Schwenk

  • Frédéric Blain
