LREC 2014 (main conference)

Collection of a Simultaneous Translation Corpus for Comparative Analysis

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/3jkwcqn9nc5x

Abstract

This paper describes the collection of an English-Japanese/Japanese-English simultaneous interpretation corpus. The corpus has two main features. The first is that professional simultaneous interpreters with different amounts of experience took part in the collection. By comparing the simultaneous interpretation data from each interpreter, it is possible to compare better interpretations with those that are not as good. The second is that translation data are already available for part of the corpus, which makes it possible to compare translation data with simultaneous interpretation data. We recorded the interpretations of lectures and news, and created time-aligned transcriptions. A total of 387k words of transcribed data were collected. The corpus will be helpful for analyzing differences in interpretation styles and for constructing simultaneous interpretation systems.

Details

Paper ID
lrec2014-main-178
Pages
pp. 670-673
BibKey
shimizu-etal-2014-collection
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26 May 2014 to 31 May 2014

Authors

  • Hiroaki Shimizu
  • Graham Neubig
  • Sakriani Sakti
  • Tomoki Toda
  • Satoshi Nakamura

Links