ASPEC: Asian Scientific Paper Excerpt Corpus

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

Abstract

In this paper, we describe the details of ASPEC (Asian Scientific Paper Excerpt Corpus), the first large-scale parallel corpus in the scientific paper domain. ASPEC was constructed in the Japanese-Chinese machine translation project, conducted between 2006 and 2010 using the Special Coordination Funds for Promoting Science and Technology. It consists of a Japanese-English scientific paper abstract corpus of approximately 3 million parallel sentences (ASPEC-JE) and a Chinese-Japanese scientific paper excerpt corpus of approximately 0.68 million parallel sentences (ASPEC-JC). ASPEC is used as the official dataset for the machine translation evaluation workshop WAT (Workshop on Asian Translation).

Details

Paper ID

lrec2016-main-350

Pages

pp. 2204-2208

DOI

10.63317/3iku22jbpbzm

BibKey

nakazawa-etal-2016-aspec

Editors

Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

978-2-9517408-9-1

Conference

Tenth International Conference on Language Resources and Evaluation

Location

Portorož, Slovenia

Date

23 - 28 May 2016

Authors

Toshiaki Nakazawa
Manabu Yaguchi
Kiyotaka Uchimoto
Masao Utiyama
Eiichiro Sumita
Sadao Kurohashi
Hitoshi Isahara
