
ASPEC: Asian Scientific Paper Excerpt Corpus

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/3iku22jbpbzm

Abstract

In this paper, we describe the details of ASPEC (Asian Scientific Paper Excerpt Corpus), the first large-scale parallel corpus in the scientific paper domain. ASPEC was constructed in the Japanese-Chinese machine translation project conducted between 2006 and 2010 using the Special Coordination Funds for Promoting Science and Technology. It consists of a Japanese-English scientific paper abstract corpus of approximately 3 million parallel sentences (ASPEC-JE) and a Chinese-Japanese scientific paper excerpt corpus of approximately 0.68 million parallel sentences (ASPEC-JC). ASPEC is used as the official dataset for the machine translation evaluation workshop WAT (Workshop on Asian Translation).

Details

Paper ID: lrec2016-main-350
Pages: pp. 2204-2208
BibKey: nakazawa-etal-2016-aspec
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis
Publisher: European Language Resources Association (ELRA)
ISSN: 2522-2686
ISBN: 978-2-9517408-9-1
Conference: Tenth International Conference on Language Resources and Evaluation
Location: Portorož, Slovenia
Date: 23-28 May 2016

Authors

  • Toshiaki Nakazawa
  • Manabu Yaguchi
  • Kiyotaka Uchimoto
  • Masao Utiyama
  • Eiichiro Sumita
  • Sadao Kurohashi
  • Hitoshi Isahara

Links