ASPEC: Asian Scientific Paper Excerpt Corpus

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/3iku22jbpbzm

Abstract

In this paper, we describe ASPEC (Asian Scientific Paper Excerpt Corpus), the first large-scale parallel corpus in the scientific paper domain. ASPEC was constructed as part of the Japanese-Chinese machine translation project conducted between 2006 and 2010 with the Special Coordination Funds for Promoting Science and Technology. It consists of a Japanese-English scientific paper abstract corpus of approximately 3 million parallel sentences (ASPEC-JE) and a Chinese-Japanese scientific paper excerpt corpus of approximately 0.68 million parallel sentences (ASPEC-JC). ASPEC is used as the official dataset for the machine translation evaluation workshop WAT (Workshop on Asian Translation).

Details

Paper ID
lrec2016-main-350
Pages
pp. 2204-2208
BibKey
nakazawa-etal-2016-aspec
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23-28 May 2016

Authors

  • Toshiaki Nakazawa
  • Manabu Yaguchi
  • Kiyotaka Uchimoto
  • Masao Utiyama
  • Eiichiro Sumita
  • Sadao Kurohashi
  • Hitoshi Isahara