Back to Main Conference 2016
LREC 2016main

Introducing the Asian Language Treebank (ALT)

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/3anv458nxcy8

Abstract

This paper introduces the ALT project initiated by the Advanced Speech Translation Research and Development Promotion Center (ASTREC), NICT, Kyoto, Japan. The aim of this project is to accelerate NLP research for Asian languages such as Indonesian, Japanese, Khmer, Laos, Malay, Myanmar, Philippine, Thai and Vietnamese. The original resource for this project was English articles that were randomly selected from Wikinews. The project has so far created a corpus for Myanmar and will extend in scope to include other languages in the near future. A 20000-sentence corpus of Myanmar that has been manually translated from an English corpus has been word segmented, word aligned, part-of-speech tagged and constituency parsed by human annotators. In this paper, we present the implementation steps for creating the treebank in detail, including a description of the ALT web-based treebanking tool. Moreover, we report statistics on the annotation quality of the Myanmar treebank created so far.

Details

Paper ID
lrec2016-main-249
Pages
pp. 1574-1578
BibKey
thu-etal-2016-introducing
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 May 2016 28 May 2016

Authors

  • YT

    Ye Kyaw Thu

  • WP

    Win Pa Pa

  • MU

    Masao Utiyama

  • AF

    Andrew Finch

  • ES

    Eiichiro Sumita

Links