LREC 2012 main

Prague Dependency Style Treebank for Tamil

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/4zrkzf4asgsp

Abstract

Annotated corpora such as treebanks are important for developing parsers and language applications, as well as for understanding the language itself. Very few languages possess these scarce resources. In this paper, we describe our efforts in syntactically annotating a small corpus (600 sentences) of the Tamil language. Our annotation is similar to that of the Prague Dependency Treebank (PDT) and consists of two levels, or layers: (i) a morphological layer (m-layer) and (ii) an analytical layer (a-layer). For both layers, we introduce annotation schemes, i.e., positional tagging for the m-layer and dependency relations for the a-layer. Finally, we discuss some of the issues in treebank development for Tamil.

Details

Paper ID
lrec2012-main-242
Pages
pp. 1888-1894
BibKey
ramasamy-zabokrtsky-2012-prague
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21 May 2012 to 27 May 2012

Authors

  • Loganathan Ramasamy

  • Zdeněk Žabokrtský

Links