LREC 2020 workshop

A Fully Expanded Dependency Treebank for Telugu

Proceedings of the WILDRE5 – 5th Workshop on Indian Language Data: Resources and Evaluation

DOI:10.63317/279mcb6za8v7

Abstract

Treebanks are an essential resource for syntactic parsing. The available Paninian dependency treebank for Telugu is annotated only with inter-chunk dependency relations, so not all words of a sentence are part of the parse tree. In this paper, we automatically annotate the intra-chunk dependencies in the treebank using a shift-reduce parser based on context-free grammar rules for Telugu chunks. We also propose a few additional intra-chunk dependency relations for Telugu beyond those used in the Hindi treebank. Annotating intra-chunk dependencies yields a complete parse tree for every sentence in the treebank. A fully expanded treebank is crucial for developing end-to-end parsers that produce complete trees. We present a fully expanded dependency treebank for Telugu consisting of 3220 sentences. We also convert the treebank from the AnnCorra part-of-speech tagset to the latest BIS tagset, a hierarchical tagset adopted as a unified part-of-speech standard across all Indian languages. The final treebank is made publicly available.
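
The idea of expanding a chunk with a shift-reduce procedure driven by grammar rules can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the rule table, the simplified POS tags (`DEM`, `JJ`, `NN`, `PSP`, `VM`, `VAUX`), the relation labels, and the function name `expand_chunk` are hypothetical and are not the grammar, tagset, or label inventory used in the paper. The sketch only shows how tokens of one chunk might be shifted onto a stack and reduced pairwise until a single chunk head remains.

```python
# Toy rule table (hypothetical; not the paper's grammar).
# Key: (POS of left item, POS of right item)
# Value: (which side is the head, illustrative relation label)
RULES = {
    ("JJ", "NN"): ("right", "nmod__adj"),
    ("DEM", "NN"): ("right", "nmod__dem"),
    ("NN", "PSP"): ("left", "lwg__psp"),
    ("VM", "VAUX"): ("left", "lwg__vaux"),
}


def expand_chunk(tokens):
    """Attach every token of one chunk to the chunk head.

    tokens: list of (word, pos) pairs for a single chunk.
    Returns (head_index, arcs), where each arc is
    (dependent_index, head_index, label).
    """
    stack = []
    arcs = []
    for index in range(len(tokens)):
        stack.append(index)  # shift the next token onto the stack
        # Reduce for as long as the top two stack items match a rule.
        while len(stack) >= 2:
            left, right = stack[-2], stack[-1]
            rule = RULES.get((tokens[left][1], tokens[right][1]))
            if rule is None:
                break
            head_side, label = rule
            if head_side == "right":
                arcs.append((left, right, label))
                del stack[-2]  # the left item is now attached
            else:
                arcs.append((right, left, label))
                stack.pop()  # the right item is now attached
    if len(stack) != 1:
        raise ValueError("chunk could not be reduced to a single head")
    return stack[0], arcs


# English glosses stand in for the words of a noun chunk.
head, arcs = expand_chunk([("that", "DEM"), ("good", "JJ"), ("boy", "NN")])
print(head, arcs)
```

In this toy run the noun ends up as the chunk head and the other two tokens attach to it. In a full treebank expansion, the chunk head would keep the inter-chunk relation that is already annotated, so adding these arcs gives every word a place in the tree.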

Details

Paper ID
lrec2020-ws-wildre-08
Pages
pp. 39-44
BibKey
nallani-etal-2020-fully
Publisher
European Language Resources Association (ELRA)
Workshop
Proceedings of the WILDRE5 – 5th Workshop on Indian Language Data: Resources and Evaluation
Date
11 May 2020 to 16 May 2020

Authors

  • Sneha Nallani
  • Manish Shrivastava
  • Dipti Sharma
