HomeLREC 2020WorkshopsWILDRElrec2020-ws-wildre-07
Back to WILDRE 2020
LREC 2020workshop

Universal Dependency Treebanks for Low-Resource Indian Languages: The Case of Bhojpuri

Proceedings of the WILDRE5– 5th Workshop on Indian Language Data: Resources and Evaluation

DOI:10.63317/3n34nmeggmbe

Abstract

This paper presents the first dependency treebank for Bhojpuri, a resource-poor language that belongs to the Indo-Aryan language family. The objective behind the Bhojpuri Treebank (BHTB) project is to create a substantial, syntactically annotated treebank which not only acts as a valuable resource in building language technological tools, also helps in cross-lingual learning and typological research. Currently, the treebank consists of 4,881 annotated tokens in accordance with the annotation scheme of Universal Dependencies (UD). A Bhojpuri tagger and parser were created using machine learning approach. The accuracy of the model is 57.49% UAS, 45.50% LAS, 79.69% UPOS accuracy and 77.64% XPOS accuracy. The paper describes the details of the project including a discussion on linguistic analysis and annotation process of the Bhojpuri UD treebank.

Details

Paper ID
lrec2020-ws-wildre-07
Pages
pp. 33-38
BibKey
ojha-zeman-2020-universal
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the WILDRE5– 5th Workshop on Indian Language Data: Resources and Evaluation
Location
undefined, undefined
Date
11 May 2020 16 May 2020

Authors

  • AO

    Atul Kr. Ojha

  • DZ

    Daniel Zeman

Links