LREC 2014, main conference

Constituency Parsing of Bulgarian: Word- vs Class-based Parsing

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/592t9g3ouurg

Abstract

In this paper, we report the results obtained with two constituency parsers trained on BulTreeBank, an HPSG-based treebank for Bulgarian. To reduce data sparsity, we propose using Brown word clustering to cluster the words off-line and then map each word in the treebank to its class, creating a class-based treebank. Our observations show that the results are better when the classes outnumber the POS tags. Since this approach adds another dimension of abstraction (in comparison to the lemma), its coarse-grained representation can be used further for training statistical parsers.
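
The word-to-class mapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the clustering output is a tab-separated listing of the form "bitstring, word, count" (the layout written by some Brown clustering tools) and that trees are plain bracketed strings. The function names, the toy Bulgarian words, the node labels, and the "UNK" fallback class are all hypothetical and chosen only for illustration.

```python
import re


def load_brown_paths(lines):
    """Build a word -> class map from 'bitstring<TAB>word<TAB>count' lines.

    The three-column layout is an assumption about the clustering output.
    """
    word_to_class = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            bits, word = parts[0], parts[1]
            word_to_class[word] = bits
    return word_to_class


# A leaf in a bracketed tree looks like '(TAG token)': a label followed by
# a single token that contains no whitespace or parentheses.
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def to_class_based(tree, word_to_class, unknown="UNK"):
    """Replace every leaf token with its cluster ID, keeping the structure."""
    def swap(match):
        tag, token = match.group(1), match.group(2)
        # Words missing from the clustering fall back to a hypothetical class.
        return "(%s %s)" % (tag, word_to_class.get(token, unknown))
    return LEAF.sub(swap, tree)


# Toy data: two nouns share one class, the verb has another.
paths = ["0010\tкуче\t12", "0010\tкотка\t9", "110\tспи\t7"]
classes = load_brown_paths(paths)

# Node labels here are illustrative, not the BulTreeBank tagset.
tree = "(S (NP (N куче)) (VP (V спи)))"
class_tree = to_class_based(tree, classes)
print(class_tree)
```

A parser trained on trees rewritten this way sees far fewer distinct terminals than the word-based treebank, which is the sparsity reduction the abstract refers to. How many classes to request from the clustering is a separate choice that this sketch does not address.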

Details

Paper ID
lrec2014-main-547
Pages
pp. 4056-4060
BibKey
ghayoomi-etal-2014-constituency
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26 May 2014 to 31 May 2014

Authors

  • Masood Ghayoomi

  • Kiril Simov

  • Petya Osenova
