LREC 2012 Main Conference

The IULA Treebank

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/3gjoqtavuzzf

Abstract

This paper describes ongoing work on the construction of a new treebank for Spanish, the IULA Treebank. This new resource will contain about 60,000 richly annotated sentences as an extension of the existing IULA Technical Corpus, which is only PoS-tagged. In this paper, we focus on describing the work done to define the annotation process and the treebank design principles. We report on how the framework used, the DELPH-IN processing framework, has been crucial to the design principles and to the bootstrapping strategy followed, especially with regard to the use of stochastic modules for reducing parsing overgeneration. We also report on the evaluation experiments carried out to ensure the quality of the results available so far.

Details

Paper ID
lrec2012-main-287
Pages
pp. 1920-1926
BibKey
marimon-etal-2012-iula
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21 May 2012 to 27 May 2012

Authors

  • Montserrat Marimon
  • Beatriz Fisas
  • Núria Bel
  • Blanca Arias
  • Silvia Vázquez
  • Jorge Vivaldi
  • Sergi Torner
  • Marta Villegas
  • Mercè Lorente
