Back to Main Conference 2000
LREC 2000main

Building a Treebank for Italian: a Data-driven Annotation Schema

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/2dd9od69en66

Abstract

Many natural language researchers are currently turning their attention to treebank development and trying to achieve accuracy and corpus data coverage in their representation formats. This paper presents a data-driven annotation schema developed for an Italian treebank ensuring data coverage and consistency between annotation of linguistic phenomena. The schema is a dependency-based format centered upon the notion of predicate-argument structure augmented with traces to represent discontinuous constituents. The treebank development involves an annotation process performed by a human annotator helped by an interactive parsing tool that builds incrementally syntactic representation of the sentence. To increase the syntactic knowledge of this parser, a specific data-driven strategy has been applied. We describe the cyclical development of the annotation schema highlighting the richness and flexibility of the format, and we present some representational issues.

Details

Paper ID
lrec2000-main-168
Pages
N/A
BibKey
bosco-etal-2000-building
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 2 June 2000

Authors

  • CB

    Cristina Bosco

  • VL

    Vincenzo Lombardo

  • DV

    Daniela Vassallo

  • LL

    Leonardo Lesmo

Links