Title

Floresta Sintá(c)tica: A treebank for Portuguese

Authors

Susana Cavadas Afonso (VISL project, University of Southern Denmark, Institute of Language and Communication, Campusvej, 55, 5230 Odense M, Denmark)

Eckhard Bick (VISL project, University of Southern Denmark, Institute of Language and Communication, Campusvej, 55, 5230 Odense M, Denmark)

Renato Haber (SINTEF Telecom & Informatics, Pb 124, Blindern, NO-0314 Oslo, Norway)

Diana Santos (SINTEF Telecom & Informatics, Pb 124, Blindern, NO-0314 Oslo, Norway)

Session

WP4: Corpus Annotation

Abstract

This paper reviews the first year of the creation of a publicly available treebank for Portuguese, Floresta Sintá(c)tica, a collaborative project between the VISL and the Computational Processing of Portuguese projects. After briefly describing the main goals and the organization of the project, the paper presents the creation of the annotated objects in detail: preparing the text to be annotated, applying the Constraint Grammar-based PALAVRAS parser, revising its output manually in a two-stage process, and carefully documenting the linguistic options. Some examples of the interesting problems encountered are presented, and the paper ends with a brief description of the tools developed, the project results so far, and a mention of a preliminary inter-annotator test and what was learned from it.

Keywords

Treebank

Full Paper

1.pdf