Building a Treebank for French

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/296dtnsekd73

Abstract

Very few gold-standard annotated corpora are currently available for French. We present an ongoing project to build a reference treebank for French, starting from a tagged newspaper corpus of 1 million words (Abeillé et al., 1998; Abeillé and Clément, 1999). As in the Penn Treebank (Marcus et al., 1993), an automatic parsing phase is followed by a second phase of systematic manual validation and correction. As in the Prague treebank (Hajicova et al., 1998), we rely on several types of morphosyntactic and syntactic annotations, for which we define extensive guidelines. Our goal is to provide a theory-neutral, surface-oriented, error-free treebank for French. As in the Negra project (Brants et al., 1999), we annotate both constituents and functional relations.

Details

Paper ID
lrec2000-main-175
Pages
N/A
BibKey
abeille-etal-2000-building
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 to 2 June 2000

Authors

  • Anne Abeillé

  • Lionel Clément

  • Alexandra Kinyon
