Back to Main Conference 2012
LREC 2012main

Semantic annotation of French corpora: animacy and verb semantic classes

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/5omouiotadar

Abstract

This paper presents a first corpus of French annotated for animacy and for verb semantic classes. The resource consists of 1,346 sentences extracted from three different corpora: the French Treebank (Abeillé and Barrier, 2004), the Est-Républicain corpus (CNRTL) and the ESTER corpus (ELRA). It is a set of parsed sentences, containing a verbal head subcategorizing two complements, with annotations on the verb and on both complements, in the TIGER XML format (Mengel and Lezius, 2000). The resource was manually annotated and manually corrected by three annotators. Animacy has been annotated following the categories of Zaenen et al. (2004). Measures of inter-annotator agreement are good (Multi-pi = 0.82 and Multi-kappa = 0.86 (k = 3, N = 2360)). As for verb semantic classes, we used three of the five levels of classification of an existing dictionary: 'Les Verbes du Français' (Dubois and Dubois-Charlier, 1997). For the higher level (generic classes), the measures of agreement are Multi-pi = 0.84 and Multi-kappa = 0.87 (k = 3, N = 1346). The inter-annotator agreements show that the annotated data are reliable for both animacy and verbal semantic classes.

Details

Paper ID
lrec2012-main-312
Pages
pp. 1533-1537
BibKey
thuilier-danlos-2012-semantic
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21 May 2012 27 May 2012

Authors

  • JT

    Juliette Thuilier

  • LD

    Laurence Danlos

Links