Annotating Verbal Multiword Expressions in Arabic: Assessing the Validity of a Multilingual Annotation Procedure
Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)
Abstract
This paper describes our efforts to extend the PARSEME framework to Modern Standard Arabic. The applicability of the PARSEME guidelines was tested by measuring the inter-annotator agreement in the early annotation stage. A subset of 1,062 sentences from the Prague Arabic Dependency Treebank (PADT) was selected and annotated independently by two Arabic native speakers. Based on their annotations, a new Arabic corpus with over 1,250 annotated VMWEs has been built. This corpus already exceeds the smallest corpora of the PARSEME suite and enables first observations. We discuss our annotation guideline schema, showing that full MWE annotation is achievable in Arabic with good inter-annotator agreement.