
Annotating Verbal Multiword Expressions in Arabic: Assessing the Validity of a Multilingual Annotation Procedure

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/5drvhtqy8gsh

Abstract

This paper describes our efforts to extend the PARSEME framework to Modern Standard Arabic. The applicability of the PARSEME guidelines was tested by measuring the inter-annotator agreement in the early annotation stage. A subset of 1,062 sentences from the Prague Arabic Dependency Treebank (PADT) was selected and annotated independently by two native speakers of Arabic. Following their annotations, a new Arabic corpus with over 1,250 annotated VMWEs has been built. This corpus already exceeds the smallest corpora of the PARSEME suite and enables first observations. We discuss our annotation guideline schema, which shows that full MWE annotation is realizable in Arabic, with good inter-annotator agreement.
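
As an illustration of what measuring agreement between two independent annotators can involve, the sketch below computes Cohen's kappa over per-item labels. The abstract does not state which agreement measure the authors used, so kappa is an assumption chosen here because it is a common option; the function name `cohen_kappa`, the label names, and the toy data are all hypothetical and are not taken from the paper or its corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Assumption: agreement is scored per item on categorical labels.
    The paper's actual measure may differ.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)
    # Observed agreement: share of items that received identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label
    # if each annotator labelled at random with their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: one decision per candidate expression.
annotator_1 = ["VMWE", "VMWE", "NONE", "NONE", "VMWE", "NONE"]
annotator_2 = ["VMWE", "NONE", "NONE", "NONE", "VMWE", "NONE"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label (here `NONE`) dominates. Scoring agreement on the exact token spans of each expression would need a different setup, for example an F-score between the two annotators' span sets.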

Details

Paper ID
lrec2022-main-196
Pages
pp. 1839-1848
BibKey
hadj-mohamed-etal-2022-annotating
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20 June 2022 to 25 June 2022

Authors

  • Najet Hadj Mohamed
  • Cherifa Ben Khelil
  • Agata Savary
  • Iskandar Keskes
  • Jean-Yves Antoine
  • Lamia Hadrich-Belguith

Links