LREC-COLING 2024 workshop

Lexicons Gain the Upper Hand in Arabic MWE Identification

Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024

DOI:10.63317/2qwpbito2wpy

Abstract

This paper highlights the importance of integrating multiword expression (MWE) identification with the development of syntactic MWE lexicons. It suggests that lexicons with minimal morphosyntactic information can augment current MWE-annotated datasets and refine identification strategies. To our knowledge, this work is the first attempt to address both seen and unseen verbal MWEs (VMWEs) in Arabic. It also deals with the challenge of differentiating between literal and figurative interpretations of idiomatic expressions. The approach is a two-phase procedure: first, a VMWE lexicon is projected onto a corpus to identify candidate occurrences; then these occurrences are disambiguated to distinguish idiomatic from literal instances. The experiments assess the efficacy of this technique using a lexicon known as LEXAR and the “parseme-ar” corpus. The findings suggest that lexicon-driven strategies can improve MWE identification, particularly for unseen occurrences.
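The two-phase procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Token` type, the `project` and `is_idiomatic` functions, the `max_gap` threshold, and the English toy data are all hypothetical, and the distance-based filter is only a stand-in for whatever disambiguation method the paper actually uses.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Token:
    form: str
    lemma: str


def project(lexicon, sentence):
    """Phase 1 (hypothetical): project lexicon entries onto a sentence.

    Each entry is a tuple of lemmas. A candidate is any combination of
    distinct tokens whose lemmas cover all lemmas of an entry, regardless
    of word order or distance. Returns (entry, sorted token indices) pairs.
    """
    candidates = []
    for entry in lexicon:
        # For each lemma of the entry, list the positions where it occurs.
        positions = [
            [i for i, tok in enumerate(sentence) if tok.lemma == lemma]
            for lemma in entry
        ]
        for combo in product(*positions):
            if len(set(combo)) == len(combo):  # each token used at most once
                candidates.append((entry, tuple(sorted(combo))))
    # Drop duplicates while preserving order.
    return list(dict.fromkeys(candidates))


def is_idiomatic(candidate, max_gap=2):
    """Phase 2 (placeholder): a naive distance filter.

    Keeps a candidate when at most `max_gap` tokens intervene between its
    first and last component. A real system would use richer
    morphosyntactic evidence; this rule is an assumption for illustration.
    """
    _, indices = candidate
    intervening = indices[-1] - indices[0] - (len(indices) - 1)
    return intervening <= max_gap


def make_sentence(pairs):
    """Build a sentence from (form, lemma) pairs."""
    return [Token(form, lemma) for form, lemma in pairs]


# Toy English data, used only for readability.
lexicon = [("kick", "bucket")]
close = make_sentence(
    [("He", "he"), ("kicked", "kick"), ("the", "the"), ("bucket", "bucket")]
)
distant = make_sentence(
    [("He", "he"), ("kicked", "kick"), ("the", "the"), ("ball", "ball"),
     ("and", "and"), ("then", "then"), ("filled", "fill"), ("the", "the"),
     ("bucket", "bucket")]
)

for sentence in (close, distant):
    for candidate in project(lexicon, sentence):
        print(candidate, is_idiomatic(candidate))
```

Because projection matches on lemmas rather than surface forms, it can propose occurrences that never appeared in the annotated training data, which is the sense in which a lexicon can help with unseen VMWEs. The filter in phase 2 then has to reject coincidental co-occurrences such as the second toy sentence.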

Details

Paper ID
lrec2024-ws-mwe-13
Pages
pp. 88-97
BibKey
hadj-mohamed-etal-2024-lexicons
Publisher
European Language Resources Association (ELRA) and ICCL
Workshop
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024
Date
20 May 2024 to 25 May 2024

Authors

  • Najet Hadj Mohamed
  • Agata Savary
  • Cherifa Ben Khelil
  • Jean-Yves Antoine
  • Iskandar Keskes
  • Lamia Hadrich-Belguith
