
Multi-word Lexical Units Recognition in WordNet

Proceedings of the 18th Workshop on Multiword Expressions @LREC2022

DOI:10.63317/2nwufkee8dyj

Abstract

WordNet is a state-of-the-art lexical resource used in many Natural Language Processing tasks, including multi-word expression (MWE) recognition. However, not all MWEs recorded in WordNet can indisputably be called lexicalised: some are semantically compositional and show no signs of idiosyncrasy. This affects every evaluation measure that uses the full list of WordNet MWEs as a gold standard. We propose a method for distinguishing between lexicalised and non-lexicalised word combinations in WordNet that takes into account lexicality features such as semantic compositionality, MWE length and a translational criterion. Both a rule-based approach and ridge logistic regression are applied; both beat a random baseline in precision when singling out lexicalised MWEs and in recall when ruling out non-lexicalised MWEs.
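The kind of classifier the abstract mentions can be illustrated with a minimal sketch of ridge (L2-penalised) logistic regression, together with the two evaluation measures it names. This is not the authors' implementation: the feature encoding, the toy values, the penalty strength and the names `fit_ridge_logreg`, `predict` and `precision_recall` are all assumptions made for illustration, and the toy data says nothing about how the real features behave.

```python
import math

# Hypothetical toy data, invented for illustration only (not from the paper).
# Columns: compositionality score in [0, 1], MWE length in words,
# 1 if an assumed translation is a single word, else 0.
TOY_X = [
    [0.1, 2, 1], [0.2, 3, 1], [0.3, 2, 1], [0.4, 3, 1],
    [0.9, 2, 0], [0.8, 3, 0], [0.7, 2, 0], [0.6, 3, 0],
]
TOY_Y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = lexicalised, 0 = non-lexicalised


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    # Linear score: dot product of weights and features, plus bias.
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_ridge_logreg(X, y, l2=0.1, lr=0.2, epochs=5000):
    """Batch gradient descent on mean log-loss + (l2 / 2) * ||w||^2.

    The bias term is left unpenalised. l2, lr and epochs are arbitrary
    illustrative settings, not values taken from the paper.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The ridge penalty adds l2 * w_j to each weight gradient.
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # Threshold the predicted probability at 0.5.
    return 1 if sigmoid(score(w, b, x)) >= 0.5 else 0


def precision_recall(y_true, y_pred, positive):
    """Precision and recall with respect to the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


weights, bias = fit_ridge_logreg(TOY_X, TOY_Y)
predictions = [predict(weights, bias, x) for x in TOY_X]
# Precision for the lexicalised class, recall for the non-lexicalised class.
print(precision_recall(TOY_Y, predictions, positive=1)[0])
print(precision_recall(TOY_Y, predictions, positive=0)[1])
```

Calling `precision_recall` with `positive=1` scores the singling out of lexicalised MWEs, and with `positive=0` it scores the ruling out of non-lexicalised ones. In a real setting the penalty strength would normally be tuned on held-out data and the scores compared against a random baseline rather than measured on the training items as done here.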

Details

Paper ID
lrec2022-ws-mwe-08
Pages
pp. 49-54
BibKey
maziarz-etal-2022-multi
Publisher
European Language Resources Association (ELRA)
Workshop
Proceedings of the 18th Workshop on Multiword Expressions @LREC2022
Date
20-25 June 2022

Authors

  • Marek Maziarz
  • Ewa Rudnicka
  • Łukasz Grabowski
