PARSEME 2.0 Multilingual Corpus of Multiword Expressions

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/2iy5qf38yhay

Abstract

We present edition 2.0 of the PARSEME multilingual corpus annotated for multiword expressions (MWEs), resulting from efforts of the PARSEME community towards universality-driven modeling of idiomaticity. With respect to previous editions, we extend the annotation scope to all syntactic MWE categories: verbal, nominal, adjectival, adverbial and functional. We cover 17 languages, of which 7 are new. The annotation process is based on cross-lingually unified guidelines, phrased as decision diagrams over linguistic tests, and a typology of 18 MWE categories. The corpus contains almost 5 million tokens, over 250,000 sentences and 140,000 MWE annotations. The applicability of the corpus is tested in baseline experiments with a prompt-based MWE identification system. Results show that generic large language models do not encode sufficient knowledge to solve the MWE identification task.

Details

Paper ID
lrec2026-main-378
Pages
pp. 4819-4834
BibKey
savary-etal-2026-parseme
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Agata Savary
  • Manon Scholivet
  • Carlos Ramisch
  • Takuya Nakamura
  • Eric Bilinski
  • Sara Stymne
  • Voula Giouli
  • Stella Markantonatou
  • Vasile Pais
  • Maria Mitrofan
  • Louis Estève
  • Bruno Guillaume
  • Verginica Barbu Mititelu
  • Jaka Čibej
  • Roberto Díaz Hernández
  • Victoria Fendel
  • Polona Gantar
  • Olha Kanishcheva
  • Cvetana Krstev
  • Chaya Liebeskind
  • Irina Lobzhanidze
  • Aleksandra M. Marković
  • Gunta Nešpore-Bērzkalne
  • Adriana S. Pagano
  • Mehrnoush Shamsfard
  • Ranka Stankovic
  • Vahide Tajalli
  • Carole Tiberius
  • Aakanksha Padhye

Links