LREC 2026 Workshop

Modeling Word-Internal Structures: Morphological Segmentation Across 58 Languages

Proceedings of the Workshop on Structured Linguistic Data and Evaluation (SLiDE)

DOI:10.63317/4qgccaokf8mz

Abstract

We present the largest multilingual experiment to date on word-to-morph segmentation, covering 58 typologically diverse languages. First, we describe a newly compiled collection of linguistically annotated resources for the task, providing broad coverage and enabling systematic cross-lingual evaluation. Second, we train two neural models on surface morphological segmentation, achieving 81% average word accuracy on the original datasets, slightly outperforming previous methods. Experiments on custom test sets reveal substantial variation in performance, highlighting the need for further harmonization and more robust multilingual approaches.

Details

Paper ID
lrec2026-ws-slide-16
Pages
pp. 180-190
BibKey
john-etal-2026-modeling
Editors
Erhard Hinrichs (Tübingen University, Germany), Joakim Nivre (Uppsala University, Sweden), Petya Osenova (Sofia University, Bulgaria), James Pustejovsky (Brandeis University, USA), Claus Zinn (Tübingen University, Germany)
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Workshop on Structured Linguistic Data and Evaluation (SLiDE)
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Vojtěch John

  • Zdeněk Žabokrtský

  • Benjamin Reeves

Links