Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights

The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) with 5 Shared Tasks

DOI:10.63317/3eyeu3k726ab

Abstract

Parsing dialectal Arabic remains underexplored, with limited progress over the past two decades. Existing Modern Standard Arabic (MSA) parsers perform poorly on dialectal data, motivating the need for dialect-specific approaches. We revisit this task using modern neural models and present new results on Egyptian and Gulf Arabic dependency parsing. We demonstrate that even small amounts of dialectal training data yield substantial improvements in parsing accuracy. Our contributions include: (1) introducing a new annotated dataset for Gulf Arabic, (2) releasing a state-of-the-art multi-variety Arabic parser, and (3) employing dialect identification as a diagnostic tool to better understand how training data affects parsing performance across dialects and test sets.

Details

Paper ID: lrec2026-ws-osact-12
Pages: pp. 94-105
BibKey: faroukzakariaelshabrawy-etal-2026-parsing
Editors: Hend Al-Khalifa, Mo El-Haj, Saad Ezzini
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) with 5 Shared Tasks
Location: Palma, Mallorca, Spain
Date: 11-16 May 2026

Authors

  • Ahmed Farouk Zakaria Elshabrawy
  • Go Inoue
  • Muhammed AbuOdeh
  • Nizar Habash
