Cross-Dialectal Transfer for Low-Resource Arabic: The Tunisian Arabic Dependency Treebank

Proceedings of the Ninth Workshop on Universal Dependencies (UDW 2026)

DOI:10.63317/4vghkox8ptiq

Abstract

This paper presents a small-scale dependency treebank for Tunisian Arabic (TADT), developed within the Universal Dependencies framework to address the scarcity of linguistic resources for Arabic varieties. The approach employs domain adaptation: a machine learning model (UDPipe 1.0) trained on Algerian Arabic data pre-annotates 100 Tunisian Arabic social media comments, which are then manually corrected. This pilot study evaluates the feasibility of machine learning-assisted annotation for scaling resource development for spoken Arabic, and identifies key challenges in cross-dialectal transfer that must be addressed to improve annotation quality and efficiency. The work contributes to a more inclusive and fair representation of Arabic linguistic varieties in academic research and NLP applications.

Details

Paper ID
lrec2026-ws-udw-23
Pages
pp. 258-267
BibKey
aissaoui-2026-cross
Editors
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Ninth Workshop on Universal Dependencies (UDW 2026)
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Amal Aissaoui
