Cross-Dialectal Transfer for Low-Resource Arabic: The Tunisian Arabic Dependency Treebank
Proceedings of the Ninth Workshop on Universal Dependencies (UDW 2026)
Abstract
This paper presents the Tunisian Arabic Dependency Treebank (TADT), a small-scale dependency treebank for Tunisian Arabic developed within the Universal Dependencies framework to address the scarcity of linguistic resources for Arabic varieties. The approach employs domain adaptation: a UDPipe 1.0 model trained on Algerian Arabic data is used to pre-annotate 100 Tunisian Arabic social media comments, which are then manually corrected. This pilot study evaluates the feasibility of machine learning-assisted annotation for scaling resource development for spoken Arabic and identifies key challenges in cross-dialectal transfer that must be addressed to improve annotation quality and efficiency. The work contributes to a more inclusive and fair representation of Arabic linguistic varieties in academic research and NLP applications.