LinguArabic at AraSentEval 2026: MARBERT for Multi-Dialect Arabic Sentiment Analysis

The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) with 5 Shared Tasks

Abstract

Sentiment analysis for Arabic dialects remains challenging because of substantial linguistic variation across dialects and the prevalence of informal language in user-generated content. The AraSentEval 2026 shared task introduces a multi-dialect benchmark designed to evaluate sentiment classification systems on real-world Arabic data. In this paper, we present LinguArabic’s submission to the sentiment classification track of AraSentEval 2026. Our approach fine-tunes MARBERT, a transformer model pre-trained on large-scale Arabic social media data that captures diverse dialectal patterns. To improve robustness, we add a multi-stage pipeline comprising text normalization, dialect-aware lexical mapping, and confidence-based prediction adjustment. In particular, we investigate how advanced normalization rules reduce lexical sparsity across regional dialects. Experimental results show that the proposed system achieves a Macro F1-score of 0.8333 on the official evaluation set. Our findings highlight the importance of dialect-aware pretraining and preprocessing for sentiment classification across diverse Arabic dialects, and they provide a scalable framework for real-world Arabic NLP applications.
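
To illustrate what the text normalization stage might involve, the following is a minimal sketch in Python. The abstract does not list the system's actual rules, so everything here is an assumption: the function name normalize_arabic is hypothetical, and the rules shown (stripping diacritics and tatweel, unifying hamzated alef forms, collapsing elongated characters) are only commonly used Arabic normalization steps, not necessarily those applied in our pipeline.

```python
import re

# Assumed rule set: generic Arabic normalization steps, not the paper's own list.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")     # tashkeel marks and superscript alef
_TATWEEL = "\u0640"                                    # elongation character (kashida)
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")   # alef with madda / hamza above / hamza below
_REPEATS = re.compile(r"(.)\1{2,}")                    # any character repeated 3 or more times
_WHITESPACE = re.compile(r"\s+")


def normalize_arabic(text: str) -> str:
    """Apply a small, illustrative set of normalization rules to Arabic text."""
    text = _DIACRITICS.sub("", text)            # drop short-vowel marks
    text = text.replace(_TATWEEL, "")           # drop decorative elongation
    text = _ALEF_VARIANTS.sub("\u0627", text)   # map alef variants to bare alef
    text = _REPEATS.sub(r"\1\1", text)          # cap expressive elongation at two characters
    return _WHITESPACE.sub(" ", text).strip()   # collapse runs of whitespace


if __name__ == "__main__":
    # An elongated dialectal word with a hamzated alef and tatweel.
    print(normalize_arabic("\u0623\u064E\u0647\u0652\u0644\u0627\u064B \u062C\u0640\u0640\u0645\u064A\u0644"))
```

Rules of this kind map spelling variants of the same word to a single surface form, which is one way normalization can reduce lexical sparsity before the text reaches the tokenizer.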