L3IA-Subtask 1 at AraSentEval Shared Task: Multi-Dialect Arabic Sentiment Classification via a Transformer-Based Approach

The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) with 5 Shared Tasks

Abstract

This paper presents our system and findings for AraSentEval 2026 Subtask 1 on Arabic Dialect Sentiment Analysis. We propose an automated sentiment classification system built on Natural Language Processing (NLP) techniques. The approach leverages pre-trained Transformer-based architectures to classify texts into three sentiment polarities: positive, negative, and neutral. First, a text normalization step unifies the orthographic and graphical variations characteristic of Arabic script. It is complemented by repetition reduction, which mitigates textual noise and improves the consistency of the data. Next, the texts are adapted to the input requirements of each pre-trained model to ensure consistent tokenization, and are then encoded into numerical representations that serve as model inputs during training and evaluation. Finally, we benchmark five Transformer-based architectures to assess their effectiveness. The best-performing configuration achieved a micro-F1 score of 75.96% on the official AraSentEval 2026 test set.
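The abstract does not list the exact normalization or repetition-reduction rules. The following is a minimal sketch, assuming a set of rules that are common in Arabic NLP: diacritic (tashkeel) and tatweel removal, unification of alef variants, alef maqsura to yaa, and taa marbuta to haa. It also collapses characters repeated more than a hypothetical `max_repeat` times. The function names and rule set are illustrative assumptions, not the system's actual implementation.

```python
import re

# Assumed rule set (hypothetical): common Arabic normalizations,
# not necessarily the exact ones used by the described system.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")  # fathatan..sukun, superscript alef
_TATWEEL = "\u0640"                                  # kashida / elongation character
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below


def normalize_arabic(text: str) -> str:
    """Unify common orthographic variants of Arabic script."""
    text = _DIACRITICS.sub("", text)            # strip short-vowel diacritics
    text = text.replace(_TATWEEL, "")           # strip elongation
    text = _ALEF_VARIANTS.sub("\u0627", text)   # map alef variants to bare alef
    text = text.replace("\u0649", "\u064A")     # alef maqsura -> yaa
    text = text.replace("\u0629", "\u0647")     # taa marbuta -> haa
    return text


def reduce_repetitions(text: str, max_repeat: int = 2) -> str:
    """Collapse any character repeated more than max_repeat times (hypothetical threshold)."""
    pattern = re.compile(r"(.)\1{%d,}" % max_repeat)
    return pattern.sub(lambda m: m.group(1) * max_repeat, text)


if __name__ == "__main__":
    sample = "\u062C\u0645" + "\u064A" * 5 + "\u0644"  # an elongated dialectal spelling
    print(reduce_repetitions(normalize_arabic(sample)))
```

Mapping taa marbuta to haa is a lossy, debated choice; a system may keep it distinct, and the repetition threshold (collapsing to two rather than one character) preserves some emphasis that can carry sentiment.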