When Bigger Isn’t Better: Evaluating LLMs for Arabic Sentiment Analysis

The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) with 5 Shared Tasks

Abstract

This study compares a fine-tuned Arabic sentiment transformer (CAMeL-MSA) with eight large language models (LLMs). The LLMs are evaluated with zero-shot prompting on six Arabic sentiment datasets, contrasting a specialized, task-specific approach with general-purpose model capabilities. The fine-tuned baseline substantially outperforms all eight LLMs on five of the six datasets in both accuracy and Macro F1-score. While LLMs offer versatility, this comparison highlights the continued practical superiority of task-specific fine-tuning over zero-shot prompting.
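For readers less familiar with the two reported metrics, the following is a minimal sketch of how accuracy and Macro F1 can be computed from gold and predicted labels. The label names and the toy predictions are hypothetical and are not taken from the paper's datasets or results; the function names are illustrative only.

```python
def accuracy(gold, pred):
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical, imbalanced toy data: a model that always answers "positive".
toy_gold = ["positive"] * 4 + ["negative", "neutral"]
toy_pred = ["positive"] * 6

toy_accuracy = accuracy(toy_gold, toy_pred)
toy_macro_f1 = macro_f1(toy_gold, toy_pred)
print(f"accuracy={toy_accuracy:.3f}  macro_f1={toy_macro_f1:.3f}")
```

Because Macro F1 weights every class equally, a model that ignores minority classes can score reasonably on accuracy while scoring poorly on Macro F1, which is one reason to report both.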