Does Translation Preserve Sentiment? An Analysis of Arabic-English Cross-Lingual Classification

The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) with 5 Shared Tasks

Abstract

Machine translation is widely used in cross-lingual sentiment analysis, yet the assumption that translation preserves sentiment remains largely unexamined. We present a systematic analysis of translation-induced sentiment shifts across 11,558 samples from three Arabic-English datasets (AJGT, OCLAR, FSA), using three translation models (Helsinki-NMT, GPT-4o-mini, LLaMA-3.1-8B) and a fixed multilingual classifier (XLM-RoBERTa). A substantial proportion of samples undergo sentiment shifts after translation, with accuracy drops ranging from less than 1% to nearly 20%. GPT-4o-mini achieves the strongest sentiment preservation, while LLaMA-3.1-8B exhibits both significant distortion and refusal behaviour. Critically, Helsinki-NMT translates every sample successfully, which indicates that LLaMA’s refusals stem from safety policies rather than from untranslatable input. We also find that sentiment shift measurements are pipeline-dependent: they vary with the classifier used for evaluation. These findings challenge the translate-then-classify paradigm and provide guidance for cross-lingual Arabic NLP systems.
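To make the measured quantities concrete, the following is a minimal sketch of how a sentiment shift rate and an accuracy drop could be computed once a fixed classifier has labelled both the Arabic source and its English translation. It is an illustration under stated assumptions, not the evaluation code used in this work: the names `ShiftReport` and `sentiment_shift_report`, the use of `None` to mark a refused translation, and the choice to compute all metrics only over non-refused samples are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ShiftReport:
    n_total: int           # all samples passed in
    n_refused: int         # samples with no usable translation
    shift_rate: float      # share of kept samples whose predicted label changed
    acc_source: float      # classifier accuracy on the source text
    acc_translated: float  # classifier accuracy on the translated text
    acc_drop: float        # acc_source - acc_translated


def sentiment_shift_report(
    gold: Sequence[str],
    pred_source: Sequence[str],
    pred_translated: Sequence[Optional[str]],
) -> ShiftReport:
    """Compare predictions of one fixed classifier before and after translation.

    Assumption (hypothetical convention): a refused or failed translation is
    represented by None in pred_translated and is excluded from all metrics.
    """
    if not (len(gold) == len(pred_source) == len(pred_translated)):
        raise ValueError("all inputs must have the same length")

    # Keep only samples for which a translation (and thus a prediction) exists.
    kept = [i for i, p in enumerate(pred_translated) if p is not None]
    if not kept:
        raise ValueError("no translated samples to evaluate")

    n = len(kept)
    shifted = sum(pred_source[i] != pred_translated[i] for i in kept)
    correct_src = sum(pred_source[i] == gold[i] for i in kept)
    correct_tr = sum(pred_translated[i] == gold[i] for i in kept)

    acc_src = correct_src / n
    acc_tr = correct_tr / n
    return ShiftReport(
        n_total=len(gold),
        n_refused=len(gold) - n,
        shift_rate=shifted / n,
        acc_source=acc_src,
        acc_translated=acc_tr,
        acc_drop=acc_src - acc_tr,
    )


# Toy data for illustration only; these are not results from the paper.
report = sentiment_shift_report(
    gold=["pos", "neg", "pos", "neg"],
    pred_source=["pos", "neg", "pos", "pos"],
    pred_translated=["pos", "pos", None, "pos"],
)
print(report)
```

Restricting the metrics to non-refused samples keeps the source and translated accuracies comparable on the same subset. Counting refusals as errors instead would be an equally defensible convention and would yield a larger accuracy drop for a model that refuses often.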