Hidden Sentiments: The Impact of Low-level Adversarial Perturbations on Arabic Sentiment Analysis Services

The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) with 5 Shared Tasks

Abstract

Sentiment analysis is one of the most popular applications of supervised machine learning for natural language processing. A common approach to building a dataset for training sentiment analysis models is to extract user posts and comments from social media and other online platforms. However, this content is subject to various types of perturbations that common preprocessing techniques do not target and that may degrade the models’ performance. In this paper, six popular corpora used in Arabic sentiment analysis research are analyzed to identify common patterns of character-level perturbations. Samples from three selected corpora are then used to test the online sentiment analysis services offered by three public cloud providers. Each service is tested on a clean version of each dataset and on four other versions, each perturbed using a different technique. Empirical results indicate that no single sentiment analysis service is superior to the others in all cases, and that all three services are vulnerable to low-level adversarial attacks, which may cause up to a 51% relative drop in macro-averaged F1 score while the perturbed text remains readable.
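
The abstract reports a relative drop in macro-averaged F1 but does not spell out the formula. A minimal sketch of one plausible reading follows: macro F1 is the unweighted mean of per-class F1 scores, and the relative drop is assumed to be (F1_clean - F1_perturbed) / F1_clean. The function names and the toy labels are hypothetical and chosen only for illustration; they are not taken from the paper's corpora or results.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def relative_drop(clean_score, perturbed_score):
    """Assumed definition: fraction of the clean score lost after perturbation."""
    if clean_score == 0:
        raise ValueError("relative drop is undefined for a clean score of 0")
    return (clean_score - perturbed_score) / clean_score


# Toy gold labels and predictions (illustrative only, not the paper's data).
gold = ["pos", "pos", "neg", "neg"]
pred_clean = ["pos", "pos", "neg", "neg"]      # predictions on clean text
pred_perturbed = ["pos", "neg", "neg", "neg"]  # predictions on perturbed text

f1_clean = macro_f1(gold, pred_clean)
f1_perturbed = macro_f1(gold, pred_perturbed)
print(f"clean={f1_clean:.3f} perturbed={f1_perturbed:.3f} "
      f"relative drop={relative_drop(f1_clean, f1_perturbed):.1%}")
```

Under this reading, a 51% relative drop means the perturbed macro F1 is roughly half of the clean macro F1, which is a different statement from a drop of 51 absolute percentage points.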