Offensive language detection in Hebrew: can other languages help?

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/322tvve46au2

Abstract

Offensive language on social media is now a common phenomenon, and it harms many people and vulnerable groups. Automated detection of offensive language is therefore in high demand, and it remains a serious challenge in multilingual domains. Various machine learning approaches combined with natural language processing techniques have recently been applied to this task. This paper contributes to this area in several ways: (1) it introduces a new dataset of annotated Facebook comments in Hebrew; (2) it describes a case study with multiple supervised models and text representations for offensive language detection in three languages, including two Semitic languages (Hebrew and Arabic); (3) it reports evaluation results of cross-lingual and multilingual learning for detecting offensive content in Semitic languages; and (4) it discusses the limitations of these settings.

Details

Paper ID
lrec2022-main-396
Pages
pp. 3715-3723
BibKey
litvak-etal-2022-offensive
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20 June 2022 to 25 June 2022

Authors

  • Marina Litvak

  • Natalia Vanetik

  • Chaya Liebeskind

  • Omar Hmdia

  • Rizek Abu Madeghem
