HateBR: A Large Expert Annotated Corpus of Brazilian Instagram Comments for Offensive Language and Hate Speech Detection

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

Abstract

Due to the severity of the social media offensive and hateful comments in Brazil, and the lack of research in Portuguese, this paper provides the first large-scale expert annotated corpus of Brazilian Instagram comments for hate speech and offensive language detection. The HateBR corpus was collected from the comment section of Brazilian politicians’ accounts on Instagram and manually annotated by specialists, reaching a high inter-annotator agreement. The corpus consists of 7,000 documents annotated according to three different layers: a binary classification (offensive versus non-offensive comments), offensiveness-level classification (highly, moderately, and slightly offensive), and nine hate speech groups (xenophobia, racism, homophobia, sexism, religious intolerance, partyism, apology for the dictatorship, antisemitism, and fatphobia). We also implemented baseline experiments for offensive language and hate speech detection and compared them with a literature baseline. Results show that the baseline experiments on our corpus outperform the current state-of-the-art for the Portuguese language.

Resources

Details

Paper ID

lrec2022-main-777

Pages

pp. 7174-7183

DOI

10.63317/544cc4es6d8a

BibKey

vargas-etal-2022-hatebr

Editors

Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis2020

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

79-10-95546-38-2

Conference

Thirteenth Language Resources and Evaluation Conference

Location

Marseille, France

Date

20 - 25 June 2022

Authors

FV
Francielle Vargas
IC
Isabelle Carvalho
FR
Fabiana Rodrigues de Góes
TP
Thiago Pardo
FB
Fabrício Benevenuto

Links

URL

DOI