
SHARE: A Lexicon of Harmful Expressions by Spanish Speakers

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/2e6hvneo87ee

Abstract

In this paper we present SHARE, a new lexical resource with 10,125 offensive terms and expressions collected from Spanish speakers. We retrieve this vocabulary using Fiero, an existing chatbot developed to engage users in conversation and collect insults via Telegram. The vocabulary was manually labeled by five annotators, obtaining a kappa coefficient of 78.8%. In addition, we leverage the lexicon to release OffendES_spans, the first Spanish corpus for offensive span identification research. Finally, we show the utility of our resource as an interpretability tool to explain why a comment may be considered offensive.
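
As a rough illustration of how a lexicon can be used to mark offensive spans in a comment, the sketch below does case-insensitive matching of lexicon entries against a text and returns character offsets. It is not the authors' pipeline: the function name `find_offensive_spans`, the longest-entry-first rule, and the two-entry toy lexicon are assumptions made only for this example.

```python
import re

def find_offensive_spans(text, lexicon):
    """Return sorted (start, end, matched_text) character spans for lexicon entries found in text."""
    spans = []
    # Try longer entries first so multi-word expressions win over their sub-terms.
    for term in sorted(lexicon, key=len, reverse=True):
        # Lookarounds keep matches on word boundaries, including accented letters.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            start, end = m.span()
            # Skip matches that overlap a span already claimed by a longer entry.
            if all(end <= s or start >= e for s, e, _ in spans):
                spans.append((start, end, text[start:end]))
    return sorted(spans)

# Hypothetical toy lexicon, not taken from SHARE.
toy_lexicon = ["tonto", "pedazo de tonto"]
example_spans = find_offensive_spans("Eres un pedazo de tonto, tonto.", toy_lexicon)
```

Here `example_spans` holds one span for the multi-word expression and one for the standalone term. Spans of this kind are also what would let a lexicon serve as an interpretability aid, since they point at the words that triggered the match.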

Details

Paper ID
lrec2022-main-139
Pages
pp. 1307-1316
BibKey
plaza-del-arco-etal-2022-share
Editors
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20-25 June 2022

Authors

  • Flor Miriam Plaza-del-Arco
  • Ana Belén Parras Portillo
  • Pilar López Úbeda
  • Beatriz Gil
  • María-Teresa Martín-Valdivia

Links