LREC 2026 Workshop

Leveraging Comparable Toxicity Lexicons in Prompt Instructions for Multilingual Text Detoxification

Proceedings of the 19th Workshop on Building and Using Comparable Corpora (BUCC)

DOI:10.63317/2f5i2922qqe2

Abstract

To mitigate the prevalence of toxic language on digital social media, various NLP approaches have been proposed for automatic text detoxification. However, the potential of toxic expression lexicons as a comparable cross-lingual resource to guide this process remains largely unexplored. In this work, we investigate how such resources can be effectively used to inform multilingual language models about what should and should not be considered toxic. We evaluate four models under two settings, zero-shot prompting and fine-tuning, to assess the impact of incorporating toxic expressions in the prompt instructions, including in cross-lingual transfer scenarios. Our results show that both zero-shot prompting and fine-tuning benefit considerably from adding toxic expressions to the prompt instructions during training and/or inference. Our findings demonstrate that comparable, lightweight, language-specific toxic expression lexicons constitute an effective mechanism for injecting explicit information about lexical toxicity into multilingual language models.
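
The idea of adding toxic expressions to a prompt instruction can be illustrated with a minimal sketch. The abstract does not give the prompt template, the lexicon contents, or the matching strategy, so everything below is an assumption made for illustration: the lexicon entries are placeholders, the instruction wording is invented, and the function names `find_lexicon_hits` and `build_detox_prompt` are hypothetical. The sketch looks up which entries of a language-specific lexicon occur in the input and lists them in the instruction.

```python
import re

# Hypothetical mini-lexicons keyed by language code.
# Placeholder entries only, not taken from the paper's resources.
TOXIC_LEXICON = {
    "en": ["idiot", "stupid"],
    "de": ["idiot", "dumm"],
}


def find_lexicon_hits(text, lang, lexicon=TOXIC_LEXICON):
    """Return lexicon entries for `lang` that occur as whole words in `text`."""
    hits = []
    for term in lexicon.get(lang, []):
        # Case-insensitive whole-word match; re.escape guards special characters.
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
            hits.append(term)
    return hits


def build_detox_prompt(text, lang, lexicon=TOXIC_LEXICON):
    """Build a detoxification prompt, naming matched toxic expressions if any."""
    # Assumed instruction wording, not the template used by the authors.
    instruction = (
        "Rewrite the following text so that it is non-toxic "
        "while preserving its meaning."
    )
    hits = find_lexicon_hits(text, lang, lexicon)
    if hits:
        instruction += (
            " The following expressions are considered toxic: "
            + ", ".join(hits)
            + "."
        )
    return f"{instruction}\nText: {text}\nRewritten text:"


if __name__ == "__main__":
    print(build_detox_prompt("You are such an IDIOT", "en"))
```

The resulting string could be passed to a model for zero-shot inference or used as the input side of fine-tuning examples. Whether to list only the matched entries, as here, or the whole lexicon is a design choice the abstract leaves open.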

Details

Paper ID
lrec2026-ws-bucc-12
Pages
pp. 108-118
BibKey
elattar-etal-2026-leveraging
Editors
Reinhard Rapp, Ayla Rigouts Terryn, Serge Sharoff, Pierre Zweigenbaum
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 19th Workshop on Building and Using Comparable Corpora (BUCC)
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Yassir El Attar

  • Esra Dönmez

  • Nina K. Ohlendorf

  • Agnieszka Falenska