Mind the Language Gap: Assessing LLM Safety in Italian
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
The rapid diffusion of Large Language Models (LLMs) across linguistic and cultural contexts underscores the need for systematic safety evaluations beyond English. As LLMs are increasingly deployed in multilingual settings, ensuring their safe and appropriate behavior in languages other than English is essential. This paper presents a methodology for building safety evaluation datasets that comprehensively cover the full spectrum of sensitive topics relevant to LLM safety. The resulting resources include a collection of Italian Wikipedia pages spanning all major categories of sensitive content, and a companion dataset containing three challenging Italian-language questions per page designed to probe model behavior on high-risk issues. Each prompt was annotated with one of four safety outcome categories: correct refusal, safe informative, unsafe, and ambiguous. Together, these datasets provide a robust foundation for evaluating and benchmarking LLM safety in Italian. To demonstrate their utility, we used them to assess four LLMs, identifying systematic differences in refusal consistency and compliance across sensitive domains. To support transparency and reproducibility, we release a public repository containing the list of categorized Italian Wikipedia pages, the automatically generated prompts, and the standard prompt template used for safety testing. With this work, we aim to advance language-specific safety assessment and to support the responsible, culturally grounded deployment of LLMs beyond English.