LREC 2026 Main Conference

AI Safety Lost in Translation: Evaluating the Effectiveness of English-Italian Cross-Lingual LLM Safety Alignment

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/24nruqbycv2a

Abstract

Large Language Models (LLMs) have been shown to be vulnerable to various bias and safety issues, and new safety alignment techniques have been proposed to address them. In this paper, we investigate the degree to which such techniques improve safety in a non-English language, specifically Italian, both with and without access to safety training data in that language. We evaluate standard mitigation techniques and assess cross-lingual safety transfer by comparing English-only and bilingual Supervised Fine-Tuning (SFT) on several small open-source LLMs: Qwen3, Llama3.2, and Gemma3. Results confirm a significant cross-lingual safety gap, with most models performing worse in Italian. We find that while prompt engineering is generally effective, the impact of SFT is highly inconsistent. English-only SFT occasionally failed to transfer safety improvements to Italian and even degraded the performance of some models. Furthermore, bilingual SFT repeatedly underperformed other mitigation methods. These findings demonstrate that safety alignment does not always generalize across languages and models, and that standard mitigation strategies can have unpredictable effects. We therefore highlight the critical need for language-specific evaluation and dedicated multilingual safety research to ensure that AI is developed equitably and safely for a global audience.
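The abstract does not say exactly how the cross-lingual safety gap or the effect of SFT is scored. The sketch below is one simple way to express both ideas, assuming every model response has already been labeled safe or unsafe (for example by a human annotator or an automatic judge). The gap is taken as the English safe-response rate minus the Italian one, and transfer as the change in the Italian rate relative to a baseline condition. The function names, the condition labels (`baseline`, `sft_en`) and the toy records are hypothetical illustrations, not the paper's metric, code, or results.

```python
from collections import defaultdict


def safety_rates(judgments):
    """Return {(model, condition, language): fraction of responses judged safe}.

    `judgments` is an iterable of (model, condition, language, is_safe) tuples,
    where `is_safe` is a boolean label assumed to come from a prior judging step.
    """
    totals = defaultdict(int)
    safe = defaultdict(int)
    for model, condition, language, is_safe in judgments:
        key = (model, condition, language)
        totals[key] += 1
        safe[key] += int(is_safe)
    return {key: safe[key] / totals[key] for key in totals}


def cross_lingual_gap(rates, model, condition, src="en", tgt="it"):
    """Safe-response rate in `src` minus the rate in `tgt`.

    A positive value means the model is less safe in the target language.
    """
    return rates[(model, condition, src)] - rates[(model, condition, tgt)]


def transfer_delta(rates, model, condition, baseline="baseline", language="it"):
    """Change in the safe-response rate in `language` versus the baseline condition.

    A negative value means the mitigation (e.g. English-only SFT) made the
    model less safe in that language.
    """
    return rates[(model, condition, language)] - rates[(model, baseline, language)]


if __name__ == "__main__":
    # Toy, made-up labels for a single hypothetical model.
    toy = [
        ("model-a", "baseline", "en", True),
        ("model-a", "baseline", "en", True),
        ("model-a", "baseline", "it", True),
        ("model-a", "baseline", "it", False),
        ("model-a", "sft_en", "it", False),
        ("model-a", "sft_en", "it", False),
    ]
    rates = safety_rates(toy)
    print(cross_lingual_gap(rates, "model-a", "baseline"))  # 1.0 - 0.5 = 0.5
    print(transfer_delta(rates, "model-a", "sft_en"))       # 0.0 - 0.5 = -0.5
```

Keying rates by (model, condition, language) keeps every comparison the abstract mentions (language gap, English-only versus bilingual SFT, prompt engineering) a simple lookup, so a new mitigation condition only requires new labeled records.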

Details

Paper ID
lrec2026-main-296
Pages
pp. 3697-3713
BibKey
wu-etal-2026-ai
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Alessio Wu
  • Martim Brandao
