TryggLLM: A Benchmark for Evaluating LLM Safety in Norwegian
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
We introduce TryggLLM, the first safety benchmark dataset for Norwegian. The dataset is intended for benchmarking the different types of safety issues that can arise when using Norwegian generative language models. We manually translated two English benchmark datasets, adapting their content to the Norwegian context. The benchmark consists of two sub-parts: i) prompts annotated by four native speakers in both written variants of Norwegian, Bokmål (BM) and Nynorsk (NN), with each annotator writing in their preferred variant (two BM and two NN); ii) prompts and target responses, each available in both a BM and an NN version. We provide a detailed description of the data creation process and present a thorough manual evaluation of existing open Norwegian LLMs benchmarked with TryggLLM. Our results show that between 18% and 48% of the generated responses are unsafe across all tested models.