AnswerCarefully: Creating a Dataset for LLM Safety in Japanese

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

Abstract

In this paper we present AnswerCarefully, a dataset for promoting the safety of Japanese LLM outputs. The dataset consists of 1,800 pairs of questions and reference answers, where the questions require special attention in answering. It covers a wide range of risk categories established in prior English-language datasets, but the data samples are original in that they are manually curated to reflect the socio-cultural context of LLM usage in Japan. We show that instruction fine-tuning a Japanese LLM on this dataset improved output safety without compromising the utility of general responses. We also report the results of a safety evaluation of 12 Japanese LLMs using this dataset as a benchmark. Finally, we discuss the significance of creating region-specific datasets for LLM safety, and describe the meta tags we added to the dataset to facilitate the creation of similar datasets in different languages and regions. The dataset is publicly available for the sole purpose of improving LLM safety, with no other usage restrictions.