JamC-QA: A Multiple-Choice Question Answering Benchmark for Japan-Specific Knowledge
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
We introduce JamC-QA, a multiple-choice question answering benchmark specifically designed to evaluate Japan-specific knowledge. Existing Japanese QA benchmarks largely consist of questions translated from English or derived from professional exams, and thus primarily target academic or generally shared knowledge. Consequently, they are of limited use for distinguishing how well high-performing large language models acquire local knowledge. To address this gap, JamC-QA provides a robust resource for assessing the acquisition of Japan-specific knowledge. It comprises 2,309 challenging instances, created entirely from scratch by human annotators, spanning eight categories: culture, custom, regional identity, geography, history, government, law, and healthcare. Instances easily answered by weak models were filtered out. Evaluation results highlight a critical distinction between model types: although multilingual models score highly on general benchmarks such as MMLU and JMMLU, their results on JamC-QA indicate that they do not fully capture Japan-specific knowledge. Japanese-language models outperform multilingual models, especially on culture- and region-related knowledge such as proverbs, traditional events, and local customs. Furthermore, we find a notable division within Japanese models: models further pretrained on Japanese text excel at administrative and legal questions, whereas models trained from scratch perform strongly on local and cultural aspects.