HomeLREC 2026WorkshopsRESOURCEFULlrec2026-ws-resourceful-16
Back to RESOURCEFUL 2026
LREC 2026workshop

Beyond English and Evasion: A Human-Annotated Multi-Domain Benchmark for High-Stakes LLM Safety Evaluation in Chinese

The Fourth Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL 2026)

DOI:10.63317/37ko7p43tuys

Abstract

When Large Language Models (LLMs) are deployed in Chinese-language settings, a troubling pattern emerges: safety systems that work well in English break down. These systems struggle to cross linguistic and cultural boundaries, leaving models exposed to adversarial prompts that exploit Chinese-specific evasion techniques, including Pinyin romanization, character decomposition, Internet slang, and hedging tone. To address this gap, we introduce ChiSafe-PAS (Chinese Safety Pilot Annotation Set), a human-annotated benchmark of 1,897 adversarial Chinese prompts spanning four high-stakes domains: self-harm and violence, drugs and illicit trade, fraud, and satire. Of these, 1,544 entries carry complete gold-standard annotations: a 3-class response label (refuse, redirect, respond), a nine-category obfuscation taxonomy, a risk-level rating, and annotator rationale. We describe the dataset design, annotation process, and obfuscation taxonomy in detail. Our primary goal is practical: to give the research community a high-quality, culturally grounded resource for benchmarking LLM safety alignment. In doing so, we engage three broader tensions in the field: the blurring boundary between training and evaluation data, the need for domain coverage grounded in real-world risk, and the limits of scale as a substitute for cultural expertise.

Details

Paper ID
lrec2026-ws-resourceful-16
Pages
pp. 177-186
BibKey
zaghouani-etal-2026-beyond
Editors
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
The Fourth Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL 2026)
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • WZ

    Wajdi Zaghouani

  • KA

    Kholoud Khalil Aldous

  • YG

    Yicheng Gao

Links