VEIL: A Benchmark for Value-Preserving Entity Identification Limitation

Proceedings of the Joint Workshop on Legal and Ethical Issues in Human Language Technologies and Computational Approaches to Language Data Pseudonymization, Anonymization, De-identification, and Data Privacy (LEGAL2026 and CALD-pseudo 2026) @ LREC 2026

DOI:10.63317/2spfi3ghhwaj

Abstract

Large Language Models (LLMs) raise several concerns regarding Personally Identifiable Information (PII). PII can occur in the training data, where it may be leaked accidentally or extracted with malicious intent, or users can enter it into LLM-based technologies through their prompts. A viable strategy to limit LLMs’ exposure to PII is to filter input and output data by de-identifying PII, including personal names. This, however, poses a challenge: a name could refer to a private person in a context containing sensitive information (e.g., Michelangelo is an atheist), or it could refer to a famous artist in another context (e.g., Michelangelo’s Sistine Chapel), and masking the latter may hinder the LLMs’ capabilities in general-knowledge tasks. We tackle the problem of personal name de-identification and focus on deciding, based on context, which personal names need to be removed and which should be kept. We present VEIL, a challenging benchmark for Value-preserving Entity Identification Limitation that targets context-aware de-identification decisions on LLM training data, and we compare the performance of different state-of-the-art systems on the task.

Details

Paper ID

lrec2026-ws-legal-12

Pages

pp. 102-115

BibKey

gold-etal-2026-veil

Editors

Ingo Siegert, Maria Irena Szawerna, Khalid Choukri, Simon Dobnik, Paweł Kamocki, Therese Lindström Tiedemann, Pierre Lison, Ricardo Muñoz Sánchez, Ildikó Pilán, Lisa Södergård, Kossay Talmoudi, Elena Volodina, Xuan-Son Vu

Publisher

European Language Resources Association (ELRA)

ISSN

N/A

ISBN

N/A

Workshop

Location

Palma, Mallorca, Spain

Date

11 - 16 May 2026

Authors

Darina Gold
Shadi Rastegar
Alina Liebel
Alessandra Zarcone
