Towards Robust Evaluation for Privacy QA Systems

Proceedings of the Joint Workshop on Legal and Ethical Issues in Human Language Technologies and Computational Approaches to Language Data Pseudonymization, Anonymization, De-identification, and Data Privacy (LEGAL2026 and CALD-pseudo 2026) @ LREC 2026

DOI:10.63317/4dr8vv9mj47r

Abstract

The transparency principle of the General Data Protection Regulation requires data-processing information to be clear, precise, and accessible. While Large Language Models (LLMs) show promise in this context, their probabilistic nature raises challenges for ensuring truthfulness and comprehensibility. This paper presents an exploratory evaluation of eight Privacy Question Answering (QA) systems – including LLMs, retrieval-augmented generation, and alignment-based approaches – on two datasets. We propose an evaluation framework that maps both traditional NLP and LLM-as-a-judge metrics to the legal requirements of comprehensibility and precision. Results show that no single system consistently excels across all metrics, and that system rankings can vary depending on the choice of metric and thresholding. We highlight open questions and emphasize the need to translate legal requirements into technical evaluation criteria. Our work provides a foundation for a more robust evaluation of Privacy QA systems.

Details

Paper ID

lrec2026-ws-legal-02

Pages

pp. 12-25

BibKey

leschanowsky-etal-2026-robust

Editors

Ingo Siegert, Maria Irena Szawerna, Khalid Choukri, Simon Dobnik, Paweł Kamocki, Therese Lindström Tiedemann, Pierre Lison, Ricardo Muñoz Sánchez, Ildikó Pilán, Lisa Södergård, Kossay Talmoudi, Elena Volodina, Xuan-Son Vu

Publisher

European Language Resources Association (ELRA)

ISSN

N/A

ISBN

N/A

Workshop

Location

Palma, Mallorca, Spain

Date

11 - 16 May 2026

Authors

Anna Leschanowsky
Zahra Kolagar
Erion Çano
Ivan Habernal
Dara Hallinan
Emanuël Habets
Birgit Popp
