PRIVaThe: An Annotated Dataset of Multi-Objectives Web Search Sessions
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
This paper presents PRIVaThe, a new French-language dataset of 200 web search sessions collected from 100 participants performing two multi-objective, multi-hop tasks. It is designed to enable cross-user comparison of session-level search strategies. Unlike existing datasets that capture only query sequences or final answers, PRIVaThe provides explicit sub-objective decomposition traces for each session. We automatically annotate 3,162 queries with the sub-objective(s) they address, using open-weight LLMs (Mistral, Llama 3, and Gemma) validated against human gold annotations. This annotation enables systematic analyses of how users distribute and sequence sub-objectives throughout their sessions, revealing distinct search strategies such as logical, global, and exploratory approaches.