PAIR: A Pilot Dataset for Dual-Perspective-Based Video-Grounded Dialogue and Reconciliation
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Collaborative dialogue in multi-agent settings often requires interlocutors to integrate partially overlapping perceptual information to construct a shared representation of a dynamic environment. We introduce PAIR, a pilot conversational corpus designed to examine how humans coordinate under systematic perceptual asymmetry. The dataset comprises 15 dialogues in which participants observed the same activity from complementary egocentric and exocentric video perspectives and engaged in open-ended discussion to produce a joint account of what occurred. All transcripts were manually verified and annotated with a scheme of 42 dialogue act categories, enabling fine-grained analysis of interactional structure. Beyond descriptive statistics, PAIR supports the analysis of measurable properties of conversational structure, including turn distribution, participation symmetry, and dialogue act composition, which together serve as structural indicators of how perspective integration unfolds in dialogue. Although intentionally lightweight, PAIR is positioned as a controlled benchmark for analysing collaborative dialogue mechanisms rather than as a large-scale training resource. The corpus supports dialogue act classification, video-grounded dialogue modelling, and the investigation of multi-agent reasoning under distributed perceptual access. By coupling dual-perspective grounding with explicit interactional annotation, PAIR offers a compact testbed for studying reconciliation dynamics in task-oriented dialogue.
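To make the structural indicators mentioned above concrete, the following is a minimal sketch of how participation symmetry and dialogue act composition could be computed from an annotated transcript. It assumes a hypothetical record format in which each turn carries a speaker label and a dialogue act tag; the field names (speaker, act), the example act labels, and the entropy-based symmetry measure are illustrative assumptions, not the corpus's released schema or its official metrics.

from collections import Counter
from math import log2

# Hypothetical turn records: the PAIR release format is not specified here,
# so the 'speaker' and 'act' field names are illustrative assumptions.
transcript = [
    {"speaker": "A", "act": "inform"},
    {"speaker": "B", "act": "acknowledge"},
    {"speaker": "A", "act": "inform"},
    {"speaker": "B", "act": "clarification-request"},
    {"speaker": "A", "act": "inform"},
]

def participation_symmetry(turns):
    """Normalised entropy of the per-speaker turn distribution.

    Returns 1.0 for perfectly balanced participation and approaches 0.0
    as one speaker dominates. Entropy is one common way to quantify
    participation symmetry; it is an assumption, not PAIR's metric.
    """
    counts = Counter(t["speaker"] for t in turns)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * log2(p) for p in probs if p > 0)
    max_entropy = log2(len(counts)) if len(counts) > 1 else 1.0
    return entropy / max_entropy

def act_composition(turns):
    """Relative frequency of each dialogue act label in the transcript."""
    counts = Counter(t["act"] for t in turns)
    total = sum(counts.values())
    return {act: c / total for act, c in counts.items()}

print(f"participation symmetry: {participation_symmetry(transcript):.3f}")
print(f"dialogue act composition: {act_composition(transcript)}")

On the toy transcript above, the symmetry score is roughly 0.97 (three turns against two across two speakers), illustrating how such summary statistics can characterise the interactional balance of a dialogue without reference to its content.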