Investigating Proactivity in Multimodal Task-Guidance Dialogues

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

Abstract

While proactivity, i.e., the ability to take the initiative and anticipate requests in order to improve the effectiveness of a conversation, has traditionally been investigated in task-oriented dialogues (e.g., booking a restaurant), less work addresses proactive behaviours in task-guidance dialogues (e.g., guiding a user through a recipe), where the expert instructor is expected to interact with and supervise a user in a real-world setting. We analyse a corpus of video-recorded task-guidance dialogues and explore two key aspects of proactivity in this context: (i) the impact of multimodal features, compared with chat-based dialogues; (ii) the impact of instructions and actions grounded in a real situation. Through a comparison between annotated task-oriented and task-guidance dialogues, we find that task-guidance dialogues are highly collaborative interactions, where preventing mistakes and maintaining the correct process order are essential for achieving the dialogue goal. In addition, the video information available in the task-guidance setting can help correct false-positive proactive behaviours, although it does not introduce substantial differences. To support our analysis and to foster further research, we provide a corpus of multimodal task-guidance dialogues annotated for proactivity.