Request Correction
Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.
Correction Guidelines
- Click the edit button next to a field to report a correction.
- Fill in the suggested correction value for each field you want to correct.
- Provide your name and email so we can contact you if needed.
Paper Information
Towards Multi-Modal Co-Reference Resolution in Conversational Shopping Agents
Paper Fields
Click the edit button next to a field to report a correction.
Towards Multi-Modal Co-Reference Resolution in Conversational Shopping Agents
The context of modern smart voice assistants is often multi-modal, where images, audio and video content are consumed by users simultaneously. In such a setup, co-reference resolution is especially challenging, and runs across modalities and dialogue turns. We explore the problem of multi-modal co-reference resolution in multi-turn dialogues and quantify the performance of multi-modal LLMs on a specially curated dataset of long, image-interleaved conversations between a voice assistant and human in a shopping use case. We propose a custom architecture for multi-modal embedding alignment using a novel parameter augmentation technique. Our proposed Parameter Augmented LLM approach shows a 4.9% absolute F1 improvement above a cross-attention baseline while reducing the number of parameters being trained by 4x.
Authors
Expand an author to correct their information. Use the remove button to request author removal, or add a new author.
PDF Attachment
You may attach a PDF as a corrected version of the paper. Max file size: 10MB. Only PDF files are accepted.
Your Information
Author Declaration *
Select at least one field to correct using the edit buttons above.