
Question Modifiers in Visual Question Answering

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/2rbyp7qp8xdp

Abstract

Visual Question Answering (VQA) is a challenging problem that can advance AI by integrating several important sub-disciplines, including natural language understanding and computer vision. Large, publicly available VQA datasets for training and evaluation have driven the development of models that achieve increasingly higher accuracy scores. However, it is also important to understand how well a model grasps the details provided in a question. For example, studies in psychology have shown that syntactic complexity places a greater cognitive load on humans. Analogously, we want to understand whether models have the perceptual capability to handle modifications to questions. We therefore develop a new dataset using Amazon Mechanical Turk, where we asked workers to add modifiers to questions based on object properties and spatial relationships. We evaluate this data on LXMERT, a state-of-the-art VQA model that focuses more extensively on language processing. Our results indicate that the model's performance degrades significantly when questions are modified to include more detailed information.

Details

Paper ID
lrec2022-main-158
Pages
pp. 1472–1479
BibKey
britton-etal-2022-question
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20–25 June 2022

Authors

  • William Britton
  • Somdeb Sarkhel
  • Deepak Venugopal

Links