
Augmenting Image Question Answering Dataset by Exploiting Image Captions

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI: 10.63317/39gpdtauvb6o

Abstract

Image question answering (IQA) is a task that needs rich resources, i.e., supervised data, to achieve optimal performance. However, because IQA is a challenging task that handles complex input and output information, the cost of naive manual annotation can be prohibitive. On the other hand, relevant image-text pairs are thought to be relatively easy to obtain in an unsupervised manner (e.g., by crawling Web data). Based on this expectation, we propose a framework to augment training data for IQA by generating additional examples from unannotated pairs of an image and its captions. The important constraint that a generated IQA example must satisfy is that its answer must be inferable from the corresponding image and question. To satisfy this, we first select a possible answer for a given image by randomly extracting one from its captions. We then generate the question from the triplet of the image, captions, and fixed answer. In experiments, we evaluate our method on the Visual Genome dataset while varying the amount of seed supervised data, and demonstrate its effectiveness.
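As a rough illustration of the augmentation loop the abstract describes, the sketch below first fixes an answer by sampling it from the image's captions (so the answer is grounded in the image by construction) and then generates a question for the resulting (image, captions, answer) triplet. This is a minimal, hypothetical sketch, not the authors' implementation: `extract_candidate_answers` and the template-based `generate_question` are stand-ins (the paper uses a learned question generator conditioned on the image as well).

```python
import random

def extract_candidate_answers(caption):
    # Naive candidate extraction: treat every token in the caption as a
    # potential answer. (Hypothetical; the paper's strategy may differ.)
    return [w.strip(".,!?").lower() for w in caption.split() if w.strip(".,!?")]

def generate_question(captions, answer):
    # Placeholder generator: the paper trains a neural model on the
    # (image, captions, answer) triplet; here a fill-in-the-blank template
    # stands in so the sketch runs end to end.
    for cap in captions:
        tokens = cap.split()
        if any(t.strip(".,!?").lower() == answer for t in tokens):
            blanked = " ".join(
                "___" if t.strip(".,!?").lower() == answer else t for t in tokens
            )
            return f"Fill in the blank: {blanked}"
    return f"What is described as '{answer}'?"

def augment(image_caption_pairs, seed=0):
    # Turn unannotated (image_id, captions) pairs into synthetic IQA examples.
    rng = random.Random(seed)
    examples = []
    for image_id, captions in image_caption_pairs:
        # Step 1: fix the answer by sampling it from one of the captions.
        candidates = extract_candidate_answers(rng.choice(captions))
        if not candidates:
            continue
        answer = rng.choice(candidates)
        # Step 2: generate a question whose answer is the fixed one.
        question = generate_question(captions, answer)
        examples.append({"image": image_id, "question": question, "answer": answer})
    return examples

if __name__ == "__main__":
    pairs = [("img_001", ["A brown dog runs on the beach.",
                          "A dog plays near the ocean."])]
    for example in augment(pairs):
        print(example)
```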

Details

Paper ID
lrec2018-main-436
Pages
N/A
BibKey
yokota-nakayama-2018-augmenting
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7–12 May 2018

Authors

  • Masashi Yokota

  • Hideki Nakayama
