A vision-grounded dataset for predicting typical locations for verbs

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/45ptofymouuj

Abstract

Information about the location of an action is often implicit in text, as humans can infer it from common-sense knowledge. Today’s NLP systems, however, struggle to infer information that goes beyond what is explicit in text. Selectional preference estimation based on large amounts of data provides a way to infer prototypical role fillers, but text-based systems tend to underestimate the probability of the most typical role fillers. Here we present a new dataset containing thematic fit judgments for 2,000 verb/location pairs. This dataset can be used to evaluate text-based, vision-based, or multimodal inference systems for the typicality of an event’s location. We additionally provide three thematic fit baselines for this dataset: a state-of-the-art neural-network-based thematic fit model learned from linguistic data, a model that estimates typical locations based on the MSCOCO dataset, and a simple combination of the two systems.

Details

Paper ID
lrec2018-main-570
BibKey
mukuze-etal-2018-vision
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7 May 2018 to 12 May 2018

Authors

  • Nelson Mukuze
  • Anna Rohrbach
  • Vera Demberg
  • Bernt Schiele

Links