LREC-COLING 2024 workshop

Three Studies on Predicting Word Concreteness with Embedding Vectors

Proceedings of the Workshop on Cognitive Aspects of the Lexicon @ LREC-COLING 2024

DOI:10.63317/287jbvenf9fs

Abstract

Human-assigned concreteness ratings for words are commonly used in psycholinguistic and computational linguistic studies. Previous research has shown that such ratings can be modeled and extrapolated using dense word-embedding representations. However, due to rater disagreement, a considerable share of the human ratings in published datasets is not reliable. We investigate how such unreliable data influences the modeling of concreteness with word embeddings. Study 1 compares fourteen embedding models over three datasets of concreteness ratings, showing that most models achieve high correlations with human ratings and exhibit low prediction error. Study 2 investigates how excluding the less reliable ratings influences the modeling results. It indicates that improved results can be achieved when the data is cleaned. Study 3 adds further conditions to those of Study 2 and indicates that the improved results hold only for the cleaned data, and that in the general case removing the less reliable data points is not useful.
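
The kind of evaluation the abstract describes (correlation with human ratings, prediction error, and the effect of excluding less reliable ratings) can be illustrated with a minimal sketch. This is not the paper's code. It assumes, hypothetically, that reliability is measured by the standard deviation of rater judgments per word and that model predictions are already available; the field names, the threshold, and the toy values below are all invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def rmse(xs, ys):
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def evaluate(records, max_sd=None):
    """Score predictions against human ratings.

    If max_sd is given, words whose rating SD exceeds it are dropped first.
    Using rating SD as the reliability criterion is an assumption of this
    sketch, not something stated in the abstract.
    Returns (number of words kept, Pearson r, RMSE).
    """
    kept = [r for r in records if max_sd is None or r["sd"] <= max_sd]
    human = [r["human"] for r in kept]
    pred = [r["pred"] for r in kept]
    return len(kept), pearson(human, pred), rmse(human, pred)


# Invented toy data: mean human rating, rater SD, and a model prediction.
TOY = [
    {"word": "apple", "human": 5.0, "sd": 0.2, "pred": 4.8},
    {"word": "table", "human": 4.9, "sd": 0.3, "pred": 4.6},
    {"word": "idea", "human": 1.6, "sd": 0.9, "pred": 2.0},
    {"word": "justice", "human": 1.5, "sd": 1.0, "pred": 1.9},
    {"word": "spirit", "human": 2.5, "sd": 1.6, "pred": 3.6},
    {"word": "network", "human": 3.4, "sd": 1.5, "pred": 2.4},
]

if __name__ == "__main__":
    print("all words:     ", evaluate(TOY))
    print("reliable only: ", evaluate(TOY, max_sd=1.2))
```

Scoring only the filtered subset mirrors the cleaned-data comparison. Judging whether filtering helps in the general case, as Study 3 does, would also require scoring a model trained on cleaned data against the full, unfiltered set, which this sketch does not do.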

Details

Paper ID
lrec2024-ws-cogalex-17
Pages
pp. 140-150
BibKey
flor-2024-three
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Workshop on Cognitive Aspects of the Lexicon @ LREC-COLING 2024
Date
20-25 May 2024

Authors

  • Michael Flor