LREC-COLING 2024 workshop

Assessing Image-Captioning Models: A Novel Framework Integrating Statistical Analysis and Metric Patterns

Proceedings of the Seventh Workshop on e-Commerce and NLP @ LREC-COLING 2024

DOI:10.63317/4v462ev9jgqf

Abstract

In this study, we present a novel evaluation framework for image-captioning models that integrates statistical analysis with common evaluation metrics. We use two popular datasets, FashionGen and Amazon, which have contrasting levels of dataset variation, to evaluate four models: Video-LLaVa, BLIP, CoCa and ViT-GPT2. Our approach reveals the comparative strengths of the models, offering insights into their adaptability and applicability in real-world scenarios. It also contributes a comprehensive evaluation method that considers both statistical significance and practical relevance to guide the selection of models for specific applications. Specifically, we propose Rank Score, a new evaluation metric designed for e-commerce image search applications, and employ CLIP Score to quantify dataset variation, offering a holistic view of model performance.

Details

Paper ID
lrec2024-ws-ecnlp-09
Pages
pp. 79-87
BibKey
li-etal-2024-assessing
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Seventh Workshop on e-Commerce and NLP @ LREC-COLING 2024
Date
20-25 May 2024

Authors

  • Qiaomu Li
  • Ying Xie
  • Nina Grundlingh
  • Varsha Rani Chawan
  • Cody Wang

Links