LREC-COLING 2024

Text360Nav: 360-Degree Image Captioning Dataset for Urban Pedestrians Navigation

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/23are7qb6xz3

Abstract

Text feedback on urban scenes is a crucial tool for pedestrians to understand their surroundings, obstacles, and safe pathways. However, existing image captioning datasets often concentrate on overall image descriptions and lack detailed scene descriptions, overlooking features relevant to pedestrians walking on urban streets. We developed a new dataset to assist pedestrians in urban scenes using 360-degree camera images. Through our Text360Nav dataset, we aim to provide textual feedback from machine visual perception, such as 360-degree cameras, to visually impaired individuals and distracted pedestrians navigating urban streets, including those engrossed in their smartphones while walking. In experiments, we combined our dataset with multimodal generative models and observed that models trained with our dataset can generate textual descriptions focusing on street objects and obstacles that are meaningful in urban scenes, in both quantitative and qualitative analyses, thus supporting the effectiveness of our dataset for urban pedestrian navigation.

Details

Paper ID
lrec2024-main-1371
Pages
pp. 15783-15788
BibKey
nishimura-etal-2024-text360nav
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20–25 May 2024

Authors

  • Chieko Nishimura
  • Shuhei Kurita
  • Yohei Seki
