
GeoBenchmark: Probing Large Language Models for Geo-Spatial Knowledge

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/26pwz3584735

Abstract

Large Language Models (LLMs) demonstrate strong factual recall of general-purpose knowledge but struggle with grounded geospatial knowledge. To probe LLMs for spatial knowledge, we present GeoBenchmark, a benchmark for evaluating geographic commonsense along three core spatial relations: direction, distance, and topology. Using data extracted from YAGO2geo and Ordnance Survey ward geometries, we formalize spatial relations as structured triplets and systematically transform them into balanced binary (Yes/No) and multiple-choice (MCQ) question-answer pairs. In addition, we distinguish atomic and composite questions according to the number of spatial relations involved. The resulting dataset comprises 26k binary and 13k MCQ samples, uniformly distributed across atomic, binary, and ternary relation levels. We establish baselines with LLaMA-8B and Mistral-7B under zero-shot prompting: the models achieve 52-63% accuracy on atomic questions but fall below 35% on ternary relations, exposing their limited compositional spatial understanding and strong option bias. GeoBenchmark provides a comprehensive, reproducible resource for probing and advancing LLMs' geographic commonsense, paving the way for future research in spatial and geographic probing of LLMs as well as knowledge editing.
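The triplet-to-question transformation described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' release code: the triplet format, question templates, and entity names are assumptions for demonstration only.

```python
import random

def triplet_to_binary(subj, relation, obj, label):
    """Render a (subject, relation, object) triplet as a Yes/No question.

    `label` marks whether the stated relation actually holds, so positive
    and negative samples can be balanced."""
    return {
        "question": f"Is {subj} {relation} {obj}?",
        "answer": "Yes" if label else "No",
    }

def triplet_to_mcq(subj, obj, correct, distractors, seed=0):
    """Render a triplet as an MCQ: the correct relation plus distractor
    relations of the same type (e.g. other directions), shuffled."""
    rng = random.Random(seed)
    options = [correct] + list(distractors)
    rng.shuffle(options)
    return {
        "question": f"Which relation holds: {subj} is ___ {obj}?",
        "options": options,
        "answer": correct,
    }

# Illustrative atomic (single-relation) examples; place names are arbitrary.
bin_q = triplet_to_binary("Oxford", "north of", "London", label=True)
mcq_q = triplet_to_mcq("Oxford", "London",
                       correct="north of",
                       distractors=["south of", "east of", "west of"])
```

Composite (binary- and ternary-relation) questions would chain two or three such triplets into a single prompt; the paper itself should be consulted for the exact templates and balancing procedure.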

Details

Paper ID
lrec2026-main-417
Pages
pp. 5335-5348
BibKey
abayomi-etal-2026-geobenchmark
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Ayomide Abayomi
  • Jose G. Moreno
  • Karim Radouane
  • Lynda Tamine
