Request Correction

Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.

Correction Guidelines

Click the edit button next to a field to report a correction.
Fill in the suggested correction value for each field you want to correct.
Provide your name and email so we can contact you if needed.


Paper Information

lrec2026-main-027

MazeEval: A Benchmark for Testing Sequential Decision-Making in Language Models

View lrec2026-main-027.pdf

Paper Fields


Title

MazeEval: A Benchmark for Testing Sequential Decision-Making in Language Models

Abstract

As Large Language Models (LLMs) increasingly power autonomous agents in robotics and embodied AI, understanding their spatial reasoning capabilities becomes crucial for reliable deployment. We introduce MazeEval, a benchmark designed to evaluate pure spatial reasoning in LLMs through coordinate-based maze navigation tasks without visual input. Using a function-calling interface, models navigate mazes of varying complexity (5×5 to 15×15 grids) using only coordinate feedback and distance-to-wall information. We evaluate eight state-of-the-art LLMs across identical mazes in both English and Icelandic to assess cross-linguistic transfer of spatial abilities. Our findings reveal striking disparities: while OpenAI’s o3 achieves perfect navigation up to 30×30 mazes, other models exhibit catastrophic failure beyond 9×9 mazes, with 100% of failures attributed to excessive looping behavior. We document significant performance degradation in Icelandic, with models solving mazes 3–4 sizes smaller than in English, suggesting spatial reasoning emerges from linguistic patterns rather than language-agnostic mechanisms. These results highlight that spatial intelligence remains fundamentally constrained by training data availability, with important implications for global deployment of LLM-powered autonomous systems.

Authors

Expand an author to correct their information. Use the remove button to request author removal, or add a new author.

PDF Attachment

You may attach a PDF as a corrected version of the paper. Max file size: 10MB. Only PDF files are accepted.


Your Information

Name

Comment

Author Declaration *

I declare that I have notified all co-authors of the proposed corrections and obtained their consent, and that all modifications adhere to research ethics standards and the LREC correction policy.
