To Skip, to Swap, or Not to Swap? Identifying Step Transition Types in Instructional Manuals
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Large language models (LLMs) are increasingly used as procedural planners that provide guidance across applications. However, in human-assistive scenarios where the environment and users’ knowledge change constantly, their ability to detect different step types in order to generate alternative plans remains underexplored. To address this gap, we introduce a novel evaluation task and dataset that assess whether models can identify sequential, interchangeable, and optional steps in textual instructions across five domains in a step-by-step manner. We compare seven open-source and proprietary LLM families of varying sizes against a visually informed baseline built on procedural knowledge graphs (PKGs). Our results suggest that LLMs encode procedural knowledge, identifying step types with increasing effectiveness as parameter counts and training data size grow. However, all LLMs reason inconsistently about the mutual exclusivity of interchangeable and sequential step pairs, whereas the symbolic PKG baseline is markedly more consistent in this respect. Comprehensive analyses further uncover limitations in LLMs’ procedural reasoning abilities.