Beyond Code: Evaluate Thought Steps for Complex Code Generation

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/3agxr3ouzo5m

Abstract

Code generation aims to generate code in a general-purpose programming language, such as C++, based on natural language intents. Existing efforts primarily focus on relatively simple programming problems and fail to evaluate the thought process involved in complex programming scenarios. In this paper, we introduce “steps-guided code generation,” a task that assesses the quality of both thought steps and code implementation to evaluate the overall management of handling a complex programming problem. To support this task, we construct CodeStepsEval, a real-world scenario dataset of complex programming problems in the C++ programming language with varying levels of difficulty. Comprehensive experiments on this dataset demonstrate the importance of high-quality steps in enhancing code generation performance and the challenges faced by the code LLMs in this task.