LREC 2026 (main)

CoDAE: Adapting Large Language Models for Education via Chain-of-Thought Data Augmentation

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/447hbzyohzyf

Abstract

Large Language Models (LLMs) are increasingly employed as AI tutors in education due to their scalability and potential for personalized instruction. However, off-the-shelf LLMs often underperform in educational settings, exhibiting limitations such as providing answers too readily, failing to adapt their responses to students’ uncertainty, and remaining susceptible to emotionally manipulative prompts. To address these challenges, we introduce CoDAE, a framework that adapts LLMs for educational use through Chain-of-Thought (CoT) data augmentation. We collect real-world dialogues between students and a ChatGPT-based tutor and enrich them using CoT prompting to promote step-by-step reasoning and pedagogically aligned guidance. Furthermore, we design targeted dialogue cases to explicitly mitigate three key limitations: over-compliance, low response adaptivity, and threat vulnerability. We fine-tune four open-source LLMs on different variants of the augmented datasets and evaluate them in simulated educational scenarios using both automatic metrics and LLM-as-a-judge assessments. Our results show that models fine-tuned with CoDAE deliver more pedagogically appropriate guidance, promote student reflection, and more effectively prevent premature answer disclosure.

Details

Paper ID
lrec2026-main-837
Pages
pp. 10677-10687
BibKey
yuan-etal-2026-codae
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Shuzhou Yuan
  • William LaCroix
  • Hardik Ghoshal
  • Ercong Nie
  • Michael Färber
