LREC 2026 Main

Investigating Memorization in Language Models Trained via Knowledge Distillation

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/39ec72wwr6ux

Abstract

We analyze how knowledge distillation influences memorization in language models. Although knowledge distillation is a widely used technique to train smaller, more efficient models, its effect on memorization is not well understood, despite the importance of memorization for model utility and privacy. We demonstrate that when the student and teacher models are trained on different datasets, knowledge distillation substantially reduces memorization and accelerates the forgetting of sequences previously memorized by the student. However, knowledge distillation does not eliminate privacy risks: it accelerates memorization when the student is trained on sequences memorized by the teacher, and teachers can leak memorized content even when the student is trained on data that does not contain these sequences. Finally, we find that the size of the teacher model leads to a trade-off between how quickly memorized information is transferred to the student and how much the student ultimately memorizes. Overall, we provide practical insights for balancing the utility of distilled models against the privacy concerns associated with memorization.
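The abstract centers on knowledge distillation as the training technique under study. The paper's exact setup is not given on this page, so as a point of reference, here is a minimal sketch of the standard distillation objective (in the style of Hinton et al.): a weighted sum of the hard-label cross-entropy and a temperature-softened KL divergence between teacher and student output distributions. All names, the temperature, and the mixing weight are illustrative assumptions, not the authors' configuration.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, optionally softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, target_index,
                      temperature=2.0, alpha=0.5):
    """Hedged sketch of a standard KD objective (values are illustrative):
    alpha * cross-entropy(student, hard label)
    + (1 - alpha) * T^2 * KL(softened teacher || softened student).
    The T^2 factor keeps gradient magnitudes comparable across temperatures."""
    # Hard-label term: negative log-likelihood of the true class.
    student_probs = softmax(student_logits)
    hard_loss = -math.log(student_probs[target_index])

    # Soft-label term: KL divergence between temperature-softened distributions.
    ps = softmax(student_logits, temperature)
    pt = softmax(teacher_logits, temperature)
    soft_loss = sum(t * math.log(t / s) for t, s in zip(pt, ps))

    return alpha * hard_loss + (1 - alpha) * (temperature ** 2) * soft_loss

# Toy example: teacher is more confident on class 0 than the student.
loss = distillation_loss([2.0, 0.5, 0.1], [4.0, 1.0, 0.2], target_index=0)
```

In this formulation the student sees the teacher's full output distribution rather than only hard labels, which is the channel through which the paper's teacher-to-student memorization transfer can occur.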

Details

Paper ID
lrec2026-main-344
Pages
pp. 4400-4413
BibKey
mcking-etal-2026-investigating
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Maarten Mäcking

  • Michaela Regneri

Links