Investigating Memorization in Language Models Trained via Knowledge Distillation
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
We analyze how knowledge distillation influences memorization in language models. Although knowledge distillation is widely used to train smaller, more efficient models, its effect on memorization is not well understood, despite the implications of memorization for both model utility and privacy. We demonstrate that when the student and teacher are trained on different datasets, knowledge distillation substantially reduces memorization and accelerates the forgetting of sequences the student had previously memorized. However, distillation does not eliminate privacy risks: it accelerates memorization when the student is trained on sequences the teacher has memorized, and teachers can leak memorized content even when the student's training data does not contain those sequences. Finally, we find that teacher model size induces a trade-off between how quickly memorized information transfers to the student and how much the student ultimately memorizes. Overall, we provide practical insights for balancing the utility of distilled models against the privacy risks associated with memorization.
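For readers less familiar with the setup, the sketch below shows a standard knowledge-distillation objective of the kind the abstract refers to: the student is trained to match the teacher's softened output distribution alongside the usual cross-entropy on ground-truth tokens. This is a generic illustration, not the paper's exact training procedure; the function name, temperature, and mixing weight alpha are illustrative assumptions.

```python
# Minimal sketch of a standard knowledge-distillation loss (illustrative only,
# not the paper's exact setup). The student mimics the teacher's softened
# probabilities while also fitting the ground-truth tokens.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=2.0, alpha=0.5):
    """Combine a soft-target KL term with hard-target cross-entropy.

    student_logits, teacher_logits: (batch, vocab) unnormalized scores.
    targets: (batch,) ground-truth token ids.
    temperature, alpha: illustrative hyperparameters (assumptions).
    """
    # Softened distributions; temperature > 1 exposes the teacher's
    # probability mass on non-argmax tokens.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, targets)
    return alpha * kd + (1.0 - alpha) * ce

# Usage with random tensors standing in for student and teacher outputs.
if __name__ == "__main__":
    batch, vocab = 4, 100
    s = torch.randn(batch, vocab, requires_grad=True)
    t = torch.randn(batch, vocab)
    y = torch.randint(0, vocab, (batch,))
    loss = distillation_loss(s, t, y)
    loss.backward()
    print(float(loss))
```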