
Multi-Scale Model Compression via Nested Matrix Learning

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/5o97c4anqod5

Abstract

Large language models (LLMs) have been widely deployed and have achieved remarkable success in downstream tasks. However, their high latency continues to pose challenges for real-time applications that require fast inference, and the need to train and deploy distinct models for different hardware constraints increases both financial and computational costs. To address this, we propose Nested Matrix Learning (NML), a method that trains a single, flexible model capable of generating multiple high-performing student models of varying sizes. This is achieved by simultaneously optimizing a pre-trained teacher model and its nested sub-models in a single training process, without sacrificing the teacher’s performance. NML provides a flexible and scalable solution, allowing models to adapt to different computational budgets. Our extensive experiments show that student models produced by NML, which can be up to 10x smaller than the full-size model, can be directly deployed for efficient inference or serve as superior initialization points for further fine-tuning in downstream tasks. By preserving the performance of the teacher model while delivering compact and efficient student models of various sizes, NML enhances the usability and adaptability of LLMs in real-world scenarios.
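The core idea in the abstract, that one weight matrix can host nested sub-models which are trained jointly with the full model, can be sketched minimally as follows. This is a hypothetical illustration of the nesting principle only (names such as `forward` and the use of the teacher output as a stand-in target are assumptions, not the paper's actual training procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

# Full "teacher" weight matrix; each smaller student reuses its
# leading sub-block, so all students are nested inside the teacher.
d_in, d_out = 8, 8
W = rng.normal(size=(d_out, d_in))

def forward(x, width):
    """A student of a given width uses only the leading width x width block."""
    return W[:width, :width] @ x[:width]

x = rng.normal(size=d_in)

# Joint objective: average the per-width losses so a single training run
# optimizes the teacher (full width) and its nested students together.
widths = [2, 4, 8]
target = forward(x, d_out)  # teacher output used here as a stand-in target
losses = [float(np.mean((forward(x, w) - target[:w]) ** 2)) for w in widths]
joint_loss = sum(losses) / len(widths)
```

In this toy setup, gradients of `joint_loss` with respect to `W` would update the shared sub-blocks for every width at once, which is the sense in which one training process yields multiple deployable model sizes.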

Details

Paper ID
lrec2026-main-196
Pages
pp. 2501-2511
BibKey
dong-etal-2026-multi
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Xiangjue Dong
  • Aditya Anantharaman
  • Hemant Pugaliya
  • Kai Zhong
