LREC 2026 workshop

Why Does Low-Rank Adaptation Work for Hindi-English Code-Mixing? A Geometric Analysis

Proceedings of the Second workshop on Challenges in Processing South Asian Languages (CHiPSAL2026)

DOI:10.63317/55yuuwkijgj9

Abstract

Low-Rank Adaptation (LoRA) enables efficient fine-tuning of large language models, yet why it works particularly well for code-mixed text remains unexplained. We propose that LoRA’s efficiency stems from geometric structure in multilingual pre-trained models: code-mixed embeddings concentrate in low-dimensional cross-lingual subspaces. Through spectral analysis of mBERT and MuRIL on Hindi-English (Hinglish) data, we establish that pre-trained attention weights have effective ranks of 437–441, while LoRA updates (r = 4, 8, 16) exhibit effective ranks of 2.1–5.9, a 136× average compression. Cross-lingual geometry measured via Centered Kernel Alignment (CKA) shows that Hinglish embeddings align strongly with Hindi (CKA = 0.279) but weakly with English (CKA = 0.093), compared to a monolingual baseline of 0.074. Statistical tests (Wilcoxon, p < 10^-19) and permutation ablations confirm these differences are robust. We interpret the convergence of geometric overlap (3.77× baseline) and empirical compression (136×) as evidence that low-rank adaptation exploits pre-existing multilingual structure. Findings are demonstrated on token-level language identification; extensions to other language pairs and tasks remain open questions.
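
The two quantities the abstract relies on, effective rank and CKA, can be illustrated with a minimal sketch. The abstract does not state which definitions the paper uses, so the following assumes the entropy-based effective rank (the exponential of the Shannon entropy of the normalized singular values) and the linear variant of CKA. The function names, matrix sizes, and random data are hypothetical stand-ins, not the paper's models, data, or code.

```python
import numpy as np


def effective_rank(matrix):
    """Entropy-based effective rank (an assumed definition, not confirmed by the paper)."""
    s = np.linalg.svd(matrix, compute_uv=False)
    p = s / s.sum()              # normalize singular values into a distribution
    p = p[p > 0]                 # drop exact zeros so log() is defined
    return float(np.exp(-(p * np.log(p)).sum()))


def linear_cka(x, y):
    """Linear CKA between two (n_samples, n_features) representation matrices."""
    x = x - x.mean(axis=0, keepdims=True)   # center each feature
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return float(cross / (norm_x * norm_y))


rng = np.random.default_rng(0)
d, r = 768, 8                                # hypothetical hidden size and LoRA rank

# Stand-in for a pre-trained weight matrix (random, so close to full rank).
w = rng.standard_normal((d, d))
# Stand-in for a LoRA update B @ A, whose rank is at most r by construction.
delta = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))

print("effective rank, dense matrix:", effective_rank(w))
print("effective rank, low-rank update:", effective_rank(delta))

# Stand-ins for two sets of sentence embeddings over the same 200 inputs.
emb_a = rng.standard_normal((200, d))
emb_b = 0.5 * emb_a + rng.standard_normal((200, d))
print("linear CKA:", linear_cka(emb_a, emb_b))
```

Under this definition the effective rank of a rank-r update can never exceed r, which is consistent with the reported values of 2.1–5.9 for r = 4, 8, 16; the ratio of the two effective ranks is one plausible reading of the compression factor quoted above.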

Details

Paper ID
lrec2026-ws-chipsal-13
Pages
pp. 127-136
BibKey
vishwakarma-etal-2026-why
Editors
Kengatharaiyer Sarveswaran, Ashwini Vaidya
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Second workshop on Challenges in Processing South Asian Languages (CHiPSAL2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Shashank Vishwakarma

  • Rakesh Kumar
