DR-RAG: Addressing Retrieval Misalignment in Low-Resource Urdu Question Answering

Proceedings of the Second workshop on Challenges in Processing South Asian Languages (CHiPSAL2026)

DOI:10.63317/4wwyss5zkwxs

Abstract

Retrieval-Augmented Generation (RAG) performs well on English QA benchmarks but degrades considerably in morphologically rich, low-resource languages. Urdu presents a particularly challenging case: heavy inflectional morphology, Nastaliq script inconsistencies, and limited training data produce a systematic mismatch between query representations and indexed document content that standard retrieval architectures cannot bridge. We propose DR-RAG (Dual-Representation Retrieval-Augmented Generation), which addresses this through dual indexing. Each document is represented both as overlapping text chunks and as automatically generated question-answer pairs. Queries are first matched against the QA index, which aligns more reliably with natural query phrasing than declarative document chunks do. When retrieval confidence falls below τ = 0.80, the system falls back to chunk-based retrieval, maintaining coverage without sacrificing precision. Evaluated on Urdu UQA and English SQuAD 2.0, DR-RAG improves Urdu METEOR by 38× and ROUGE-1 by 140%, and reduces generation latency by 43%. LLM-as-a-judge scores show higher faithfulness (3.03 vs. 1.93) and overall quality (2.99 vs. 2.21) than MultiVector. English performance remains competitive throughout. These results indicate that representation-level alignment between queries and indexed content, rather than increased model complexity, is the critical factor for reliable retrieval in underserved South Asian languages.
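
The retrieval flow described in the abstract (dual indexing, QA-first matching, and fallback to chunks when confidence falls below τ = 0.80) can be illustrated with a minimal sketch. This is not the authors' implementation. The bag-of-words cosine similarity stands in for whatever embedding model the paper uses, the chunk size and overlap are arbitrary, and the names `overlapping_chunks`, `cosine`, `Hit`, and `retrieve` are hypothetical. Only the threshold value comes from the abstract.

```python
import math
from collections import Counter
from dataclasses import dataclass

TAU = 0.80  # confidence threshold stated in the abstract


def overlapping_chunks(text, size=50, overlap=10):
    """Split text into word windows that share `overlap` words (sizes are assumptions)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def cosine(a, b):
    """Toy bag-of-words cosine similarity; a stand-in for a real embedding model."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class Hit:
    source: str  # "qa" or "chunk"
    text: str
    score: float


def retrieve(query, qa_index, chunk_index, tau=TAU):
    """Match the query against QA pairs first; fall back to chunks below tau."""
    # qa_index is a list of (question, answer) pairs; chunk_index is a list of strings.
    if qa_index:
        question, answer = max(qa_index, key=lambda qa: cosine(query, qa[0]))
        score = cosine(query, question)
        if score >= tau:
            return Hit("qa", answer, score)
    if not chunk_index:
        return None
    chunk = max(chunk_index, key=lambda c: cosine(query, c))
    return Hit("chunk", chunk, cosine(query, chunk))


# Hypothetical toy data, in English for readability.
qa_index = [("what is the capital of pakistan", "Islamabad")]
chunk_index = [
    "Lahore is known for the Badshahi Mosque",
    "Islamabad is the capital of Pakistan",
]
```

With this toy data, a query that closely matches an indexed question is answered from the QA index, while a query with no close question match (for example `"badshahi mosque lahore"`) falls through to the chunk index. In a real system the QA pairs would be generated automatically from each document, as the abstract describes.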

Details

Paper ID: lrec2026-ws-chipsal-06
Pages: pp. 49-58
BibKey: ahmad-etal-2026-dr
Editors: Kengatharaiyer Sarveswaran, Ashwini Vaidya
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of the Second workshop on Challenges in Processing South Asian Languages (CHiPSAL2026)
Location: Palma, Mallorca, Spain
Date: 11-16 May 2026

Authors

  • Saad Ahmad
  • Muhammad Hammad
  • Muhammad Zeeshan
  • Faizad Ullah
  • Asim Karim
