LREC 2026 Workshop

Integrating Syntactic and Discourse Signals through Multi-Encoder Fusion in NMT for Low-Resource Indian Language Pairs

Proceedings of the 8th Workshop on Indian Language Data: Resources and Evaluation

DOI:10.63317/24vtvyv2iqhs

Abstract

Neural Machine Translation (NMT) for low-resource Indian language pairs such as Hindi–Tamil and Tamil–Malayalam remains challenging due to morphological richness, syntactic divergence, and limited availability of high-quality parallel corpora. While Transformer-based architectures achieve strong performance in high-resource settings, they often struggle to model syntactic structure and discourse-level dependencies in low-resource scenarios, resulting in errors in agreement, word order, and pronoun translation. In this work, we propose a linguistically informed multi-encoder fusion framework that explicitly incorporates syntactic and discourse signals into NMT. Experiments conducted on Hindi–Tamil and Tamil–Malayalam parallel corpora demonstrate consistent improvements over strong Transformer baselines in BLEU and ChrF scores, along with gains in pronoun translation accuracy and agreement consistency. The results highlight the effectiveness of explicit linguistic integration for improving NMT in low-resource Indian language settings.
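The abstract does not say how the encoder outputs are combined. The sketch below shows one common way to fuse several encoders: a softmax-gated weighted sum computed per token. It is an illustration under stated assumptions, not the authors' method. The three representations (lexical, syntactic, discourse), the function names `gated_fusion` and `fuse_sequence`, and the gate scoring vectors are all hypothetical. In a real model the gate parameters would be learned jointly with the rest of the NMT network.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def gated_fusion(states: Sequence[Vector], gate_weights: Sequence[Vector]) -> Vector:
    """Fuse one token's hidden vectors from several encoders (hypothetical sketch).

    states[k] is encoder k's vector for the token; gate_weights[k] is an
    assumed scoring vector for encoder k. Each encoder gets a score w_k . h_k,
    the scores are softmax-normalised into gates, and the gates weight a sum.
    """
    if len(states) != len(gate_weights):
        raise ValueError("need one gate vector per encoder")
    dim = len(states[0])
    gates = softmax([dot(w, h) for w, h in zip(gate_weights, states)])
    return [sum(g * h[i] for g, h in zip(gates, states)) for i in range(dim)]


def fuse_sequence(encoder_outputs: Sequence[Sequence[Vector]],
                  gate_weights: Sequence[Vector]) -> List[Vector]:
    """Apply gated_fusion position by position.

    encoder_outputs[k][t] is encoder k's vector at token position t; all
    encoders are assumed to be aligned to the same source tokens.
    """
    length = len(encoder_outputs[0])
    return [gated_fusion([enc[t] for enc in encoder_outputs], gate_weights)
            for t in range(length)]


if __name__ == "__main__":
    lexical = [[1.0, 0.0]]    # hypothetical token-level encoder output
    syntactic = [[0.0, 1.0]]  # hypothetical syntax-aware encoder output
    discourse = [[0.5, 0.5]]  # hypothetical discourse-aware encoder output
    zero_gates = [[0.0, 0.0]] * 3
    print(fuse_sequence([lexical, syntactic, discourse], zero_gates))
```

With all-zero gate vectors every encoder scores 0, so each gate is 1/3 and the fused vector is the plain average, approximately `[0.5, 0.5]`. A gate that strongly favours one encoder pulls the fused vector towards that encoder's representation. This is what lets the decoder rely more heavily on syntactic or discourse cues where they help.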

Details

Paper ID
lrec2026-ws-wildre-13
Pages
pp. 98-103
BibKey
lalithadevi-etal-2026-integrating
Editors
Girish Nath Jha, Kalika Bali, Sobha L, Devendr Kumar
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 8th Workshop on Indian Language Data: Resources and Evaluation
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Sobha Lalitha Devi

  • Vijay Sundar Ram

  • Pattabhi RK Rao

Links