Comparing Source Language Selection Strategies for Multi-Source Cross-Lingual Transfer to African Languages

Proceedings of Resources for African Indigenous Languages (RAIL) 2026 @ LREC 2026

DOI:10.63317/4ye4hekq2tmu

Abstract

Cross-lingual transfer learning enables building NLP systems for low-resource languages by leveraging data from higher-resource languages. A critical but understudied question for African languages is: which source languages should be selected for multi-source transfer? We present a systematic comparison of four source language selection strategies: random selection (baseline), genetic distance based on language family trees, geographic distance based on speaker locations, and embedding similarity from multilingual models. We evaluate these strategies on Named Entity Recognition, Part-of-Speech tagging, and sentiment analysis across five typologically diverse African target languages (Hausa, Yoruba, Swahili, Igbo, Kinyarwanda) using three multilingual models. We further investigate how the number of source languages affects transfer performance. Our experiments reveal that no single strategy dominates across tasks: geographic distance leads on sequence labeling tasks while embedding similarity is most effective for sentiment analysis, and all informed strategies consistently outperform random selection.
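
As a rough illustration of the comparison described above, the informed strategies can each be seen as ranking candidate source languages by some distance to the target and keeping the k closest, while the baseline samples k candidates at random. The sketch below is not the authors' implementation. The function names, the candidate labels (`src_a` and so on), and the distance values are hypothetical placeholders, and an embedding-similarity strategy is assumed to be converted to a distance first (for example, one minus cosine similarity).

```python
import random
from typing import Dict, List


def select_sources(candidates: List[str], distance: Dict[str, float], k: int) -> List[str]:
    """Return the k candidates with the smallest distance to the target.

    `distance` may hold any measure (genetic, geographic, or an
    embedding-based distance); ties are broken alphabetically.
    """
    return sorted(candidates, key=lambda lang: (distance[lang], lang))[:k]


def select_random(candidates: List[str], k: int, seed: int = 0) -> List[str]:
    """Baseline: sample k candidates uniformly without replacement."""
    rng = random.Random(seed)
    return rng.sample(candidates, k)


# Hypothetical candidate pool and made-up distances, for illustration only.
candidates = ["src_a", "src_b", "src_c", "src_d"]
toy_distance = {"src_a": 0.7, "src_b": 0.2, "src_c": 0.5, "src_d": 0.9}

top2 = select_sources(candidates, toy_distance, k=2)
baseline = select_random(candidates, k=2)
print(top2)  # ['src_b', 'src_c']
```

Varying `k` in such a setup corresponds to the question of how the number of source languages affects transfer performance.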

Details

Paper ID
lrec2026-ws-rail-04
Pages
pp. 31-40
BibKey
idris-etal-2026-comparing
Editors
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of Resources for African Indigenous Languages (RAIL) 2026 @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Tewodros Kederalah Idris
  • Roald Eiselen
  • Prasenjit Mitra
