
Are Language Models Borrowing-Blind? A Multilingual Evaluation of Loanword Identification across 10 Languages

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/2mtum74avv3g

Abstract

Throughout language history, words are borrowed from one language into another and gradually become integrated into the recipient language’s lexicon. Speakers can often differentiate these loanwords from native vocabulary, particularly in bilingual communities where a dominant language continuously imposes lexical items on a minority language. This paper investigates whether pretrained language models, including large language models, possess similar capabilities for loanword identification. We evaluate multiple models across 10 languages. Despite explicit instructions and contextual information, our results show that models perform poorly at distinguishing loanwords from native words. These findings corroborate previous evidence that modern NLP systems exhibit a bias toward loanwords rather than native equivalents. Our work has implications for developing NLP tools for minority languages and supporting language preservation in communities under lexical pressure from dominant languages.

Details

Paper ID
lrec2026-main-269
Pages
pp. 3394-3401
BibKey
silva-etal-2026-are
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Merilin Sousa Silva

  • Sina Ahmadi
