High-Resource Bias in AI-Driven Neology: Structural Inequality in Lexical Innovation
Proceedings of the Workshop Neology and Large Language Models
Abstract
Large language models (LLMs) are increasingly deployed to detect, generate, and normalize neologisms across languages. Prior work has examined their capacity to model semantic change and to handle temporal drift. Less attention has been paid to how asymmetries in training data interact with probabilistic generation to shape lexical innovation itself. This paper argues that AI-driven neology is subject to a systematic high-resource bias that privileges dominant languages in the production, stabilization, and dissemination of new lexical items. Drawing on sociolinguistics, the political economy of language, lexicography, and computational modeling theory, we formalize how distributional imbalance in training data alters the likelihood of lexical innovation across languages. We introduce a taxonomy of bias types specific to AI-mediated neology, present a probabilistic account of generative reinforcement loops, and illustrate these mechanisms with documented examples from the English-Arabic and English-Icelandic language pairs. We derive empirically testable predictions and propose concrete mitigation strategies for lexicographers, language planners, and NLP researchers.