I Am Not Them: Persistent Outgroup Bias in Large Language Models Arising from Social Identity Persona Setting
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
This research examines how large language models internalize social identities assigned through targeted prompts. Guided by social identity theory, we investigate whether and how these identity assignments cause AI systems to differentiate between "we" (the ingroup) and "they" (the outgroup). We demonstrate that self-categorization into a social identity leads to both ingroup favoritism and outgroup bias, with the latter manifesting as strongly as the former. This finding is significant given the fundamental role of outgroup bias in driving intergroup prejudice and discrimination, as documented in social psychology. We further propose a strategic intervention to mitigate such bias by guiding language models to adopt the identity of the initially disfavored group. This method, validated in both the political and gender domains, exposes a critical dual function of group alignment: adopting one social identity inherently alters the model's stance toward outgroups, effectively neutralizing pre-existing biases. Our work shows that understanding human-like AI behaviors is a critical prerequisite for building more balanced and socially responsible technology.
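To make the persona-setting manipulation and the counter-identity mitigation concrete, the following is a minimal, illustrative sketch of how such prompts might be constructed. The identity labels, prompt wording, and the persona_messages helper are hypothetical examples and are not the paper's actual materials.

```python
# Illustrative sketch only: identity labels and prompt wording are hypothetical,
# not the exact prompts or conditions used in the paper.

PERSONA_TEMPLATE = (
    "You strongly identify as {identity}. "
    "Answer the following question from that perspective."
)


def persona_messages(identity: str, question: str) -> list[dict]:
    """Return chat-style messages that assign a social identity persona
    before asking an intergroup-attitude question."""
    return [
        {"role": "system", "content": PERSONA_TEMPLATE.format(identity=identity)},
        {"role": "user", "content": question},
    ]


if __name__ == "__main__":
    question = "How trustworthy are members of the other political party?"

    # Baseline condition: assign an ingroup identity (here, a Democrat persona),
    # then score the model's response for ingroup favoritism / outgroup bias.
    baseline = persona_messages("a Democrat", question)

    # Mitigation condition described in the abstract: re-prompt the model to
    # adopt the identity of the group it initially disfavored (the outgroup).
    mitigated = persona_messages("a Republican", question)

    for name, msgs in [("baseline", baseline), ("mitigation", mitigated)]:
        print(name, msgs)
```

In an actual experiment, each message list would be sent to the model under study and the responses scored for favorable or unfavorable stances toward the ingroup and outgroup; this sketch only shows how the identity assignment and the counter-identity re-assignment differ at the prompt level.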