Analyzing Effects of Learning Downstream Tasks on Moral Bias in Large Language Models

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/4vwzxivrtxgv

Abstract

Pre-training and fine-tuning large language models (LMs) is currently the state-of-the-art methodology for enabling data-scarce downstream tasks. However, the derived models still tend to replicate and perpetuate social biases. To understand this process in more detail, this paper investigates the actual effects of learning downstream tasks on moral bias in LMs. We develop methods to assess the agreement of LMs with explicitly codified norms in both the pre-training and fine-tuning stages. Even if a pre-trained foundation model exhibits consistent norms, we find that introducing downstream tasks may indeed lead to unexpected inconsistencies in norm representation. Specifically, we observe two phenomena during fine-tuning across both masked and causal LMs: (1) pre-existing moral bias may be mitigated or amplified even when the model is presented with opposing views, and (2) prompt sensitivity may be negatively impacted. We provide empirical evidence of models deteriorating into conflicting states, where contradictory answers can easily be triggered by slight modifications in the input sequence. Our findings thus raise concerns about the general ability of LMs to mitigate moral biases effectively.

Details

Paper ID
lrec2024-main-0082
Pages
pp. 904-923
BibKey
kiehne-etal-2024-analyzing
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20-25 May 2024

Authors

  • Niklas Kiehne

  • Alexander Ljapunov

  • Marc Bätje

  • Wolf-Tilo Balke

Links