LREC-COLING 2024 (main)

Gendered Grammar or Ingrained Bias? Exploring Gender Bias in Icelandic Language Models

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/2qudr7em2fho

Abstract

Large language models, trained on vast datasets, exhibit increased output quality in proportion to the amount of data used to train them. This data-driven learning process has brought forth a pressing issue: these models may not only reflect but also amplify the gender bias, racism, religious prejudice, and queerphobia present in their training data, which may not always be recent. This study explores gender bias in language models trained on Icelandic, focusing on occupation-related terms. Icelandic is a highly grammatically gendered language that favors the masculine when referring to groups of people of indeterminable gender. Our aim is to explore whether language models merely mirror gender distributions within the corresponding professions or whether they exhibit biases tied to the grammatical genders of the occupation terms. Results indicate a significant overall predisposition towards the masculine, but specific occupation terms consistently lean toward a particular gender, indicating a complex interplay of societal and linguistic influences.

Details

Paper ID
lrec2024-main-0671
Pages
pp. 7596-7610
BibKey
fridriksdottir-einarsson-2024-gendered
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20-25 May 2024

Authors

  • Steinunn Rut Friðriksdóttir

  • Hafsteinn Einarsson

Links