LREC-COLING 2024 (Main Conference)

Unpacking Bias: An Empirical Study of Bias Measurement Metrics, Mitigation Algorithms, and Their Interactions

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/4vsrpfcgzsqj

Abstract

Word embeddings (WE) have been shown to capture biases from the text they are trained on, which has led to the development of several bias measurement metrics and bias mitigation algorithms (i.e., methods that transform the embedding space to reduce bias). This study identifies three confounding factors that hinder the comparison of bias mitigation algorithms with bias measurement metrics: (1) reliance on different word sets when applying bias mitigation algorithms, (2) leakage between training words employed by mitigation methods and evaluation words used by metrics, and (3) inconsistencies in normalization transformations between mitigation algorithms. We propose a simple comparison methodology that carefully controls for word sets and vector normalization to address these factors. We conduct a component isolation experiment to assess how each component of our methodology impacts bias measurement. Comparing the bias mitigation algorithms under this methodology, we observe increased consistency across the different debiasing algorithms.
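The third confounding factor, inconsistent normalization, can be illustrated with a small sketch (toy vectors and a projection-style bias score chosen for illustration, not the paper's actual metric or data): a score based on the magnitude of each word vector's projection onto a bias direction is sensitive to vector length, so applying per-vector L2 normalization before measuring changes the result.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

def projection_bias(direction, words):
    """Mean absolute projection of word vectors onto a unit bias
    direction; unlike cosine similarity, this depends on vector length."""
    return sum(abs(sum(a * b for a, b in zip(w, direction))) for w in words) / len(words)

# Hypothetical 2-d embeddings: a unit bias direction and two word vectors,
# one of which has norm 2.0.
bias_dir = [1.0, 0.0]
words = [[0.8, 0.6], [2.0, 0.0]]

raw_score = projection_bias(bias_dir, words)                         # 1.4
norm_score = projection_bias(bias_dir, [l2_normalize(w) for w in words])  # 0.9
```

Because the two scores differ, two mitigation algorithms that make different normalization choices are not directly comparable on such a metric, which is why the methodology fixes the normalization step across all compared algorithms.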

Details

Paper ID
lrec2024-main-1490
Pages
pp. 17154-17164
BibKey
bravo-marquez-zambrano-2024-unpacking
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20–25 May 2024

Authors

  • Felipe Bravo-Marquez

  • Maria Jose Zambrano
