LREC-COLING 2024 (main)

What Factors Influence LLMs’ Judgments? A Case Study on Question Answering

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/2ecq8riwk5ay

Abstract

Large Language Models (LLMs) are now being considered as judges of high efficiency to evaluate the quality of answers generated by candidate models. However, their judgments may be influenced by complex scenarios and inherent biases, raising concerns about their reliability. This study aims to bridge this gap by introducing four unexplored factors and examining the performance of LLMs as judges, namely answer quantity, inducing statements, judging strategy, and judging style. Additionally, we introduce a new dimension of question difficulty to provide a more comprehensive understanding of LLMs’ judgments across varying question intricacies. We employ ChatGPT, GPT-4, Gemini, and Claude-2 as judges and conduct experiments on Vicuna Benchmark and MT-bench. Our study reveals that LLMs’ judging abilities are susceptible to the influence of these four factors, and analyzing from the newly proposed dimension of question difficulty is highly necessary. We also provide valuable insights into optimizing LLMs’ performance as judges, enhancing their reliability and adaptability across diverse evaluation scenarios.

Details

Paper ID
lrec2024-main-1519
Pages
pp. 17473-17485
BibKey
chen-etal-2024-factors
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20 May 2024 - 25 May 2024

Authors

  • Lei Chen
  • Bobo Li
  • Li Zheng
  • Haining Wang
  • Zixiang Meng
  • Runfeng Shi
  • Hao Fei
  • Jun Zhou
  • Fei Li
  • Chong Teng
  • Donghong Ji
