LREC-COLING 2024 Workshop

Automated Question-Answer Generation for Evaluating RAG-based Chatbots

Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024

DOI:10.63317/3e22ykrrugx8

Abstract

In this research, we propose a framework that automatically generates human-like question-answer pairs with long or factoid answers and, based on them, automatically evaluates the quality of Retrieval-Augmented Generation (RAG). Our framework can also create datasets that assess the hallucination levels of Large Language Models (LLMs) by simulating unanswerable questions. We then apply the framework to create a dataset of question-answer (QA) pairs based on more than 1,000 leaflets about the medical and administrative procedures of a hospital. The dataset was evaluated by hospital specialists, who confirmed that more than 50% of the QA pairs are applicable. Finally, we show that our framework can be used to evaluate LLM performance, using Llama-2-13B fine-tuned in Dutch (Vanroy, 2023) with the generated dataset, and find that the method appears promising for testing models on unanswerable and factoid questions.
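To make the evaluation idea concrete, the sketch below (not the authors' code; all names, the toy dataset, and the `mock_rag` lookup are hypothetical) shows how a generated QA dataset that mixes factoid and unanswerable questions can score a RAG chatbot on both answer accuracy and its willingness to abstain rather than hallucinate:

```python
# Hypothetical sketch of RAG evaluation with answerable and
# unanswerable QA pairs, in the spirit of the paper's framework.

REFUSAL = "I don't know"

# Toy generated dataset: factoid pairs plus "unanswerable"
# questions that probe hallucination.
dataset = [
    {"question": "What time does the pharmacy open?", "answer": "8:00", "answerable": True},
    {"question": "Where is the X-ray department?", "answer": "Floor 2", "answerable": True},
    {"question": "What is the hospital director's shoe size?", "answer": REFUSAL, "answerable": False},
]

def mock_rag(question: str) -> str:
    """Stand-in for a real RAG pipeline; answers from a fixed lookup
    and refuses when the question is outside its 'retrieved' knowledge."""
    known = {
        "What time does the pharmacy open?": "8:00",
        "Where is the X-ray department?": "Floor 2",
    }
    return known.get(question, REFUSAL)

def evaluate(rag, data):
    """Return (accuracy on answerable items, abstention rate on unanswerable items)."""
    correct = sum(1 for d in data if d["answerable"] and rag(d["question"]) == d["answer"])
    abstained = sum(1 for d in data if not d["answerable"] and rag(d["question"]) == REFUSAL)
    n_answerable = sum(d["answerable"] for d in data)
    n_unanswerable = len(data) - n_answerable
    return correct / n_answerable, abstained / n_unanswerable

accuracy, abstention = evaluate(mock_rag, dataset)
print(accuracy, abstention)  # → 1.0 1.0
```

A real evaluation would replace the exact string match with a softer answer-similarity metric and `mock_rag` with the chatbot under test; the abstention rate on the simulated unanswerable questions is what measures hallucination.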

Details

Paper ID
lrec2024-ws-cl4health-25
Pages
pp. 204-214
BibKey
gonzalez-torres-etal-2024-automated
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024
Location
Torino, Italy
Date
20–25 May 2024

Authors

  • Juan José González Torres
  • Mihai Bogdan Bîndilă
  • Sebastiaan Hofstee
  • Daniel Szondy
  • Quang-Hung Nguyen
  • Shenghui Wang
  • Gwenn Englebienne
