LREC-COLING 2024 workshop

Leveraging High-Precision Corpus Queries for Text Classification via Large Language Models

Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024

DOI:10.63317/5n7cgkpo7tzp

Abstract

We use the results of manually designed corpus queries to fine-tune an LLM to identify argumentative fragments as a text mining task. The resulting model outperforms both an LLM fine-tuned on a relatively large manually annotated gold standard of tweets and a rule-based approach. This proof-of-concept study demonstrates the usefulness of corpus queries for generating training data for complex text categorisation tasks, especially when the targeted category has low prevalence, so that a manually annotated gold standard contains only a small number of positive examples.

Details

Paper ID
lrec2024-ws-delite-7
Pages
pp. 52-57
BibKey
dykes-etal-2024-leveraging
Publisher
European Language Resources Association (ELRA) and ICCL
Workshop
Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024
Date
20-25 May 2024

Authors

  • Nathan Dykes
  • Stephanie Evert
  • Philipp Heinrich
  • Merlin Humml
  • Lutz Schröder
