Investigating Active Learning Sampling Strategies for Extreme Multi Label Text Classification

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/48xs9zc3987o

Abstract

Large-scale, multi-label text datasets with high numbers of different classes are expensive to annotate, even more so if they deal with domain-specific language. In this work, we aim to build classifiers on these datasets using Active Learning in order to reduce the labeling effort. We outline the challenges of extreme multi-label settings and show the limitations of existing Active Learning strategies, focusing on both their effectiveness and their efficiency in terms of computational cost. In addition, we present five multi-label datasets, compiled from hierarchical classification tasks, to serve as benchmarks for future experiments in extreme multi-label classification. Finally, we provide insight into multi-class, multi-label evaluation and present an improved classifier architecture on top of pre-trained transformer language models.

Details

Paper ID
lrec2022-main-490
Pages
pp. 4597-4605
BibKey
wertz-etal-2022-investigating
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20-25 June 2022

Authors

  • Lukas Wertz
  • Katsiaryna Mirylenka
  • Jonas Kuhn
  • Jasmina Bogojeska
