LREC 2022, main conference

Modeling Noise in Paraphrase Detection

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/4rkmaz7ufmd6

Abstract

Noisy labels in training data present a challenging issue in classification tasks, misleading a model towards incorrect decisions during training. In this paper, we propose the use of a linear noise model to augment pre-trained language models to account for label noise during fine-tuning. We test our approach on a paraphrase detection task with various levels of noise and in five different languages. Our experiments demonstrate the effectiveness of the additional noise model in making the training procedure more robust and stable. Furthermore, we show that this model can be applied without further knowledge about the annotation confidence and reliability of individual training examples, and we analyse our results in light of data selection and sampling strategies.

Details

Paper ID
lrec2022-main-461
Pages
pp. 4324-4332
BibKey
vahtola-etal-2022-modeling
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20-25 June 2022

Authors

  • Teemu Vahtola
  • Eetu Sjöblom
  • Jörg Tiedemann
  • Mathias Creutz
