Bagging BERT Models for Robust Aggression Identification

Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying

DOI:10.63317/5imuub8wnofn

Abstract

Modern transformer-based models with hundreds of millions of parameters, such as BERT, achieve impressive results at text classification tasks. This also holds for aggression identification and offensive language detection, where deep learning approaches consistently outperform less complex models, such as decision trees. While the complex models fit training data well (low bias), they also come with an unwanted high variance. Especially when fine-tuning them on small datasets, the classification performance varies significantly for slightly different training data. To overcome the high variance and provide more robust predictions, we propose an ensemble of multiple fine-tuned BERT models based on bootstrap aggregating (bagging). In this paper, we describe such an ensemble system and present our submission to the shared tasks on aggression identification 2020 (team name: Julian). Our submission is the best-performing system for five out of six subtasks. For example, we achieve a weighted F1-score of 80.3% for task A on the test dataset of English social media posts. In our experiments, we compare different model configurations and vary the number of models used in the ensemble. We find that the F1-score drastically increases when ensembling up to 15 models, but the returns diminish for more models.
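
The bagging idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `train_toy_model` is a hypothetical word-count stand-in for fine-tuning one BERT model, the two labels and the tiny dataset are made up, and averaging class probabilities (soft voting) is an assumed aggregation rule that the abstract does not specify.

```python
import random
from collections import Counter

# Hypothetical label set for illustration only.
LABELS = ["aggressive", "not_aggressive"]


def bootstrap_sample(dataset, rng):
    """Draw len(dataset) examples with replacement (one bootstrap replicate)."""
    return [rng.choice(dataset) for _ in range(len(dataset))]


def train_toy_model(sample):
    """Hypothetical stand-in for fine-tuning one BERT model on a replicate.

    Counts words per label and returns a function mapping a text to
    a list of class probabilities ordered like LABELS.
    """
    counts = {label: Counter() for label in LABELS}
    for text, label in sample:
        counts[label].update(text.lower().split())

    def predict_proba(text):
        words = text.lower().split()
        # Add-one offset keeps every score positive, so normalization is safe.
        scores = [1 + sum(counts[label][w] for w in words) for label in LABELS]
        total = sum(scores)
        return [s / total for s in scores]

    return predict_proba


def build_ensemble(dataset, n_models, seed=0):
    """Train n_models members, each on its own bootstrap replicate."""
    rng = random.Random(seed)
    return [train_toy_model(bootstrap_sample(dataset, rng)) for _ in range(n_models)]


def bagging_predict(models, text):
    """Assumed aggregation: average the members' class probabilities."""
    probs = [model(text) for model in models]
    avg = [sum(p[i] for p in probs) / len(probs) for i in range(len(LABELS))]
    return LABELS[avg.index(max(avg))], avg


# Made-up toy data, only to make the sketch runnable.
data = [
    ("you are an idiot", "aggressive"),
    ("shut up you idiot", "aggressive"),
    ("have a nice day", "not_aggressive"),
    ("thanks for the help", "not_aggressive"),
]
ensemble = build_ensemble(data, n_models=15)
label, avg = bagging_predict(ensemble, "you idiot")
print(label, avg)
```

Each member sees a slightly different training set, so individual predictions differ; averaging them is what reduces the variance. In a real system, `train_toy_model` would be replaced by a fine-tuning run, and the number of members (15 here, the point after which the abstract reports diminishing returns) would be tuned on held-out data.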

Details

Paper ID
lrec2020-ws-trac-09
Pages
pp. 55-61
BibKey
risch-krestel-2020-bagging
Publisher
European Language Resources Association (ELRA)
Workshop
Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying
Date
11–16 May 2020

Authors

  • Julian Risch

  • Ralf Krestel

Links