LREC 2020 workshop

Aggression Identification in English, Hindi and Bangla Text using BERT, RoBERTa and SVM

Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying

DOI:10.63317/5euzg6xrwq4p

Abstract

This paper presents the results of the classifiers we developed for the shared tasks on aggression identification and misogynistic aggression identification. The two shared tasks were held as part of the Second Workshop on Trolling, Aggression and Cyberbullying (TRAC). Both subtasks were offered for English, Hindi, and Bangla. For English, we used English BERT (En-BERT), RoBERTa, DistilRoBERTa, and SVM-based classifiers. For Hindi and Bangla, we used multilingual BERT (M-BERT), XLM-RoBERTa, and SVM classifiers. Our best-performing models are En-BERT for English Subtask A (weighted F1 score of 0.73, rank 5/16), SVM for English Subtask B (weighted F1 score of 0.87, rank 2/15), SVM for Hindi Subtask A (weighted F1 score of 0.79, rank 2/10), XLM-RoBERTa for Hindi Subtask B (weighted F1 score of 0.87, rank 2/10), SVM for Bangla Subtask A (weighted F1 score of 0.81, rank 2/10), and SVM for Bangla Subtask B (weighted F1 score of 0.93, rank 4/8). The SVM classifier's superior performance came mainly from its better prediction of the majority class, whereas the BERT-based classifiers predicted the minority classes better.
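
The closing observation depends on how weighted F1 is computed: each class's F1 is weighted by that class's share of the true labels, so a classifier that handles the majority class well can score higher overall even when it misses minority classes. A minimal sketch of that arithmetic in plain Python follows. The labels `maj` and `min`, the toy predictions, and the helper names `per_class_f1` and `weighted_f1` are hypothetical illustrations, not the paper's data or code.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, label):
    """F1 score for a single class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by each class's true support."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        per_class_f1(y_true, y_pred, label) * count / total
        for label, count in support.items()
    )


# Hypothetical imbalanced toy set: 18 majority-class items, 2 minority-class items.
y_true = ["maj"] * 18 + ["min"] * 2

# Classifier A (hypothetical): always predicts the majority class.
pred_a = ["maj"] * 20

# Classifier B (hypothetical): finds both minority items,
# but mislabels 6 majority items as minority.
pred_b = ["maj"] * 12 + ["min"] * 6 + ["min"] * 2

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(
        name,
        "weighted F1:", round(weighted_f1(y_true, pred), 3),
        "minority F1:", round(per_class_f1(y_true, pred, "min"), 3),
    )
```

In this toy setup, classifier A has the higher weighted F1 although it never identifies a minority item, while classifier B has the higher minority-class F1. That is the same kind of trade-off the abstract describes between the SVM and BERT-based classifiers, shown here only on made-up labels.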

Details

Paper ID
lrec2020-ws-trac-12
Pages
pp. 76-82
BibKey
baruah-etal-2020-aggression
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying
Date
11 May 2020 to 16 May 2020

Authors

  • Arup Baruah
  • Kaushik Das
  • Ferdous Barbhuiya
  • Kuntal Dey
