LREC-COLING 2024 workshop

BBRC: Brazilian Banking Regulation Corpora

Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing

DOI:10.63317/5iebpurw6p5r

Abstract

We present BBRC, a collection of 25 corpora on banking regulatory risk from different departments of Banco do Brasil (BB). The individual corpora cover investments, insurance, human resources, security, technology, treasury, loans, accounting, fraud, credit cards, payment methods, agribusiness, risks, and other areas. Experts annotated each regulatory document with a binary label indicating whether it contains regulatory risk that may require changes to the products, processes, services, and channels of a bank department. The corpora, written in Portuguese, contain documents from 26 Brazilian regulatory authorities in the financial sector. In total, there are 61,650 annotated documents, most of them between half a page and three pages long. The corpora belong to a Natural Language Processing (NLP) application that has been in production since 2020. In this work, we also ran binary classification benchmarks on some of the corpora. The experiments used different sampling techniques, one of which sought to address an intraclass imbalance problem present in each corpus. For the benchmarks, we used the following classifiers: Multinomial Naive Bayes, Random Forest, SVM, XGBoost, and BERTimbau (a version of BERT for Portuguese). The BBRC can be downloaded through a link in the article.
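To illustrate the simplest of the benchmark classifiers named in the abstract, the following is a minimal sketch of Multinomial Naive Bayes with Laplace smoothing applied to binary risk labelling. It is not the authors' implementation: the class name `TinyMultinomialNB`, the whitespace tokenizer, and the toy documents are all assumptions made for illustration, and the toy documents are in English for readability, whereas the BBRC documents are in Portuguese.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


class TinyMultinomialNB:
    """Hypothetical minimal Multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing strength

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in tokenize(doc)}
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # Class prior: fraction of training documents with label c.
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(tok for d in class_docs for tok in tokenize(d))
            total = sum(counts.values()) + self.alpha * len(self.vocab)
            # Smoothed per-class token probabilities, stored in log space.
            self.log_likelihood[c] = {
                tok: math.log((counts[tok] + self.alpha) / total)
                for tok in self.vocab
            }
        return self

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(doc):
                if tok in self.vocab:  # tokens unseen in training are ignored
                    score += self.log_likelihood[c][tok]
            scores[c] = score
        return max(scores, key=scores.get)


# Toy, made-up documents: 1 = contains regulatory risk, 0 = does not.
train_docs = [
    "new rule changes credit card fee limits",
    "resolution requires update to loan contracts",
    "notice of holiday schedule for the authority",
    "announcement of public seminar dates",
]
train_labels = [1, 1, 0, 0]

model = TinyMultinomialNB().fit(train_docs, train_labels)
prediction_risk = model.predict("rule changes loan fee limits")
prediction_no_risk = model.predict("public seminar schedule")
```

In practice, a library implementation with TF-IDF or count features would replace this hand-rolled class. The sketch only shows the shape of the task: one document in, one binary risk label out.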

Details

Paper ID
lrec2024-ws-finnlp-15
Pages
pp. 150-166
BibKey
faria-de-azevedo-etal-2024-bbrc
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing
Date
20-25 May 2024

Authors

  • Rafael Faria de Azevedo

  • Thiago Henrique Eduardo Muniz

  • Claudio Pimentel

  • Guilherme Jose de Assis Foureaux

  • Barbara Caldeira Macedo

  • Daniel de Lima Vasconcelos

Links