Back to PARLACLARIN 2024
LREC-COLING 2024workshop

Compiling and Exploring a Portuguese Parliamentary Corpus: ParlaMint-PT

Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024

DOI:10.63317/4i5aqrnvksg9

Abstract

As part of the project ParlaMint II, a new corpus of the sessions of the Portuguese Parliament from 2015 to 2022 has been compiled, encoded and annotated following the ParlaMint guidelines. We report on the contents of the corpus and on the specific nature of the political settings in Portugal during the time period covered. Two subcorpora were designed that would enable comparisons of the political speeches between pre and post covid-19 pandemic. We discuss the pipeline applied to download the original texts, ensure their preprocessing and encoding in XML, and the final step of annotation. This new resource covers a period of changes in the political system in Portugal and will be an important source of data for political and social studies. Finally, Finally, we have explored the political stance on immigration in the ParlaMint-PT corpus.

Details

Paper ID
lrec2024-ws-parlaclarin-02
Pages
pp. 12-20
BibKey
aires-etal-2024-compiling
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024
Location
undefined, undefined
Date
20 May 2024 25 May 2024

Authors

  • JA

    José Aires

  • AC

    Aida Cardoso

  • RP

    Rui Pereira

  • AM

    Amalia Mendes

Links