LREC-COLING 2024 (main)

What Do Transformers Know about Government?

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/4octbf2xm2im

Abstract

This paper investigates what insights about linguistic features and what knowledge about the structure of natural language can be obtained from the encodings in transformer language models. In particular, we explore how BERT encodes the government relation between constituents in a sentence. We use several probing classifiers, and data from two morphologically rich languages. Our experiments show that information about government is encoded across all transformer layers, but predominantly in the early layers of the model. We find that, for both languages, a small number of attention heads encode enough information about the government relations to enable us to train a classifier capable of discovering new, previously unknown types of government, never seen in the training data. Currently, data is lacking for the research community working on grammatical constructions, and government in particular. We release the Government Bank—a dataset defining the government relations for thousands of lemmas in the languages in our experiments.

Details

Paper ID
lrec2024-main-1518
Pages
pp. 17459-17472
BibKey
hou-etal-2024-transformers
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20 May 2024 - 25 May 2024

Authors

  • Jue Hou
  • Anisia Katinskaia
  • Lari Kotilainen
  • Sathianpong Trangcasanchai
  • Anh-Duc Vu
  • Roman Yangarber

Links