Liebe Kolleg:innen, Querid@s Compañer@s: Presenting the GILDEES Corpus

Proceedings of the 19th Workshop on Building and Using Comparable Corpora (BUCC)

DOI:10.63317/4tbr93ap6scq

Abstract

We present a multi-register (web, news, and government texts), diachronic (2015-2024) comparable corpus annotated for lexical gender-inclusive language (gil) features in German and Spanish. In addition to rule-based annotations, we train a transformer-based classifier to disambiguate semantically ambiguous neutral expressions such as epicenes, so that true human referents are annotated reliably. In a sample study, we analyze variation in gil features across the three registers, both contrastively and diachronically. We show that, in both languages, gil usage increases over time and that its diachronic development varies by register. German texts show a higher overall frequency and diversity of gil features than Spanish texts. Across languages, however, the registers behave similarly: government texts show the strongest usage of gil, followed by news and web texts, while web texts show the strongest innovation in terms of features. The results of our study are valuable to linguistic areas such as human and machine translation and second language acquisition (SLA), and they contribute to register-appropriate, gender-inclusive downstream NLP tasks such as machine translation, summarization, and text generation. From a diachronic point of view, our corpus and analyses are a valuable contribution to observing language change in the making.

Details

Paper ID
lrec2026-ws-bucc-06
Pages
pp. 41-52
BibKey
krielke-2026-liebe
Editors
Reinhard Rapp, Ayla Rigouts Terryn, Serge Sharoff, Pierre Zweigenbaum
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 19th Workshop on Building and Using Comparable Corpora (BUCC)
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

    Marie-Pauline Krielke
