Quantifying Code-Switching in a Ukrainian Parliamentary Dataset 1990-2021

Proceedings of the ParlaCLARIN V Workshop on Interoperability, Multilinguality, and Multimodality in Parliamentary Corpora

DOI:10.63317/42ss3zvt76a7

Abstract

Analyzing code-switching (the practice of mixing multiple languages in one discourse) remains a significant challenge in natural language processing (NLP). This study examines the Ukrainian-Russian bilingual context, focusing on quantifying language alternation in a multilingual dataset. We introduce metrics to assess linguistic boundaries and patterns, specifically addressing the complexities of processing texts where Ukrainian and Russian are used interchangeably, including word-level hybridization. Using a corpus of approximately 200,000 tokens derived from parliamentary transcripts (1990-2021), we apply code-switching metrics to identify the frequency and patterns of language use. Our findings provide insights into bilingual communication dynamics and can be used to improve language identification models for mixed-language data.
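
The abstract does not say which code-switching metrics the paper uses. As a rough illustration of what quantifying language alternation can look like, the sketch below computes two commonly cited measures from the code-switching literature: the M-index (how balanced the languages are) and the I-index (how often the language changes between adjacent tokens). The function names, the `uk`/`ru` tags, and the toy tag sequence are assumptions made for this example and are not taken from the paper. Word-level hybridization is not handled here, since each token is assumed to carry exactly one language tag.

```python
from collections import Counter


def m_index(tags):
    """M-index: 0.0 for monolingual input, 1.0 when all languages are equally frequent.

    Assumption: k is the number of languages observed in `tags`,
    not a fixed language inventory for the whole corpus.
    """
    counts = Counter(tags)
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts.values())
    # Sum of squared language proportions.
    sum_sq = sum((c / total) ** 2 for c in counts.values())
    return (1 - sum_sq) / ((k - 1) * sum_sq)


def i_index(tags):
    """I-index: share of adjacent token pairs whose language tags differ."""
    if len(tags) < 2:
        return 0.0
    switches = sum(1 for a, b in zip(tags, tags[1:]) if a != b)
    return switches / (len(tags) - 1)


# Hypothetical per-token language tags for one utterance (illustrative only).
tags = ["uk", "uk", "ru", "ru", "uk", "uk"]
print(m_index(tags), i_index(tags))
```

For the toy sequence above, four Ukrainian and two Russian tokens give an M-index of about 0.8, and two switch points across five adjacent pairs give an I-index of 0.4. The two measures are complementary: a text can be balanced between languages yet switch only once, or be dominated by one language yet switch frequently.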

Details

Paper ID: lrec2026-ws-parlaclarin-02
Pages: pp. 2-12
BibKey: kanishcheva-etal-2026-quantifying
Editors: Maria Eskevich, Vincent Vandeghinste, David Bodron
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of the ParlaCLARIN V Workshop on Interoperability, Multilinguality, and Multimodality in Parliamentary Corpora
Location: Palma, Mallorca, Spain
Date: 11-16 May 2026

Authors

  • Olha Kanishcheva
  • Maria Shvedova
