Proceedings of the 12th Workshop on Challenges in the Management of Large Corpora
LREC 2026 Workshop
TestiMole-Conversational: A 30-Billion-Word Italian Discussion Board Corpus (1996–2024) for Language Modeling and Sociolinguistic Research
Matteo Rinaldi, Rossella Varvara, Viviana Patti
A Large Dataset Representing Bulgarian, with the Bulgarian National Corpus as Its Core
Svetla Peneva Koeva, Ivelina Stoyanova
Merimënga: A Manifest-First Pipeline for Reproducible Albanian Web Corpus Construction
Besim Kabashi, Michael Ruppert
Pop Lyrics through Time: Challenges in Corpus-Based Modeling of Linguistic and Emotional Dynamics in German Pop Lyrics
Roman Schneider
The Infrastructure behind Latvian National Corpora Collection
Roberts Dargis, Baiba Valkovska
Optimized for AI: Curating the Icelandic Gigaword Corpus for Stable LLM Training
Jón Friðrik Daðason, Steinþór Steingrímsson
Hellenic National Corpus: The Current State
Maria Gavriilidou, Nikolaos Sidiropoulos
Corpas Náisiúnta Na Gaeilge 2022-2029: A Project Overview
Mícheál J. Ó Meachair, Úna Bhreathnach, Kevin Scannell, Michal Mechura, Brian Ó Raghallaigh, Gearóid Ó Cleircín
General Regionally Annotated Corpus of Ukrainian: Recent Developments and Future Plans
Maria Shvedova
Recent Developments of the Bulgarian National Corpus
Svetla Peneva Koeva, Ivelina Stoyanova
The British National Corpus 1994 to 2026
Martin Wynne, Megan Bushnell
The Corpus of Contemporary Polish: 2011-2020 Decade and Beyond
Witold Kieraś, Małgorzata Marciniak, Katarzyna Krasnowska-Kieraś, Marcin Woliński
Building the v4 of the Croatian National Corpus
Marko Tadić, Vanja Štefanec, Daša Farkaš
Managing Growth in a National Corpus: The Hungarian National Corpus 3.0 (MNSZ3)
Noémi Ligeti-Nagy, Enikő Héja, Ágnes Bánfi, Flóra Földesi, Bence Sárossy, Boglárka Skrabák, Tamás Váradi, Gábor Prószéky
CoRoLa Version 2.0: Corpus Enrichment and a New Annotation Level
Elena Irimia, Verginica Barbu Mititelu, Radu Ion, Vasile Pais, Maria Mitrofan, Dan Ioan Tufis
The German Medical Text Corpus: Early 2026 Update
Justin Hofenbitzer, Christina Lohr, Frank Meineke, Markus Löffler, Martin Boeker
From Corpus to Community: New NLP Tools for Welsh Language Research and Learning
Dawn Knight, Fernando Alva-Manchego
Swiss-AL: Language Data Platform for Applied Sciences
Julia Krasselt, Philipp Dreesen, Dolores Lemmenmeier-Batinić, Sooyeon Geckeler, Klaus Rothenhäusler, Matthias Fluor
EuReCo, KorAP and DeReKo: Updates on Ingestion and Annotation Pipelines, Backend, Interfaces, Operation, and Corpora
Marc Kupietz, Nils Diewald, Harald Lüngen, Eliza Margaretha Illig, Helge Stallkamp, Uyen-Nhu Tran, Rameela Yaddehige