LREC 2026 workshop

DUO_DE A1: An Annotated Corpus of Online Learning Material for Beginning Learners of German as a Foreign Language

Proceedings of Leveraging Derived Text Formats to Unlock Copyrighted Collections for Open Science (DTF) @ LREC 2026

DOI:10.63317/5mo2pqo4dkpa

Abstract

This paper describes the creation of DUO_DE A1, a corpus based on A1-level learning material from the Deutsch-Uni Online (DUO) language courses for German as a foreign language. We split the material into small segments and manually annotated each segment with fine-grained information: the type of segment (e.g., task description, description of grammar), the medium (e.g., text, table, audio), the text units it contains (e.g., words, phrases, sentences) and other special features (e.g., marking cloze texts). Furthermore, we automatically tokenized, POS-tagged and lemmatized the corpus and compared the performance of three models on these steps for different kinds of segments. We publish the corpus in a manner that respects copyright, releasing all structural features, metadata and POS tags.
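To make the idea of a copyright-respecting derived format more concrete, the following is a minimal sketch, assuming a hypothetical per-segment record. The field names (`segment_type`, `medium`, `text_units`, `is_cloze`), the STTS-style POS labels and the example sentence are illustrative assumptions, not the published DUO_DE A1 schema. The sketch keeps segment annotations and POS tags and drops word forms and lemmas, which would otherwise reveal the copyrighted text.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str   # surface word form (copyrighted text, not released)
    lemma: str  # lemma (also reveals the text, not released in this sketch)
    pos: str    # POS tag; STTS-style labels are an assumption here


@dataclass
class Segment:
    # Hypothetical segment record; field names are illustrative only.
    segment_id: str
    segment_type: str       # e.g. "task_description", "grammar_description"
    medium: str             # e.g. "text", "table", "audio"
    text_units: list[str]   # e.g. ["word", "sentence"]
    is_cloze: bool = False  # example of a special-feature flag
    tokens: list[Token] = field(default_factory=list)


def to_derived(segment: Segment) -> dict:
    """Keep structure, metadata and POS tags; drop word forms and lemmas."""
    return {
        "segment_id": segment.segment_id,
        "segment_type": segment.segment_type,
        "medium": segment.medium,
        "text_units": list(segment.text_units),
        "is_cloze": segment.is_cloze,
        "n_tokens": len(segment.tokens),
        "pos": [tok.pos for tok in segment.tokens],
    }


if __name__ == "__main__":
    # Invented A1-style example sentence, not taken from the DUO material.
    seg = Segment(
        segment_id="A1-001",
        segment_type="task_description",
        medium="text",
        text_units=["sentence"],
        tokens=[
            Token("Ich", "ich", "PPER"),
            Token("heiße", "heißen", "VVFIN"),
            Token("Anna", "Anna", "NE"),
            Token(".", ".", "$."),
        ],
    )
    print(to_derived(seg)["pos"])  # ['PPER', 'VVFIN', 'NE', '$.']
```

Returning a plain dictionary keeps the derived record easy to serialize (for example as JSON), while the full `Segment` objects with word forms stay private.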

Details

Paper ID
lrec2026-ws-dtf-07
Pages
pp. 51-62
BibKey
laguidi-etal-2026-duo_de
Editors
Florian Barth, Keli Du, José Calvo Tello, Philippe Genêt, Piroska Lendvai, Christof Schöch, Thorsten Trippel
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of Leveraging Derived Text Formats to Unlock Copyrighted Collections for Open Science (DTF) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Jammila Laâguidi
  • Vitaliia Ruban
  • Ronja Laarmann-Quante
  • Anastasia Drackert
