Towards the Morphological Annotation of North Markian (Low German)
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Low German (Low Saxon, ISO 639-2 nds) is an under-resourced West Germanic language spoken in Northern Germany (Plattdütsch), in the Netherlands (Nedersaksisch) and in an international diaspora (Plautdietsch, Pomerano, etc.). As a minority language, it is under pressure from the respective national languages and is considered threatened. Although NLP and digital language resources might play a role in facilitating the use of the language on the web and in supporting intergenerational transmission, no NLP tools are known to exist, nor are there adequate corpora on which such tools could be trained. This paper describes the construction of a novel corpus of North Markian, a dialect of East Low German, along with its morphosyntactic annotation and morphological analysis. In particular, it explores methods to bootstrap and develop such resources in the face of a complete lack of training data.