The Texas German Dialect Project Corpus as a Diachronic Resource for Investigating Language Contact
Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective
Abstract
The Texas German Dialect Project (TGDP) is a long-standing effort to document the unique variety of German spoken in Texas since the 1840s. For 25 years, the TGDP has built up the freely accessible Texas German Dialect Archive Online (TGDA Online), which holds recordings and annotations of interviews and language tasks conducted from 2001 to the present with some of the last speakers of the variety. The variety is expected to go extinct within the next 5 to 10 years. The present paper reports on the most recent addition to the TGDP’s online corpus platform, a collection of Texas German data recorded in the 1960s, as well as historical translation elicitations that will be released later in 2026. The contemporary and historical materials follow very similar elicitation methods and are processed using the same pipelines, which increases their comparability. The new materials add a historical dimension to the resource, enabling diachronic and multidimensional analyses of this endangered variety. These data can help shed light on the dynamics of language contact, dialect contact, and language death.