New Encoders for German Trained from Scratch: Comparing ModernGBERT with Converted LLM2Vec Models
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Encoders remain essential for efficient German NLP and NLU despite the rise of decoder-only LLMs. This work studies two routes to high-quality German encoders under identical data and training constraints: (a) training from scratch and (b) converting decoders via LLM2Vec. We introduce two resources: ModernGBERT (134M, 1B), fully transparent German encoders in the ModernBERT style, and LLäMmlein2Vec (120M, 1B, 7B), decoder-to-encoder conversions trained with masked next-token prediction; both undergo context extension to 8,192 tokens. Across SuperGLEBer, ModernGBERT 1B sets a new state of the art (avg. 0.808), surpassing GBERT-large (+4%) and the seven-times larger converted 7B model (0.787). On German MTEB after supervised fine-tuning, ModernGBERT 1B (0.551) approaches the converted 7B model (0.557). We release all models, checkpoints, datasets, and full training records, and introduce an encoder-adapted question-answering needle-in-a-haystack (QA-NIAH) evaluation. Overall, our results provide actionable guidance: when parameter efficiency and latency matter, from-scratch encoders dominate; when a pre-trained decoder exists and compute is limited, conversion offers an effective alternative.
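
To make the conversion route concrete, below is a minimal, self-contained PyTorch sketch of a masked next-token prediction (MNTP) objective of the kind used for decoder-to-encoder conversion: the causal attention mask is dropped so tokens attend bidirectionally, and each masked token is predicted from the hidden state at the preceding position, preserving the decoder's next-token parameterisation. This is an illustrative sketch under stated assumptions, not the paper's actual training code; the toy model sizes, mask token id, and masking ratio are all invented for the example.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Toy hyperparameters; these are illustrative assumptions, not the paper's setup.
    VOCAB, DIM, SEQ, MASK_ID = 1000, 64, 16, 0

    class ToyEncoderFromDecoder(nn.Module):
        """A decoder-style LM reused as an encoder: the causal mask is simply dropped,
        so every token attends to the full sequence (bidirectional attention)."""
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(VOCAB, DIM)
            layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
            self.backbone = nn.TransformerEncoder(layer, num_layers=2)
            self.lm_head = nn.Linear(DIM, VOCAB)

        def forward(self, ids):
            # No causal mask is passed: attention is fully bidirectional.
            return self.lm_head(self.backbone(self.emb(ids)))

    def mntp_loss(model, ids, mask_ratio=0.15):
        """Masked next-token prediction: corrupt a subset of tokens, then predict
        each masked token from the logits at the *preceding* position, matching
        the original decoder's next-token parameterisation."""
        labels = ids.clone()
        mask = torch.rand_like(ids, dtype=torch.float) < mask_ratio
        mask[:, 0] = False                          # position 0 has no predecessor
        corrupted = ids.masked_fill(mask, MASK_ID)
        logits = model(corrupted)                   # (batch, seq, vocab)
        # Shift so that logits at position i-1 predict the masked token at position i.
        shifted_logits = logits[:, :-1].reshape(-1, VOCAB)
        shifted_labels = labels[:, 1:].reshape(-1)
        shifted_mask = mask[:, 1:].reshape(-1)
        return nn.functional.cross_entropy(
            shifted_logits[shifted_mask], shifted_labels[shifted_mask]
        )

    model = ToyEncoderFromDecoder()
    ids = torch.randint(1, VOCAB, (2, SEQ))
    print(float(mntp_loss(model, ids)))             # one MNTP training loss value

In an actual conversion, the backbone would be a pre-trained decoder (here LLäMmlein) with its causal mask disabled in place, rather than a freshly initialised encoder; the shifted loss is what distinguishes MNTP from standard BERT-style masked language modelling.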