Reformulate and Create, Don't Translate: Creating Natural Prompts for Underserved Languages
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
We present a methodology for creating high-quality instruction prompts for low-resource Germanic languages that addresses a critical challenge: small annotator pools risk producing datasets reflecting narrow individual interests rather than diverse user needs. In this work, native speakers reformulate existing English prompts from OpenAssistant or create entirely original prompts, adapting them to reflect local contexts and natural language patterns while preserving broad task and topic diversity. This approach produced high-quality prompt datasets totaling 6,950 prompts across seven Germanic languages (German, Dutch, Swedish, Norwegian Bokmål/Nynorsk, Danish, Icelandic and Faroese) with validated coverage of diverse tasks and topics. Blind evaluation demonstrates that human-reformulated prompts significantly outperform synthetically generated prompts in naturalness and comprehensibility, particularly for low-resource languages like Icelandic and Faroese. For the bigger Scandinavian lan- guage, Danish, the difference was less pronounced. The prompt dataset is released under an open-source license at https://huggingface.co/datasets/AnnikaSimonsen/TrustLLM-reformulation-prompts.