LREC 2026 main

Reformulate and Create, Don't Translate: Creating Natural Prompts for Underserved Languages

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4kz5jrk3j9kw

Abstract

We present a methodology for creating high-quality instruction prompts for low-resource Germanic languages that addresses a critical challenge: small annotator pools risk producing datasets reflecting narrow individual interests rather than diverse user needs. In this work, native speakers reformulate existing English prompts from OpenAssistant or create entirely original prompts, adapting them to reflect local contexts and natural language patterns while preserving broad task and topic diversity. This approach produced high-quality prompt datasets totaling 6,950 prompts across seven Germanic languages (German, Dutch, Swedish, Norwegian Bokmål/Nynorsk, Danish, Icelandic and Faroese) with validated coverage of diverse tasks and topics. Blind evaluation demonstrates that human-reformulated prompts significantly outperform synthetically generated prompts in naturalness and comprehensibility, particularly for low-resource languages like Icelandic and Faroese. For the bigger Scandinavian language, Danish, the difference was less pronounced. The prompt dataset is released under an open-source license at https://huggingface.co/datasets/AnnikaSimonsen/TrustLLM-reformulation-prompts.

Details

Paper ID
lrec2026-main-841
Pages
pp. 10735-10749
BibKey
simonsen-etal-2026-reformulate
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 - 16 May 2026

Authors

  • Annika Simonsen
  • Mathias Stenlund
  • Lars Bungum
  • Marc Daníel Skipstað Volhardt
  • Hafsteinn Einarsson

Links