
GhostWriter: Hidden AI-Generated Texts over Multiple Languages, Domains and Generators

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/57fd7juh5zek

Abstract

The advent of Transformer-based Large Language Models (LLMs) has led to an unprecedented surge of AI-generated text (AIGT) across online platforms and academic domains. While these models exhibit near-human fluency and stylistic coherence, their widespread adoption has raised concerns about authorship integrity, research quality, and the recursive contamination of training corpora with synthetic data. These developments underscore the need for reliable AIGT detection methods and benchmark datasets, particularly for malicious or deceptive *ghostwriting* scenarios where AIGT is intentionally crafted to evade detection. To address this, we present **GhostWriter**, a large-scale, bilingual (German and English), multi-generator, and multi-domain dataset for AIGT detection. The dataset comprises human- and AI-authored texts produced under domain-specific *ghostwriting* conditions, including examples intentionally embedded within otherwise human-written texts to obscure their AI origin. With **GhostWriter**, we (i) aim to expand the resources available for German AIGT datasets, (ii) emphasize mixed or fused synthesizations—since most existing corpora are limited to the document level—and (iii) introduce specifically crafted malicious ghostwriting scenarios across multiple domains and generators.

Details

Paper ID
lrec2026-main-823
Pages
pp. 10497-10516
BibKey
schaaf-etal-2026-ghostwriter
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Manuel Schaaf
  • Kevin Bönisch
  • Alexander Mehler
