LREC 2026 workshop

Semantic, Syntactic, Lexical: What Makes QA Augmentation Work in Limited Quantity?

Proceedings of the Workshop on Structured Linguistic Data and Evaluation (SLiDE)

DOI:10.63317/2545w4h6dty8

Abstract

Data augmentation is a common remedy in domains where training data is scarce or difficult to collect, such as specialized medical or other domain-specific applications. In question answering (QA), most studies report headline accuracy while saying little about the quality of the synthetic data. Here, quality goes beyond fluent rewording: augmented items must remain faithful to the supporting evidence and preserve the original answerability. We study three augmentation families (lexical, syntactic, and semantic edits), all generated with LLaMA 3.1 70B, and analyze how these edits affect model behavior. To mirror low-resource settings, we focus on subsets of SQuADv2 (general domain) and PubMedQA (biomedical, domain-specific). We report Exact Match (EM) and F1 alongside quality diagnostics, yielding a fuller picture than accuracy alone. Our results show that augmentation behaves differently across domains and scales. In SQuADv2, augmented variants perform on par with baselines, showing that the added diversity mostly does not harm model quality, whereas in PubMedQA semantic edits bring improvements under extreme scarcity and support stronger performance as supervision grows.
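
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of Exact Match and token-level F1 as they are commonly defined for SQuAD-style extractive QA. It is an illustration only and is not taken from the paper: the function names (`normalize`, `exact_match`, `token_f1`) are hypothetical, the normalization steps follow the widely used SQuAD convention, and the authors' actual scoring (in particular for PubMedQA) may differ.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Assumption: an empty gold answer marks an unanswerable question,
    # so the score is 1.0 only if the prediction is empty as well.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

EM rewards only a full match, whereas F1 gives partial credit for overlapping tokens, which is one reason the two are usually reported together.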

Details

Paper ID
lrec2026-ws-slide-20
Pages
pp. 224-236
BibKey
rachmat-etal-2026-semantic
Editors
Erhard Hinrichs (Tübingen University, Germany), Joakim Nivre (Uppsala University, Sweden), Petya Osenova (Sofia University, Bulgaria), James Pustejovsky (Brandeis University, USA), Claus Zinn (Tübingen University, Germany)
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Workshop on Structured Linguistic Data and Evaluation (SLiDE)
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Benedictus Kent Rachmat
  • Thomas Gerald
  • Takuya Nakamura
  • Zheng Zhang
  • Cyril Grouin
