Semantic, Syntactic, Lexical: What Makes QA Augmentation Work in Limited Quantity?

Proceedings of the Workshop on Structured Linguistic Data and Evaluation (SLiDE)

Abstract

Data augmentation is a common fix in domains where training data is scarce or difficult to collect, such as specialized medical or other domain-specific applications. In question answering (QA), most studies report headline accuracy while saying little about the quality of the synthetic data. Here, quality goes beyond fluent rewording: augmented items must remain faithful to the supporting evidence and preserve the original answerability. We study three augmentation families (lexical, syntactic, and semantic edits), generated with LLaMA 3.1 70B, and analyze how these edits affect model behavior. To mirror low-resource settings, we focus on subsets of SQuADv2 (general) and PubMedQA (biomedical, domain-specific). We report Exact Match (EM) and F1 alongside quality diagnostics, yielding a fuller picture than accuracy alone. Our results show that augmentation behaves differently across domains and data scales. On SQuADv2, augmented variants maintain performance on par with baselines, showing that added diversity mostly does not harm model quality, whereas on PubMedQA semantic edits bring improvements under extreme scarcity and support stronger performance as supervision grows.

Details

Paper ID

lrec2026-ws-slide-20

Pages

pp. 224-236

DOI

10.63317/2545w4h6dty8

BibKey

rachmat-etal-2026-semantic

Editors

Erhard Hinrichs (Tübingen University, Germany), Joakim Nivre (Uppsala University, Sweden), Petya Osenova (Sofia University, Bulgaria), James Pustejovsky (Brandeis University, USA), Claus Zinn (Tübingen University, Germany)

Publisher

European Language Resources Association (ELRA)

ISSN

N/A

ISBN

N/A

Workshop

Proceedings of the Workshop on Structured Linguistic Data and Evaluation (SLiDE)

Location

Palma, Mallorca, Spain

Date

11 - 16 May 2026

Authors

Benedictus Kent Rachmat
Thomas Gerald
Takuya Nakamura
Zheng Zhang
Cyril Grouin
