LREC-COLING 2024 Workshop

STA: Self-controlled Text Augmentation for Improving Text Classifications

Proceedings of the Seventh Workshop on e-Commerce and NLP @ LREC-COLING 2024

DOI: 10.63317/35gxuynzeefc

Abstract

Despite recent advancements in Machine Learning, many tasks still involve working in low-data regimes which can make solving natural language problems difficult. Recently, a number of text augmentation techniques have emerged in the field of Natural Language Processing (NLP) which can enrich the training data with new examples, though they are not without their caveats. For instance, simple rule-based heuristic methods are effective, but lack variation in semantic content and syntactic structure with respect to the original text. On the other hand, more complex deep learning approaches can cause extreme shifts in the intrinsic meaning of the text and introduce unwanted noise into the training data. To more reliably control the quality of the augmented examples, we introduce a state-of-the-art approach for Self-Controlled Text Augmentation (STA). Our approach tightly controls the generation process by introducing a self-checking procedure to ensure that generated examples retain the semantic content of the original text. Experimental results on multiple benchmarking datasets demonstrate that STA substantially outperforms existing state-of-the-art techniques, whilst qualitative analysis reveals that the generated examples are both lexically diverse and semantically reliable.
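The abstract describes STA as a generation procedure with a self-checking step that filters out augmented examples that drift from the original meaning. As a rough illustration only (not the paper's actual implementation), the sketch below shows a generic generate-then-filter augmentation loop in which candidate rewrites are kept only if they pass a semantic-consistency check; the function names `generate_paraphrases` and `semantic_score` are hypothetical placeholders for whatever generator and similarity scorer one plugs in.

```python
from typing import Callable, List, Tuple

def self_checked_augment(
    text: str,
    label: str,
    generate_paraphrases: Callable[[str], List[str]],  # hypothetical: e.g. sampling from a seq2seq/LLM paraphraser
    semantic_score: Callable[[str, str], float],        # hypothetical: e.g. cosine similarity of sentence embeddings
    threshold: float = 0.8,
    max_keep: int = 4,
) -> List[Tuple[str, str]]:
    """Generate candidate rewrites of `text` and keep only those whose
    semantic similarity to the original clears `threshold`, so the new
    (text, label) pairs stay label-consistent."""
    kept: List[Tuple[str, str]] = []
    for candidate in generate_paraphrases(text):
        # Discard empty outputs, exact copies, and semantically drifted rewrites.
        if not candidate.strip() or candidate == text:
            continue
        if semantic_score(text, candidate) >= threshold:
            kept.append((candidate, label))
            if len(kept) >= max_keep:
                break
    return kept
```

The design choice this illustrates is simply that the generator is never trusted blindly: every candidate must pass an explicit consistency check before it is added to the training data, which is the general idea the abstract attributes to STA's self-checking procedure.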

Details

Paper ID
lrec2024-ws-ecnlp-11
Pages
pp. 97-114
BibKey
wang-etal-2024-sta
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Seventh Workshop on e-Commerce and NLP @ LREC-COLING 2024
Location
Torino, Italia
Date
20–25 May 2024

Authors

  • Congcong Wang

  • Gonzalo Fiz Pontiveros

  • Steven Derby

  • Tri Kurniawan Wijaya
