Back to Main Conference 2024
LREC-COLING 2024main

Unveiling Vulnerability of Self-Attention

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/4wkcnuhtme2i

Abstract

Pre-trained language models (PLMs) are shown to be vulnerable to minor word changes, which poses a significant threat to real-world systems. While previous studies directly focus on manipulating word inputs, they are limited by their means of generating adversarial samples, lacking generalization to versatile real-world attacks. This paper studies the basic structure of transformer-based PLMs, the self-attention (SA) mechanism. (1) We propose a powerful perturbation technique named ‘HackAttend,’ which perturbs the attention scores within the SA matrices via meticulously crafted attention masks. We show that state-of-the-art PLMs fall into heavy vulnerability, with minor attention perturbations (1%) resulting in a very high attack success rate (98%). Our paper extends the conventional text attack of word perturbations to more general structural perturbations. (2) We introduce ‘S-Attend,’ a novel smoothing technique that effectively makes SA robust via structural perturbations. We empirically demonstrate that this simple yet effective technique achieves robust performance on par with adversarial training when facing various text attackers.

Details

Paper ID
lrec2024-main-1496
Pages
pp. 17225-17236
BibKey
liong-etal-2024-unveiling
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20 May 2024 25 May 2024

Authors

  • KL

    Khai Jiet Liong

  • HW

    Hongqiu Wu

  • HZ

    Hai Zhao

Links