Phrase-level Textual Adversarial Attack with Label Preservation

Citations: 17
Readers (Mendeley): 34

Abstract

Generating high-quality textual adversarial examples is critical for investigating the pitfalls of natural language processing (NLP) models and for further promoting their robustness. Existing attacks are usually realized through word-level or sentence-level perturbations, which either limit the perturbation space or sacrifice fluency and textual quality, both of which affect attack effectiveness. In this paper, we propose the Phrase-Level Textual Adversarial ATtack (PLAT), which generates adversarial samples through phrase-level perturbations. PLAT first extracts vulnerable phrases as attack targets using a syntactic parser, and then perturbs them with a pre-trained blank-infilling model. This flexible perturbation design substantially expands the search space for more effective attacks without introducing too many modifications, while maintaining textual fluency and grammaticality via contextualized generation conditioned on the surrounding text. Moreover, we develop a label-preservation filter that leverages the likelihoods of language models fine-tuned on each class, rather than textual similarity, to rule out perturbations that would likely alter the original class label for human readers. Extensive experiments and human evaluation demonstrate that PLAT achieves superior attack effectiveness and better label consistency than strong baselines.
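To make the pipeline concrete, here is a minimal Python sketch of a PLAT-style attack loop, assembled only from what the abstract describes; it is not the authors' implementation. Every component choice is an assumption: spaCy noun chunks stand in for the parser-extracted vulnerable phrases, a single-token fill-mask model (roberta-base) stands in for the paper's multi-token blank-infilling model, and plain GPT-2 is loaded where per-class fine-tuned language models would go. The final step, querying the victim classifier to rank the surviving candidates, is omitted.

import spacy
import torch
from transformers import pipeline, GPT2LMHeadModel, GPT2TokenizerFast

nlp = spacy.load("en_core_web_sm")                  # syntactic parsing
fill = pipeline("fill-mask", model="roberta-base")  # stand-in for the blank-infilling model

def candidate_perturbations(text, top_k=5):
    # Blank out each parser-extracted phrase and let the masked LM propose
    # replacements in context. (PLAT's infilling model can generate multi-token
    # spans; using a single mask token is a simplification in this sketch.)
    doc = nlp(text)
    for chunk in doc.noun_chunks:                   # proxy for "vulnerable phrases"
        blanked = (text[:chunk.start_char] + fill.tokenizer.mask_token
                   + text[chunk.end_char:])
        for pred in fill(blanked, top_k=top_k):
            if pred["token_str"].strip().lower() != chunk.text.lower():
                yield pred["sequence"]

def class_lm_score(text, model, tokenizer):
    # Average token log-likelihood of `text` under one class-conditional LM.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return -loss.item()                             # higher = more typical of the class

def label_preserved(text, orig_label, class_lms):
    # One plausible decision rule: keep a perturbation only if the LM fine-tuned
    # on the original class still rates it highest, so a human reader would
    # likely assign the same label.
    scores = [class_lm_score(text, m, t) for (m, t) in class_lms]
    return scores.index(max(scores)) == orig_label

# The per-class fine-tuned checkpoints are assumptions and not public; plain
# "gpt2" is loaded twice here only so the sketch runs end to end.
tok = GPT2TokenizerFast.from_pretrained("gpt2")
class_lms = [(GPT2LMHeadModel.from_pretrained("gpt2").eval(), tok) for _ in range(2)]

for adv in candidate_perturbations("The film was a thoroughly enjoyable ride."):
    if label_preserved(adv, orig_label=1, class_lms=class_lms):
        print(adv)  # candidates to score against the victim model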


Cited by

Exposing the Achilles’ heel of textual hate speech classifiers using indistinguishable adversarial examples (7 citations)

Towards Imperceptible Document Manipulations against Neural Ranking Models (5 citations)

A More Context-Aware Approach for Textual Adversarial Attacks Using Probability Difference-Guided Beam Search (2 citations)


Citation (APA)

Lei, Y., Cao, Y., Li, D., Zhou, T., Fang, M., & Pechenizkiy, M. (2022). Phrase-level textual adversarial attack with label preservation. In Findings of the Association for Computational Linguistics: NAACL 2022 (pp. 1095–1112). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.findings-naacl.83

Readers' Seniority

PhD / Postgrad / Masters / Doc: 5 (50%)
Researcher: 4 (40%)
Lecturer / Postdoc: 1 (10%)

Readers' Discipline

Computer Science: 11 (73%)
Linguistics: 2 (13%)
Neuroscience: 1 (7%)
Engineering: 1 (7%)
