Phrase-level Textual Adversarial Attack with Label Preservation

Citations: 17
Readers (Mendeley): 34

Abstract

Generating high-quality textual adversarial examples is critical for investigating the pitfalls of natural language processing (NLP) models and for further promoting their robustness. Existing attacks are usually realized through word-level or sentence-level perturbations, which either limit the perturbation space or sacrifice fluency and textual quality, both of which affect attack effectiveness. In this paper, we propose the Phrase-Level Textual Adversarial ATtack (PLAT), which generates adversarial samples through phrase-level perturbations. PLAT first extracts vulnerable phrases as attack targets using a syntactic parser, and then perturbs them with a pre-trained blank-infilling model. This flexible perturbation design substantially expands the search space for more effective attacks without introducing too many modifications, while maintaining textual fluency and grammaticality via contextualized generation conditioned on the surrounding text. Moreover, we develop a label-preservation filter that leverages the likelihoods of language models fine-tuned on each class, rather than textual similarity, to rule out perturbations that would likely alter the original class label for human readers. Extensive experiments and human evaluation demonstrate that PLAT achieves superior attack effectiveness and better label consistency than strong baselines.
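To make the pipeline concrete, here is a minimal Python sketch of a PLAT-style attack loop, assembled only from what the abstract describes; it is not the authors' implementation. Every component choice is an assumption: spaCy noun chunks stand in for the parser-extracted vulnerable phrases, a single-token fill-mask model (roberta-base) stands in for the paper's multi-token blank-infilling model, and plain GPT-2 is loaded where per-class fine-tuned language models would go. The final step, querying the victim classifier to rank the surviving candidates, is omitted.

import spacy
import torch
from transformers import pipeline, GPT2LMHeadModel, GPT2TokenizerFast

nlp = spacy.load("en_core_web_sm")                  # syntactic parsing
fill = pipeline("fill-mask", model="roberta-base")  # stand-in for the blank-infilling model

def candidate_perturbations(text, top_k=5):
    # Blank out each parser-extracted phrase and let the masked LM propose
    # replacements in context. (PLAT's infilling model can generate multi-token
    # spans; using a single mask token is a simplification in this sketch.)
    doc = nlp(text)
    for chunk in doc.noun_chunks:                   # proxy for "vulnerable phrases"
        blanked = (text[:chunk.start_char] + fill.tokenizer.mask_token
                   + text[chunk.end_char:])
        for pred in fill(blanked, top_k=top_k):
            if pred["token_str"].strip().lower() != chunk.text.lower():
                yield pred["sequence"]

def class_lm_score(text, model, tokenizer):
    # Average token log-likelihood of `text` under one class-conditional LM.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return -loss.item()                             # higher = more typical of the class

def label_preserved(text, orig_label, class_lms):
    # One plausible decision rule: keep a perturbation only if the LM fine-tuned
    # on the original class still rates it highest, so a human reader would
    # likely assign the same label.
    scores = [class_lm_score(text, m, t) for (m, t) in class_lms]
    return scores.index(max(scores)) == orig_label

# The per-class fine-tuned checkpoints are assumptions and not public; plain
# "gpt2" is loaded twice here only so the sketch runs end to end.
tok = GPT2TokenizerFast.from_pretrained("gpt2")
class_lms = [(GPT2LMHeadModel.from_pretrained("gpt2").eval(), tok) for _ in range(2)]

for adv in candidate_perturbations("The film was a thoroughly enjoyable ride."):
    if label_preserved(adv, orig_label=1, class_lms=class_lms):
        print(adv)  # candidates to score against the victim model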


Cited by

Exposing the Achilles’ heel of textual hate speech classifiers using indistinguishable adversarial examples (7 citations)

Towards Imperceptible Document Manipulations against Neural Ranking Models (5 citations)

A More Context-Aware Approach for Textual Adversarial Attacks Using Probability Difference-Guided Beam Search (2 citations)


Citation (APA)

Lei, Y., Cao, Y., Li, D., Zhou, T., Fang, M., & Pechenizkiy, M. (2022). Phrase-level textual adversarial attack with label preservation. In Findings of the Association for Computational Linguistics: NAACL 2022 (pp. 1095–1112). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.findings-naacl.83

Readers' Seniority

PhD / Postgrad / Masters / Doc: 5 (50%)
Researcher: 4 (40%)
Lecturer / Postdoc: 1 (10%)

Readers' Discipline

Computer Science: 11 (73%)
Linguistics: 2 (13%)
Neuroscience: 1 (7%)
Engineering: 1 (7%)
