Title: Adversarial Attacks on the Natural Language Inference task: Using Natural Language Explanations to Enhance Adversarial Robustness
Authors: Κουλάκος, Αλέξανδρος
Στάμου, Γιώργος
Keywords: Natural Language Processing
Natural Language Inference
Natural Language Explanations
Adversarial Attacks
Adversarial Robustness
Transformers
Issue Date: 16-Jul-2024
Abstract: Deep neural networks (DNNs) have achieved remarkable success in various Natural Language Processing tasks (e.g., text classification, summarization, machine translation, natural language inference). However, in the natural language inference task in particular, state-of-the-art DNN-based models trained on the SNLI dataset have been shown to be susceptible to adversarial attacks, which aim to fool the model by adding imperceptible perturbations to legitimate inputs. Adversarial training has been proposed to address this issue, but it fails to mask out the SNLI dataset bias from the model's decision-making process. Building on the work of Camburu et al., we propose modifying the traditional natural language inference task to incorporate natural language explanations during training and inference, and we conduct a range of experiments to verify whether natural language explanations actually improve adversarial robustness. We use TextFooler and BERT-Attack as attack recipes, and the experimental results consistently show that incorporating natural language explanations into the training and inference process enhances robustness against adversarial attacks.
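
Both attack recipes named in the abstract, TextFooler and BERT-Attack, are implemented in the open-source TextAttack library. The record does not specify the thesis's tooling, so the following is only a minimal sketch of how a TextFooler attack against an SNLI classifier could be set up; the model checkpoint name, dataset split, and number of attacked examples are illustrative assumptions, not details taken from the thesis.

    # Minimal sketch: running the TextFooler attack recipe against an SNLI
    # classifier with the TextAttack library. Checkpoint, split, and example
    # count are illustrative assumptions, not details from the thesis.
    import transformers
    from textattack import Attacker, AttackArgs
    from textattack.attack_recipes import TextFoolerJin2019  # BERTAttackLi2020 is the BERT-Attack recipe
    from textattack.datasets import HuggingFaceDataset
    from textattack.models.wrappers import HuggingFaceModelWrapper

    # Hypothetical victim model: any sequence-classification checkpoint
    # fine-tuned on SNLI premise/hypothesis pairs would do here.
    model_name = "textattack/bert-base-uncased-snli"
    model = transformers.AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
    model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

    # TextFooler: word-importance ranking plus counter-fitted synonym swaps,
    # queried against the victim model until the predicted label flips.
    attack = TextFoolerJin2019.build(model_wrapper)

    # Attack a small slice of the SNLI test split.
    dataset = HuggingFaceDataset("snli", split="test")
    attack_args = AttackArgs(num_examples=100, random_seed=42)
    Attacker(attack, dataset, attack_args).attack_dataset()

Swapping TextFoolerJin2019 for BERTAttackLi2020 yields the corresponding BERT-Attack setup.
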
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19249
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File          Description    Size    Format
thesis.pdf                   3 MB    Adobe PDF