Linguistic Counterfactuals for Visual Question Answering

Στόικου, Θεοδότη

National Technical University of Athens

School of Electrical and Computer Engineering

Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18722

Full metadata record

DC Field	Value	Language
dc.contributor.author	Στόικου, Θεοδότη	-
dc.date.accessioned	2023-07-10T10:52:30Z	-
dc.date.available	2023-07-10T10:52:30Z	-
dc.date.issued	2023-07-07	-
dc.identifier.uri	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18722	-
dc.description.abstract	Visual Question Answering (VQA) is a popular task that combines vision and language, with numerous implementations in the literature. Although some work addresses explainability and robustness in VQA models, very little of it employs counterfactuals to probe these issues in a model-agnostic way. In this diploma thesis, we propose a systematic method for explaining the behavior and investigating the robustness of VQA models through counterfactual perturbations. To this end, we exploit structured knowledge bases to perform deterministic, optimal and controllable word-level replacements targeting the linguistic modality, and we then evaluate the model's responses to these counterfactual inputs. Finally, we qualitatively extract local and global explanations from the counterfactual responses, which prove insightful for interpreting VQA model behavior. By applying a variety of perturbation types that target different parts of speech in the input question, we gain insight into the model's reasoning by comparing its responses under different adversarial conditions. Overall, our analysis reveals possible biases in the model's decision-making process, as well as expected and unexpected patterns that affect its performance both quantitatively and qualitatively.	en_US
dc.language	en	en_US
dc.subject	Explainable AI	en_US
dc.subject	Visual Question Answering	en_US
dc.title	Linguistic Counterfactuals for Visual Question Answering	en_US
dc.description.pages	101	en_US
dc.contributor.supervisor	Στάμου Γιώργος	en_US
dc.department	Division of Computer Science	en_US
Appears in Collections:	Diploma Theses
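
The abstract describes deterministic word-level replacements drawn from a structured knowledge base, followed by a comparison of the model's answers on the original and counterfactual questions. The sketch below is only a minimal illustration of that general idea and is not the thesis implementation: the toy knowledge base `TOY_KB`, its distance values, the helper names, and the stub model in the usage note are all hypothetical, and whitespace tokenization stands in for real linguistic preprocessing.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy knowledge base: word -> (substitute, semantic distance) pairs.
# A real system would derive these from a structured lexical resource.
TOY_KB: Dict[str, List[Tuple[str, float]]] = {
    "dog": [("cat", 1.0), ("horse", 2.0)],
    "red": [("green", 1.0), ("blue", 1.0)],
}


def closest_substitute(word: str, kb: Dict[str, List[Tuple[str, float]]]) -> Optional[str]:
    """Return the lowest-distance substitute for `word`, or None if it is not in the KB."""
    candidates = kb.get(word.lower())
    if not candidates:
        return None
    # Ties on distance are broken alphabetically, so the choice is deterministic.
    return min(candidates, key=lambda c: (c[1], c[0]))[0]


def counterfactual_questions(question: str, kb: Dict[str, List[Tuple[str, float]]]) -> List[str]:
    """Generate one counterfactual per replaceable word, changing a single word each time."""
    tokens = question.rstrip("?").split()
    results = []
    for i, token in enumerate(tokens):
        substitute = closest_substitute(token, kb)
        if substitute is not None:
            edited = tokens[:i] + [substitute] + tokens[i + 1:]
            results.append(" ".join(edited) + "?")
    return results


def answer_flips(
    model: Callable[[object, str], str],
    image: object,
    question: str,
    kb: Dict[str, List[Tuple[str, float]]],
) -> List[Tuple[str, str]]:
    """Return (counterfactual question, new answer) pairs where the answer changed."""
    original_answer = model(image, question)
    flips = []
    for cf_question in counterfactual_questions(question, kb):
        cf_answer = model(image, cf_question)
        if cf_answer != original_answer:
            flips.append((cf_question, cf_answer))
    return flips
```

As a usage example, a stub such as `lambda image, q: "yes" if "dog" in q else "no"` can stand in for a VQA model. Calling `answer_flips(stub, None, "Is the dog red?", TOY_KB)` then reports only the counterfactual that replaces "dog", since that is the only word this stub is sensitive to. Because the model is treated as a black-box callable, the sketch stays model-agnostic in the sense used in the abstract.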

Files in This Item:

File	Description	Size	Format
Linguistic Counterfactuals for Visual Question Answering.pdf		18.33 MB	Adobe PDF