Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18056
Full metadata record
DC Field | Value | Language
dc.contributor.author | Σωτηρίου, Δημήτριος | -
dc.date.accessioned | 2021-07-28T18:54:10Z | -
dc.date.available | 2021-07-28T18:54:10Z | -
dc.date.issued | 2021-07-26 | -
dc.identifier.uri | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18056 | -
dc.description.abstract | Although image captioning is a difficult task for computers, humans describe images effortlessly through inherent capabilities of their brains. Recent research has shown that brain activations encode semantic information about what people see and think, and several studies in neuroscience have attempted to extract this information from brain activations. In this work, we propose several techniques for incorporating fMRI brain activations into an image captioning model based on the transformer encoder-decoder architecture. Specifically, we consider fusion at the encoder, attention conditioning on the decoder, and other techniques that use a separate transformer encoder for the brain activations. In addition, we explore more adaptive variants of these fusion techniques, which either enforce the use of the weak fMRI modality or use the brain activations only when they are likely to contribute significant information to the model. Because fMRI data are limited, a "lexical expansion" step is performed in several ways, in which brain activations are predicted for novel visual stimuli that were not used in the fMRI experiment. Our results indicate that the quality of the "lexical expansion" is not guaranteed by the main evaluation process proposed in the literature: other evaluation procedures show that this mapping is not very robust and may introduce additional noise into the predicted activations. The scope for improving the model via brain activations therefore appears quite limited; only minor deviations from the baseline are observed in all our experiments, suggesting that the model fails to extract meaningful information from the weak fMRI modality. Finally, we conclude that additional research is needed to establish the usefulness of brain activations in complex computational tasks such as image captioning. | en_US
dc.language | en | en_US
dc.subject | μηχανική μάθηση | en_US
dc.subject | machine learning | en_US
dc.subject | βαθιά μάθηση | en_US
dc.subject | deep learning | en_US
dc.subject | νευρωνικά δίκτυα | en_US
dc.subject | neural networks | en_US
dc.subject | μετασχηματιστές | en_US
dc.subject | transformers | en_US
dc.subject | γνωσιακή νευροεπιστήμη | en_US
dc.subject | cognitive neuroscience | en_US
dc.subject | λειτουργική μαγνητική τομογραφία | en_US
dc.subject | functional MRI | en_US
dc.subject | δημιουργία λεζάντας εικόνας | en_US
dc.subject | image captioning | en_US
dc.title | Cognitive methods for image captioning | en_US
dc.description.pages | 114 | en_US
dc.contributor.supervisor | Ποταμιάνος Αλέξανδρος (Potamianos, Alexandros) | en_US
dc.department | Τομέας Σημάτων, Ελέγχου και Ρομποτικής (Division of Signals, Control and Robotics) | en_US
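The encoder-level fusion described in the abstract can be illustrated with a minimal sketch. All shapes, names, and the single-token projection scheme below are illustrative assumptions, not the implementation used in the thesis: the fMRI activation vector is linearly projected to the encoder's feature dimension and prepended to the image features as one extra encoder token.

```python
# Minimal sketch of encoder-input fusion (hypothetical dimensions).
import numpy as np

rng = np.random.default_rng(0)

d_model = 512     # assumed transformer feature size
n_regions = 36    # assumed number of image-region features
fmri_dim = 4096   # assumed (reduced) fMRI voxel dimension

image_feats = rng.standard_normal((n_regions, d_model))  # visual tokens
fmri_vec = rng.standard_normal(fmri_dim)                 # one brain scan

# Linear projection of the brain activations into the encoder space.
W = rng.standard_normal((fmri_dim, d_model)) / np.sqrt(fmri_dim)
fmri_token = fmri_vec @ W  # shape: (d_model,)

# Fusion at the encoder input: the projected activation becomes one
# extra token alongside the image-region features.
encoder_input = np.vstack([fmri_token[None, :], image_feats])
print(encoder_input.shape)  # (37, 512)
```

The adaptive variants mentioned in the abstract would, under this sketch, gate or drop the `fmri_token` row depending on how informative the brain activations are expected to be.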
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File | Description | Size | Format
Cognitive_methods_for_image_captioning.pdf | | 1.72 MB | Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.