Title: Attention-based Story Visualization
Abstract: Story Visualization is the task of generating an image sequence from a short story composed of natural-language sentences or other semantic information. The task borrows from Text-to-Image generation in its pursuit of language-image correspondence, and from Text-to-Video in its aim for consistency across frames. Progress on this challenging topic remains limited, and viable datasets and evaluation methods are scarce. Recent advances in sequence transduction (the Transformer) and conditional image generation (SAGAN) motivated our approach to Story Visualization, in the hope of contributing a model that captures the nuances of image-sequence generation and language-to-vision temporal correspondence. The main objective of this thesis is to research improvements on the original StoryGAN and to experiment with different implementations of our architectural proposals. To that end we:
- Examine the effects of using a Transformer encoder in place of the original RNN.
- Apply more recent architectural approaches to the image-generating GAN.
- Explore the effects of attention mechanisms in the model, both as presented in the SAGAN architecture and through two novel attention mechanisms proposed for image sequences.
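Both the Transformer encoder and SAGAN's self-attention layer build on the same scaled dot-product attention operation. The following minimal NumPy sketch illustrates that shared core; the function name, shapes, and the example inputs are illustrative and not taken from the thesis:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Scaled dot-product attention, as used token-wise in the
    Transformer and (over flattened spatial positions) in SAGAN.
    q, k, v: arrays of shape (seq_len, d)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # (seq_len, seq_len) similarities
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the keys
    return weights @ v                             # attention-weighted sum of values

# Illustrative use: five "story sentence" embeddings of dimension 8,
# attending to themselves (self-attention).
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (5, 8)
```

In a Transformer encoder this operation is applied per head after learned linear projections of the input; in SAGAN the queries, keys, and values come from 1x1 convolutions over the image feature map, with spatial positions playing the role of sequence elements.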
Appears in Collections: Διπλωματικές Εργασίες (Diploma Theses)
Files in This Item:
- tsakas_thesis.pdf (4.44 MB, Adobe PDF)
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.