Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17598
Full metadata record
DC Field: Value [Language]
dc.contributor.author: Daras, Giannis
dc.date.accessioned: 2020-07-07T11:19:22Z
dc.date.available: 2020-07-07T11:19:22Z
dc.date.issued: 2020-06-30
dc.identifier.uri: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17598
dc.description.abstract: The attention mechanism is widely used in state-of-the-art neural networks for Natural Language Processing and Computer Vision. Despite its popularity, attention has some major drawbacks, the most important being that it requires memory and time that grow quadratically with the input length. In this work, we explore different ways to address this problem. We first propose to extend attention to multiple steps. At each step, each query attends to a subset of the original keys specified by a pre-defined sparsity pattern. We introduce a novel theoretical framework for designing meaningful multi-step attention models using Information Flow Graphs. Under this framework, we show that attention can be performed even in linear time when the connections between multiple sequential attention layers form a Superconcentrator graph. Specifically for images, we propose a new local sparse attention layer with O(n \sqrt{n}) complexity that preserves two-dimensional geometry and locality. We show that by simply replacing the dense attention layer of SAGAN with our construction, we obtain significant improvements in FID, Inception score, and visual quality. The FID score improves from 18.65 to 15.94 on ImageNet, with all other parameters kept the same. We also observe that the practical usefulness of the probability distribution intrinsically computed in attention layers has so far been unexplored. We demonstrate that using this distribution we can effectively solve a wide variety of hard problems, such as inversion of large GANs. Finally, we review alternative ways of lowering the computational complexity of dense attention that are based on dynamic sparsity. We highlight the limitations of these approaches and discuss potential ways to address them. [en_US]
dc.language: en [en_US]
dc.subject: attention [en_US]
dc.subject: multi-step attention [en_US]
dc.subject: GAN [en_US]
dc.subject: locality sensitive hashing [en_US]
dc.subject: sparsity [en_US]
dc.subject: expander [en_US]
dc.subject: superconcentrator [en_US]
dc.subject: machine learning [en_US]
dc.subject: deep learning [en_US]
dc.title: Designing Efficient Attention Mechanisms for Deep Neural Networks [en_US]
dc.description.pages: 148 [en_US]
dc.contributor.supervisor: Potamianos, Alexandros [en_US]
dc.department: Division of Signals, Control and Robotics [en_US]
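
The abstract above describes attention in which each query attends only to a pre-defined subset of the keys. The following is a minimal sketch of that idea, not the thesis code: the NumPy implementation, the function name masked_attention, and the local-block sparsity pattern are illustrative assumptions, and a real sparse layer would compute only the allowed query-key scores rather than masking a dense score matrix.

import numpy as np

def masked_attention(Q, K, V, mask):
    """Scaled dot-product attention where each query may only attend
    to the keys allowed by a pre-defined boolean sparsity mask."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)               # (n_q, n_k) attention logits
    scores = np.where(mask, scores, -np.inf)    # block disallowed query-key pairs
    # Softmax over the allowed keys only; this is the probability
    # distribution over keys that the abstract refers to.
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ V, probs

# Toy example: n = 16 tokens of dimension d = 8; each query attends only to
# its local block of sqrt(n) = 4 keys, one possible pre-defined sparsity
# pattern with n * sqrt(n) allowed entries (illustrative, not the thesis layer).
n, d, block = 16, 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
mask = np.zeros((n, n), dtype=bool)
for i in range(n):
    start = (i // block) * block
    mask[i, start:start + block] = True         # local block of allowed keys

out, probs = masked_attention(Q, K, V, mask)
print(out.shape)                                # (16, 8)
print(probs.sum(axis=-1))                       # each row sums to 1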
Appears in Collections: Diploma Theses (Διπλωματικές Εργασίες - Theses)

Files in This Item:
File: thesis.pdf (7.97 MB, Adobe PDF)


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.