Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17598
Title: Designing Efficient Attention Mechanisms for Deep Neural Networks
Authors: Daras, Giannis; Ποταμιάνος Αλέξανδρος
Keywords: attention; multi-step attention; GAN; locality sensitive hashing; sparsity; expander; superconcentrator; machine learning; deep learning
Issue Date: 30-Jun-2020
Abstract: The attention mechanism is widely used in state-of-the-art neural networks for Natural Language Processing and Computer Vision. Despite its popularity, attention has some major drawbacks, the most important of which is its quadratic memory and time complexity. In this work, we explore different ways to address this problem. We first propose to extend attention to multiple steps. At each step, each query attends to a subset of the original keys specified by a pre-defined sparsity pattern. We introduce a novel theoretical framework for designing meaningful multi-step attention models using Information Flow Graphs. Under this framework, we show that attention can be performed even in linear time when the connections between multiple sequential attention layers form a Superconcentrator graph. Specifically for images, we propose a new local sparse attention layer with O(n \sqrt{n}) complexity that preserves two-dimensional geometry and locality. We show that by just replacing the dense attention layer of SAGAN with our construction, we obtain very significant improvements in FID, Inception score and visual quality. The FID score improves from 18.65 to 15.94 on ImageNet, with all other parameters kept the same. We also observe that until now the practical usefulness of the intrinsic probabilistic distribution computed in attention layers has been unexplored. We demonstrate that using this distribution we can effectively solve a wide variety of hard problems, such as inversion of large GANs. Finally, we review alternative ways of lowering the computational complexity of dense attention that are based on dynamic sparsity. We underline the limitations of the proposed approaches and we discuss potential ways to address them.
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17598
Appears in Collections: Διπλωματικές Εργασίες - Theses
