Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17598
Full metadata record
DC Field: Value [Language]
dc.contributor.author: Daras, Giannis
dc.date.accessioned: 2020-07-07T11:19:22Z
dc.date.available: 2020-07-07T11:19:22Z
dc.date.issued: 2020-06-30
dc.identifier.uri: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17598
dc.description.abstract: The attention mechanism is widely used in state-of-the-art neural networks for Natural Language Processing and Computer Vision. Despite its popularity, attention has some major drawbacks, the most important being that it requires memory and time that grow quadratically with the input length. In this work, we explore different ways to address this problem. We first propose to extend attention to multiple steps. At each step, each query attends to a subset of the original keys specified by a pre-defined sparsity pattern. We introduce a novel theoretical framework for designing meaningful multi-step attention models using Information Flow Graphs. Under this framework, we show that attention can be performed even in linear time when the connections between multiple sequential attention layers form a Superconcentrator graph. Specifically for images, we propose a new local sparse attention layer with O(n \sqrt{n}) complexity that preserves two-dimensional geometry and locality. We show that by simply replacing the dense attention layer of SAGAN with our construction, we obtain significant improvements in FID, Inception score, and visual quality. The FID score improves from 18.65 to 15.94 on ImageNet, with all other parameters kept the same. We also observe that the practical usefulness of the probability distribution intrinsically computed in attention layers has so far been unexplored. We demonstrate that using this distribution we can effectively solve a wide variety of hard problems, such as inversion of large GANs. Finally, we review alternative ways of lowering the computational complexity of dense attention that are based on dynamic sparsity. We highlight the limitations of these approaches and discuss potential ways to address them. [en_US]
dc.language: en [en_US]
dc.subject: attention [en_US]
dc.subject: multi-step attention [en_US]
dc.subject: GAN [en_US]
dc.subject: locality sensitive hashing [en_US]
dc.subject: sparsity [en_US]
dc.subject: expander [en_US]
dc.subject: superconcentrator [en_US]
dc.subject: machine learning [en_US]
dc.subject: deep learning [en_US]
dc.title: Designing Efficient Attention Mechanisms for Deep Neural Networks [en_US]
dc.description.pages: 148 [en_US]
dc.contributor.supervisor: Potamianos, Alexandros [en_US]
dc.department: Division of Signals, Control and Robotics [en_US]
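
The abstract above describes attention in which each query attends only to a pre-defined subset of the keys. The following is a minimal sketch of that idea, not the thesis code: the NumPy implementation, the function name masked_attention, and the local-block sparsity pattern are illustrative assumptions, and a real sparse layer would compute only the allowed query-key scores rather than masking a dense score matrix.

import numpy as np

def masked_attention(Q, K, V, mask):
    """Scaled dot-product attention where each query may only attend
    to the keys allowed by a pre-defined boolean sparsity mask."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)               # (n_q, n_k) attention logits
    scores = np.where(mask, scores, -np.inf)    # block disallowed query-key pairs
    # Softmax over the allowed keys only; this is the probability
    # distribution over keys that the abstract refers to.
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ V, probs

# Toy example: n = 16 tokens of dimension d = 8; each query attends only to
# its local block of sqrt(n) = 4 keys, one possible pre-defined sparsity
# pattern with n * sqrt(n) allowed entries (illustrative, not the thesis layer).
n, d, block = 16, 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
mask = np.zeros((n, n), dtype=bool)
for i in range(n):
    start = (i // block) * block
    mask[i, start:start + block] = True         # local block of allowed keys

out, probs = masked_attention(Q, K, V, mask)
print(out.shape)                                # (16, 8)
print(probs.sum(axis=-1))                       # each row sums to 1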
Appears in Collections: Diploma Theses (Διπλωματικές Εργασίες - Theses)

Files in This Item:
File: thesis.pdf (7.97 MB, Adobe PDF)


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.