Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19726
Full metadata record
DC Field | Value | Language
dc.contributor.author | Τσέλιγκας, Γεώργιος-Στυλιανός | -
dc.date.accessioned | 2025-07-15T10:07:11Z | -
dc.date.available | 2025-07-15T10:07:11Z | -
dc.date.issued | 2025-07-01 | -
dc.identifier.uri | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19726 | -
dc.description.abstract | Contemporary neural networks have achieved state-of-the-art performance in vision and language tasks by growing in scale. Their increasing scale, combined with their static, one-size-fits-all inference, has led to a line of research on dynamic neural networks, which can adapt the size and/or structure of their computation on a per-sample basis. This way, the full network compute can be allocated to hard, non-paradigmatic examples, while easy, paradigmatic ones can use fewer network resources. One of the most natural ways to make a network dynamic is to implement early exit methods, which accelerate inference by trading performance for speed. We apply the idea of early exiting and build on previous work on Auto Compressor Networks (ACNs). ACNs remove residual connections and replace them with so-called long connections, which directly connect each intermediate layer to the network output. This direct connectivity allows ACNs to compress information in the earlier layers and makes them good candidates for early exit methodologies. In this thesis we implement a variety of early exit techniques on ACNs. We try approaches based on intermediate-layer logits, on intermediate-layer embedding distances, and on trainable early exit decision heads, and evaluate them on image and language tasks. We achieve substantial inference speedups with minimal (if any) performance degradation compared to full network performance. We compare our early exit results on BERT with popular techniques from the literature and show that early exit on ACNs achieves a much better performance-speedup tradeoff. Specifically, our methods achieve speedups of 3-4x, in contrast to the 1.5-2x found in the literature, with comparable or better performance. | en_US
dc.language | en | en_US
dc.subject | Dynamic Networks | en_US
dc.subject | Early Exit | en_US
dc.subject | Auto Compressor Networks | en_US
dc.title | Early Exit techniques for Auto Compressing Neural Networks | en_US
dc.description.pages | 97 | en_US
dc.contributor.supervisor | Ποταμιάνος Αλέξανδρος | en_US
dc.department | Division of Signals, Control and Robotics | en_US
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File | Description | Size | Format
Early_exit_and_speculative_decoding_techniques_for_Auto_Compressing_Neural_Networks-5.pdf | | 2.38 MB | Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.