Early Exit techniques for Auto Compressing Neural Networks

Τσέλιγκας, Γεώργιος-Στυλιανός

National Technical University of Athens

School of Electrical and Computer Engineering

Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19726

Full metadata record

DC Field	Value	Language
dc.contributor.author	Τσέλιγκας, Γεώργιος-Στυλιανός	-
dc.date.accessioned	2025-07-15T10:07:11Z	-
dc.date.available	2025-07-15T10:07:11Z	-
dc.date.issued	2025-07-01	-
dc.identifier.uri	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19726	-
dc.description.abstract	Contemporary neural networks have achieved state-of-the-art performance in vision and language tasks by growing in scale. Their increasing scale, combined with their static, one-size-fits-all inference, has led to a line of research on dynamic neural networks, which can adapt the size and/or structure of their computation on a per-sample basis. This way, the full network's compute can be allocated to hard, non-paradigmatic examples, while easy, paradigmatic ones use fewer network resources. One of the most natural ways to make a network dynamic is to implement early exit methods, which accelerate inference by trading performance for speed. We apply the idea of early exiting and build on previous work on Auto Compressor Networks (ACNs). ACNs remove residual connections and replace them with so-called long connections, which directly connect each intermediate layer to the network output. This direct connectivity allows ACNs to compress information into the earlier layers and makes them good candidates for early exit methodologies. In this thesis we implement a variety of early exit techniques on ACNs. We try approaches based on intermediate-layer logits, intermediate-layer embedding distances and trainable early exit decision heads, and evaluate them on image and language tasks. We achieve substantial inference speedups with minimal (if any) performance degradation compared to full-network performance. We compare our early exit results on BERT with popular techniques from the literature and showcase the ability of early exit on ACNs to achieve a much better performance-speedup tradeoff. Specifically, our methods achieve speedups of 3-4x, in contrast to the 1.5-2x found in the literature, with comparable or better performance.	en_US
dc.language	en	en_US
dc.subject	Dynamic Networks	en_US
dc.subject	Early Exit	en_US
dc.subject	Auto Compressor Networks	en_US
dc.title	Early Exit techniques for Auto Compressing Neural Networks	en_US
dc.description.pages	97	en_US
dc.contributor.supervisor	Ποταμιάνος Αλέξανδρος	en_US
dc.department	Division of Signals, Control and Robotics	en_US
Appears in Collections:	Diploma Theses
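
Illustrative sketch (not taken from the thesis): the abstract describes long connections that link every intermediate layer directly to the network output, and early exit decisions based on intermediate-layer logits. The Python below is a minimal sketch of that idea under stated assumptions: the long connections are modelled as a running sum of layer outputs, a single shared classifier head is applied to that sum, and the exit rule is a maximum-softmax-probability threshold. The names early_exit_forward, layers, head and threshold are hypothetical and do not come from the thesis or from any library.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_forward(
    x: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
    head: Callable[[Vector], Vector],
    threshold: float = 0.9,
) -> Tuple[int, int]:
    """Return (predicted_class, layers_used).

    Assumptions (hypothetical, for illustration only):
    - every layer maps a vector to a vector of the same size;
    - the network output is the sum of all layer outputs, which is
      one simple way to model the "long connections";
    - the same head turns the running sum into logits at any depth.
    """
    if not layers:
        raise ValueError("at least one layer is required")

    running = [0.0] * len(x)  # accumulated long-connection output
    h = list(x)
    probs: List[float] = []
    depth = 0
    for depth, layer in enumerate(layers, start=1):
        h = layer(h)  # plain feed-forward step, no residual connection
        running = [r + v for r, v in zip(running, h)]
        probs = softmax(head(running))
        if max(probs) >= threshold:
            break  # confident enough: skip the remaining layers

    predicted = max(range(len(probs)), key=probs.__getitem__)
    return predicted, depth
```

With a higher threshold, more samples run through the full stack; with a lower one, more samples exit early, which is the performance-speedup tradeoff the abstract refers to. The embedding-distance and trainable-decision-head variants mentioned in the abstract would replace only the exit test inside the loop.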

Files in This Item:

File	Description	Size	Format
Early_exit_and_speculative_decoding_techniques_for_Auto_Compressing_Neural_Networks-5.pdf		2.38 MB	Adobe PDF
