Exploring Kernel Approximations for TinyML Inference Acceleration on Microcontrollers

Μέντζος, Γεώργιος

National Technical University of Athens

School of Electrical and Computer Engineering


Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18889

Full metadata record

DC Field	Value	Language
dc.contributor.author	Μέντζος, Γεώργιος	-
dc.date.accessioned	2023-11-07T05:58:10Z	-
dc.date.available	2023-11-07T05:58:10Z	-
dc.date.issued	2023-10-24	-
dc.identifier.uri	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18889	-
dc.description.abstract	The rapid growth of always-on microcontroller-based IoT devices has opened up numerous applications, from smart manufacturing to personalized healthcare. Energy-efficient microcontroller units (MCUs) are widely adopted in the Tiny Machine Learning (TinyML) domain, but they face significant limitations in performance and memory (RAM, Flash), especially when running deep networks for complex classification tasks. In this work, we combine approximate computing and software kernel design to accelerate the inference of approximate CNN models on MCUs. Our kernel-based approximation framework first unpacks the operands of each convolution layer and then performs an offline significance calculation for each operand. Through a design space exploration, it then applies a computation-skipping approximation strategy based on the calculated significance, offering various trade-offs between reduced computation and classification accuracy. Our evaluation, conducted on an STM32-Nucleo board using three popular CNNs trained on the CIFAR-10 dataset, shows that our Pareto-optimal solutions yield significant benefits. Compared to state-of-the-art exact inference methods, our approach achieves a 9% latency reduction on cache-enabled MCUs with a negligible Top-1 accuracy loss (<1%). On MCUs without a cache, the latency reduction rises to 37%, again with less than 1% Top-1 accuracy loss. The trade-offs explored in this thesis could enable more practical applications and the deployment of deeper networks on compact MCUs.	en_US
dc.language	en	en_US
dc.subject	Approximate Computing	en_US
dc.subject	Microcontrollers	en_US
dc.title	Exploring Kernel Approximations for TinyML Inference Acceleration on Microcontrollers	en_US
dc.description.pages	92	en_US
dc.contributor.supervisor	Σούντρης Δημήτριος	en_US
dc.department	Division of Informatics and Computer Technology	en_US
Appears in Collections:	Diploma Theses
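The abstract describes the framework only at a high level. The Python sketch below illustrates the general idea under stated assumptions and is not the thesis's implementation. It treats one output of an unpacked (assumed im2col-style) convolution as a plain integer dot product. It also assumes that an operand's significance is its mean absolute contribution |w_i * x_i| over a calibration set, and it uses mean absolute output error as a stand-in for Top-1 accuracy loss. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[int]


def operand_significance(weights: Vector, calib: Sequence[Vector]) -> List[float]:
    """Offline step. Assumed metric: mean |w_i * x_i| over a calibration set."""
    totals = [0.0] * len(weights)
    for x in calib:
        for i, w in enumerate(weights):
            totals[i] += abs(w * x[i])
    return [t / len(calib) for t in totals]


def skip_mask(scores: Sequence[float], n_skip: int) -> List[bool]:
    """True = perform this multiply-accumulate, False = skip it.
    The n_skip operands with the lowest significance are skipped."""
    least_first = sorted(range(len(scores)), key=lambda i: scores[i])
    skipped = set(least_first[:n_skip])
    return [i not in skipped for i in range(len(scores))]


def approx_dot(weights: Vector, x: Vector, mask: Sequence[bool]) -> int:
    """One unpacked convolution output, computing only the unmasked products."""
    return sum(w * xi for w, xi, keep in zip(weights, x, mask) if keep)


def explore(weights: Vector, calib: Sequence[Vector],
            test: Sequence[Vector]) -> List[Tuple[int, float]]:
    """Design space sweep: (n_skip, mean |output error|) for every n_skip."""
    scores = operand_significance(weights, calib)
    exact_mask = [True] * len(weights)
    points = []
    for n_skip in range(len(weights) + 1):
        mask = skip_mask(scores, n_skip)
        err = sum(abs(approx_dot(weights, x, mask) - approx_dot(weights, x, exact_mask))
                  for x in test) / len(test)
        points.append((n_skip, err))
    return points


def pareto_front(points: Sequence[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Keep configurations that no other point beats on both axes
    (at least as many skipped products and no larger error)."""
    return [p for p in points
            if not any(q != p and q[0] >= p[0] and q[1] <= p[1] for q in points)]


if __name__ == "__main__":
    w = [3, -1, 0, 2]
    calib = [[1, 1, 1, 1], [2, 0, 1, 1]]
    test = [[1, 1, 1, 1]]
    print(pareto_front(explore(w, calib, test)))  # [(1, 0.0), (3, 1.0), (4, 4.0)]
```

In this toy run, skipping the zero-valued weight is free, and the remaining Pareto points trade more skipped products for larger output error. In the setting the abstract describes, the first axis would correspond to measured latency on the MCU and the second to Top-1 accuracy on CIFAR-10.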

Files in This Item:

File	Description	Size	Format
Exploring Kernel Approximations for TinyML Inference Acceleration on Microcontrollers.pdf		3.72 MB	Adobe PDF
