Development and Evaluation with AI Tools & Devices: Google Edge TPU for General-Purpose Computing

Σάκος, Χρόνης; Σούντρης, Δημήτριος

National Technical University of Athens

School of Electrical and Computer Engineering



Title:	Development and Evaluation with AI Tools & Devices: Google Edge TPU for General-Purpose Computing
Authors:	Σάκος, Χρόνης; Σούντρης, Δημήτριος
Keywords:	Artificial Intelligence; Embedded Systems; Edge TPU; AI accelerators; GEMM
Issue Date:	30-Jun-2022
Abstract:	Artificial Intelligence (AI) is one of the fastest-growing and most groundbreaking areas of research, and it has revolutionized a wide variety of application domains. Modern artificial neural networks (ANNs) carry high computational complexity, and as a result general-purpose CPUs struggle to deliver sufficient performance. To bring AI to broader customer bases, developers are therefore turning to smaller, more power-efficient AI microchips and accelerators. Anticipating this trend, Google offers Tensor Processing Units (TPUs) to accelerate AI inference both in data centers and at the edge. In this thesis, targeting embedded AI, we focus on the Edge TPU. The Edge TPU is a small Application-Specific Integrated Circuit (ASIC) that delivers high performance within a small physical and power footprint, enabling the deployment of high-accuracy AI at the edge. As dedicated hardware, it parallelizes specific classes of computations to speed up inference. The Edge TPU can perform 4 trillion operations per second (4 TOPS) while consuming 0.5 W per TOPS, i.e., 2 TOPS per watt (about 2 W at peak throughput). However, the architecture and instruction set of such an AI-specific accelerator impose hardware challenges and limitations on non-AI, general-purpose workloads. In this thesis, we address this challenge by proposing a custom methodology for building Edge TPU-compatible networks that perform general-purpose calculations. Moreover, we overcome the TPU's restriction to 8-bit operations by breaking N-bit algebraic computations into 8-bit parts. In this way, we support both element-wise and matrix multiplications at larger bit-widths without a significant decrease in performance. We first benchmark the TPU to explore and evaluate its capabilities, using both pre-trained and custom networks.
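The idea of splitting N-bit arithmetic into 8-bit parts can be illustrated with a short Python sketch. Writing each unsigned N-bit operand as a sum of 8-bit limbs, a = Σ a_i · 2^(8i), the product expands into ⌈N/8⌉² products of 8-bit limbs, each shifted by 8(i + j) bits, so a 16-bit matrix multiplication costs four 8-bit ones. The sketch assumes unsigned operands and that every 8-bit partial product is available exactly (as in a wide int32 accumulator); the function names, including the stand-in `matmul_u8`, are hypothetical and do not reproduce the thesis's implementation.

```python
# Minimal sketch: emulate N-bit unsigned integer multiplication using only
# 8-bit x 8-bit partial products, as an 8-bit-only accelerator would require.
# Assumptions (not from the thesis): unsigned operands, 8-bit limbs, and exact
# wide (int32-style) accumulation of each 8-bit partial product.
from typing import List

Matrix = List[List[int]]

LIMB_BITS = 8
LIMB_MASK = (1 << LIMB_BITS) - 1  # 0xFF


def n_limbs(n_bits: int) -> int:
    """Number of 8-bit limbs needed to hold an n_bits unsigned value."""
    return (n_bits + LIMB_BITS - 1) // LIMB_BITS


def limb(x: int, k: int) -> int:
    """Bits [8k, 8k + 8) of x, i.e. its k-th 8-bit limb."""
    return (x >> (LIMB_BITS * k)) & LIMB_MASK


def split_limbs(m: Matrix, n_bits: int) -> List[Matrix]:
    """limbs[k] is the matrix of k-th limbs of every entry of m."""
    return [[[limb(x, k) for x in row] for row in m] for k in range(n_limbs(n_bits))]


def matmul_u8(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical stand-in for one 8-bit matrix multiplication on the accelerator."""
    assert all(0 <= x <= LIMB_MASK for row in a + b for x in row), "operands must fit in uint8"
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matmul_nbit(a: Matrix, b: Matrix, n_bits: int) -> Matrix:
    """a @ b for unsigned n_bits operands, built from n_limbs**2 8-bit matmuls.

    With a = sum_i a_i * 2^(8i) and b = sum_j b_j * 2^(8j):
    a @ b = sum_{i,j} (a_i @ b_j) * 2^(8(i + j)).
    """
    rows, cols = len(a), len(b[0])
    a_limbs, b_limbs = split_limbs(a, n_bits), split_limbs(b, n_bits)
    result = [[0] * cols for _ in range(rows)]
    for i, a_i in enumerate(a_limbs):
        for j, b_j in enumerate(b_limbs):
            partial = matmul_u8(a_i, b_j)  # one 8-bit matmul
            for r in range(rows):
                for c in range(cols):
                    # Recombine on the host: shift by the limb weight and accumulate.
                    result[r][c] += partial[r][c] << (LIMB_BITS * (i + j))
    return result


def elementwise_nbit(x: List[int], y: List[int], n_bits: int) -> List[int]:
    """Element-wise x * y for unsigned n_bits values, via 8-bit limb products."""
    out = [0] * len(x)
    for i in range(n_limbs(n_bits)):
        for j in range(n_limbs(n_bits)):
            for k, (xv, yv) in enumerate(zip(x, y)):
                out[k] += (limb(xv, i) * limb(yv, j)) << (LIMB_BITS * (i + j))
    return out


if __name__ == "__main__":
    a = [[40000, 513], [255, 65535]]
    b = [[2, 300], [65535, 7]]
    exact = [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]
    assert matmul_nbit(a, b, 16) == exact
    print(matmul_nbit(a, b, 16))
```

On real hardware, each call to `matmul_u8` could correspond to one 8-bit inference (or one branch of a single network), with the shifts and additions done on the host CPU. Signed operands need extra handling, for example an offset that makes them non-negative, which this sketch omits.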
For our Ship Detection network, we achieve 1000-2000 FPS with no significant accuracy loss. The experimental results show significant acceleration compared to the ARM A53 CPU and other embedded devices. Overall, the Edge TPU provides remarkable speedups for medium- and large-sized CNNs and MLPs, as well as for custom models dominated by matrix multiplications. Matrix multiplication is accelerated by up to 4x compared to 8-bit quantized execution on the ARM A53 and by up to 7x compared to 32-bit floating-point execution on it. Moreover, for classic Digital Signal Processing (DSP) operations, such as the Sobel edge detector and image binning, the Edge TPU delivers up to 6x better performance than the ARM A53.
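The DSP kernels mentioned above map naturally onto convolution layers, the operation type that CNN inference accelerators execute natively. The following Python sketch shows one such mapping under stated assumptions: single-channel integer images, no padding, |Gx| + |Gy| as the gradient magnitude, and 2x2 sum binning. It illustrates the idea only and is not the thesis's network definition; in a deployed model, the Sobel kernels could become fixed weights of a 3x3 convolution layer and binning a stride-2 convolution (or average pooling), quantized to 8 bits.

```python
# Minimal sketch: Sobel edge detection and image binning written as plain 2-D
# convolutions, the operation type that CNN accelerators execute natively.
# Assumptions (not from the thesis): single-channel integer images, "valid"
# (no) padding, |Gx| + |Gy| as gradient magnitude, 2x2 sum binning.
from typing import List

Image = List[List[int]]

# Standard 3x3 Sobel kernels for horizontal (Gx) and vertical (Gy) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def conv2d_valid(img: Image, kernel: Image, stride: int = 1) -> Image:
    """2-D cross-correlation without padding (what CNN 'Conv2D' layers compute)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(img) - kh) // stride + 1
    out_w = (len(img[0]) - kw) // stride + 1
    return [
        [
            sum(
                img[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def sobel_l1(img: Image) -> Image:
    """Edge strength |Gx| + |Gy|, a cheap approximation of sqrt(Gx^2 + Gy^2)."""
    gx = conv2d_valid(img, SOBEL_X)
    gy = conv2d_valid(img, SOBEL_Y)
    return [[abs(x) + abs(y) for x, y in zip(rx, ry)] for rx, ry in zip(gx, gy)]


def bin2x2(img: Image) -> Image:
    """2x2 binning: sum of each non-overlapping 2x2 block, a stride-2 all-ones conv."""
    return conv2d_valid(img, [[1, 1], [1, 1]], stride=2)


if __name__ == "__main__":
    step = [[0, 0, 10, 10] for _ in range(4)]  # vertical edge between columns 1 and 2
    print(sobel_l1(step))
    print(bin2x2([[1, 2, 3, 4], [5, 6, 7, 8]]))
```

Expressing both operations as convolutions with constant weights is what lets a network-only accelerator run them at all; the price is that intermediate values must fit the accelerator's 8-bit quantized ranges.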
URI:	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18356
Appears in Collections:	Διπλωματικές Εργασίες - Theses

Files in This Item:

File	Size	Format
DT_Chronis_Sakos.pdf	13.18 MB	Adobe PDF
