|Title:||Utilizing Versal Architecture for Low-Latency Super Resolution Applications|
Convolutional Neural Networks
|Abstract:||The applications flooding the modern market, such as Machine Learning (ML), digital signal processing, and 5G, require ever-increasing computing power. The impasse reached by Moore's law and the end of Dennard scaling gave the impetus for exploring alternative, heterogeneous architectures and hardware accelerators, beyond conventional scalar processing units (CPUs), to satisfy these requirements. Existing solutions, such as graphics processing units (GPUs), digital signal processors (DSPs), and field-programmable gate arrays (FPGAs), either suffer from memory bottlenecks or demand a level of hardware expertise that prevents their widespread market adoption. In this direction, Xilinx developed and launched a new generation of powerful hybrid computing platforms, called Versal Adaptive Compute Acceleration Platforms (ACAPs), which combine scalar processing (CPUs), programmable logic, and dedicated, highly parallel accelerators. At the same time, a set of programming tools was developed to support the programmability of these platforms, offering multiple levels of abstraction with a focus on ease of use. In this thesis, we explore the potential of these novel architectures and tools, aiming to accelerate ML applications running at the "edge" (edge computing). Specifically, we chose the task of Super-Resolution (SR), that is, increasing the quality of an image using a Convolutional Neural Network (CNN), in our case the ESPCN model. We have at our disposal the VCK190 platform from the Versal AI Core series, which is specifically designed for accelerating deep neural networks and features 400 AI Engines. We present two alternative implementations of this model for the platform.
One is designed with knowledge of the underlying hardware and aims to exploit all of the platform's computing cores (hardware-specific); the other uses Vitis AI, an automated tool for developing ML applications that removes the need for hardware programming knowledge (hardware-agnostic). We compare our implementations against the baseline software implementation of the model running on the VCK190 device's CPU. The results are very encouraging. The hardware-specific implementation shows excellent performance, reaching 518 FPS and achieving a maximum speedup of 1.87x over the CPU and 1.36x over the Vitis AI implementation, without exhausting the platform's resources and with minimal loss in image quality. The Vitis AI implementation, however, falls short of expectations in both image quality and performance, highlighting the importance of the "performance vs. programming ease" trade-off.|
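The core operation of the ESPCN model mentioned in the abstract is the sub-pixel convolution (pixel-shuffle) layer, which rearranges a low-resolution feature map with r² times as many channels into an image upscaled by a factor of r. The following is a minimal NumPy sketch of that rearrangement, not code from the thesis; the function name and shapes are illustrative.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) feature map into (C, H*r, W*r).

    This depth-to-space step is the sub-pixel convolution layer
    used by ESPCN-style super-resolution networks.
    """
    c2, h, w = x.shape
    assert c2 % (r * r) == 0, "channel count must be divisible by r^2"
    c = c2 // (r * r)
    # Split the channel axis into (C, r, r), then interleave the two
    # r-sized axes into the spatial dimensions.
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# Example: 4 channels at 2x2 become 1 channel at 4x4 (upscale factor r=2).
lr = np.arange(16, dtype=np.float32).reshape(4, 2, 2)
hr = pixel_shuffle(lr, 2)
print(hr.shape)  # (1, 4, 4)
```

Because the layer is a pure reshuffling of values, the network can do all convolutions at low resolution and pay the upscaling cost only once at the end, which is what makes ESPCN attractive for low-latency, edge-oriented deployments like the ones evaluated in this thesis.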
|Appears in Collections:||Διπλωματικές Εργασίες - Theses|
Files in This Item:
|Tzomaka_Thesis.pdf||5.83 MB||Adobe PDF||View/Open|
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.