Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/13091
Title: Integrating High-Level Synthesis Derived Hardware Accelerators on an FPGA-Based SoC: Evaluation and Analysis of Design Alternatives
Authors: Κωνσταντίνος Ραΐλης
Σούντρης Δημήτριος
Keywords: high-level synthesis
amba axi
axi4-lite
axi4-stream
arm
zedboard
direct memory access
embedded linux
hw/sw codesign
harris and stephens corner detection algorithm
support vector machines
ecg analysis
soc
fpga
Issue Date: 1-Apr-2016
Abstract: In recent years, designing hardware accelerators has become standard practice for optimizing algorithmic implementations. FPGA-based accelerators, in particular, have gained the interest of system architects and the scientific community because FPGA devices offer inherently fast hardware development and reconfiguration. These features, combined with the level of design abstraction that High-Level Synthesis (HLS) provides, form a clear solution for fast prototyping of system designs. Lately, FPGA devices have tended to incorporate embedded processors, thus forming a complete system-on-a-chip (SoC). The coexistence of hardware accelerators and embedded processors on a single device has brought the interconnection of these components to the forefront as an element of vital significance for the performance of the whole system. So that custom hardware can be readily interconnected with a processing system, the Intellectual Property (IP) design style has been adopted. Typically, an IP is equipped with control and communication interfaces so that it can be easily combined with other components, in most cases without additional hardware. A widely used communication interface for IP generation is the ARM AMBA Advanced eXtensible Interface (AXI) protocol. The design alternatives offered by AXI range from simple low-bandwidth communication and data transfers to higher-bandwidth transfers that employ the available Direct Memory Access features. In this work, we focus on the system implementation flow targeting a Zynq-7000 AP SoC device. Beginning with the addition of different communication interfaces, we generate custom accelerator IPs through HLS. We then interconnect those IPs with an ARM-based processing system and generate the system design.
The final steps include the generation of Embedded Linux distributions for our custom hardware and the development of a user-space application to be executed on the processing system of our design. The hardware accelerators employed for the evaluation and analysis of design alternatives belong to two distinct scientific fields. The first is an implementation of the Harris & Stephens Corner Detection Algorithm. The second is a Support Vector Machine classifier for arrhythmia detection using the MIT-BIH ECG signal database. The employed accelerators differ not only in their respective fields but also in input data size, code complexity and resource needs. Our combined analysis shows the impact of different communication interfaces on latency, bandwidth, utilized FPGA resources and overall system performance. The exploration of different interface and interconnection configurations for a default accelerator leads to latency gains of up to 20% and significant bandwidth gains.
URI: http://artemis-new.cslab.ece.ntua.gr:8080/jspui/handle/123456789/13091
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File: DT2016-0071.pdf
Size: 3.22 MB
Format: Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.