|Title:||A Novel Reconfigurable Out-of-Order GPU Microarchitecture with Runtime Workload Characterization|
|Keywords:||General-Purpose GPU, High Performance Computing, Reconfigurable computing, Instruction Level Parallelism, Out-of-Order, Parallel Systems, Energy Efficient computing, Modelling and Simulation|
|Abstract:||Since the breakdown of Moore’s law, gains in processor performance have been driven by massively parallel processing and hardware specialization. The end of Dennard scaling and the advent of the "Dark Silicon Era" make energy-efficient computing a necessity. In this context, heterogeneous architectures and reconfigurable computing have emerged as flexible approaches for achieving both performance and energy efficiency. Meanwhile, the previously proposed Light-weight Out-of-Order GPU (LOOG) execution scheme addresses the performance stagnation observed in a class of general-purpose GPU workloads by complementing the GPU’s traditional exploitation of thread-level parallelism (TLP) and fast context switching with exploitation of the inherent Instruction Level Parallelism (ILP) of these workloads. As LOOG constitutes the backbone of this thesis, we implement it in the most recent version of Accel-Sim, a GPU simulation framework that models recent high-end NVIDIA GPU architectures and is built around the performance model of GPGPU-Sim 4.1.0, a cycle-level GPU performance simulator. Having accommodated LOOG on an HPC-relevant platform (NVIDIA Quadro GV100, powered by the Volta microarchitecture) by right-sizing its structures, implementing a dynamic Instruction Buffer reconfiguration mechanism and optimally configuring GPU pipeline front-end components, we collect detailed architecture bottleneck statistics across 7 benchmark suites and 100 CUDA kernels. The resulting application characterization and the study of workload characteristics that predict speedup on LOOG, paired with a scalability analysis of LOOG components from an architectural standpoint, motivate the assessment of a Scalable, Reconfigurable Out-of-Order GPU Microarchitecture that appropriately handles both LOOG-sensitive and generic kernels, to maximize performance or energy efficiency.
The reconfigurable microarchitecture is evaluated under different reconfiguration schemes and granularities, including a hardware reconfiguration controller that operates at per-kernel-launch granularity and uses runtime performance counters to predict an application’s out-of-order (OOO) performance improvement. A static scale-up LOOG configuration provides a speedup of 1.48x for generic kernels and a 13.7% reduction in energy dissipation compared to the baseline architecture. Reconfiguration under programmer-assisted directives and reconfiguration using the hardware controller can provide the same speedup when needed, and have the potential to improve energy efficiency over the baseline (in-order microarchitecture) by 22.4% and 19.5%, respectively.|
|Appears in Collections:||Diploma Theses (Διπλωματικές Εργασίες - Theses)|
Files in This Item:
|Eleftherakis_Panagiotis_diploma_thesis.pdf||7.19 MB||Adobe PDF|
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.