A Novel Reconfigurable Out-of-Order GPU Microarchitecture with Runtime Workload Characterization

Eleftherakis, Panagiotis-Eleftherios

Εθνικό Μετσόβιο Πολυτεχνείο

Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών

Καλώς ήρθατε στο Άρτεμις

Σκοπός του Άρτεμις είναι η συστηματική αρχειοθέτηση και διαδοση της πνευματικής παραγωγής της Σχολής Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών του Εθνικού Μετσόβιου Πολυτεχνείου, με τη βοήθεια της τεχνολογίας των ψηφιακών βιβλιοθηκών.

Παρακαλώ χρησιμοποιήστε αυτό το αναγνωριστικό για να παραπέμψετε ή να δημιουργήσετε σύνδεσμο προς αυτό το τεκμήριο: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18661

Τίτλος:	A Novel Reconfigurable Out-of-Order GPU Microarchitecture with Runtime Workload Characterization
Συγγραφείς:	Eleftherakis, Panagiotis-Eleftherios Σούντρης Δημήτριος
Λέξεις κλειδιά:	General-Purpose GPU, High Performance Computing, Reconfigurable computing, Instruction Level Parallelism, Out-of-Order, Parallel Systems, Energy Efficient computing, Modelling and Simulation
Ημερομηνία έκδοσης:	21-Μαρ-2023
Περίληψη:	Since the breakdown of Moore’s law, high processor performance has been driven by Massively Parallel Processing and hardware specialization. The halt met by Dennard’s scaling and the advent of the "Dark Silicon Era" necessitate energy-efficient computing. In this context, heterogeneous architectures and reconfigurable computing have emerged as flexible approaches for achieving the above goals. Meanwhile, the previously proposed Light-weight Out-of-Order GPU (LOOG) execution scheme addresses the performance stagnation met by a class of general-purpose GPU workloads, by complementing the traditional TLP leveraging and fast context switching of the GPU, with exploitation of the inherent Instruction Level Parallelism (ILP) of these workloads. As it constitutes the backbone of this thesis, we implement it in the most recent version of Accel-Sim, a GPU simulation framework that provides modelling of recent high-end NVIDIA GPU architectures, built around the performance model of GPGPU-Sim 4.1.0, a cycle-level GPU performance simulator. Having accommodated LOOG on an HPC-relevant platform (NVIDIA Quadro GV100, powered by the Volta microarchitecture) by right-sizing its structures, implementing a dynamic Instruction Buffer reconfiguration mechanism and optimally configuring GPU pipeline front-end components, we collect detailed architecture bottleneck statistics across 7 benchmark suites and 100 CUDA kernels. The emerging application characterization and the study of workload characteristics that predict speedup on LOOG, paired with a scalability analysis of LOOG components from an architectural standpoint, motivates the assessment of a Scalable, Reconfigurable Out-of-Order GPU Microarchitecture that appropriately handles both kernels deemed LOOG-sensitive as well as generic kernels, to maximize performance or energy efficiency. The reconfigurable microarchitecture is evaluated under different reconfiguration schemes and granularities, including a per-kernel-launch granularity hardware reconfiguration controller using runtime performance counters to predict application OOO performance improvement. A static scale-up LOOG configuration provides a speedup of 1.48 for generic kernels and a 13.7% reduction in energy dissipation, compared to the baseline architecture. Reconfiguration under programmer-assisted directives and using the hardware controller can provide the same speedup when needed and have the potential to improve energy efficiency from baseline (in-order microarchitecture) by 22.4% and 19.5% respectively.
URI:	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18661
Εμφανίζεται στις συλλογές:	Διπλωματικές Εργασίες - Theses

Αρχεία σε αυτό το τεκμήριο:

Αρχείο	Περιγραφή	Μέγεθος	Μορφότυπος
Eleftherakis_Panagiotis_diploma_thesis.pdf		7.19 MB	Adobe PDF	Εμφάνιση/Άνοιγμα

Δείξε την πλήρη περιγραφή του τεκμηρίου

Όλα τα τεκμήρια του δικτυακού τόπου προστατεύονται από πνευματικά δικαιώματα.