Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18475
Title: Accelerating Irregular Applications via Efficient Synchronization and Data Access Techniques
Authors: ΓΙΑΝΝΟΥΛΑ, ΧΡΙΣΤΙΝΑ
Γκούμας Γεώργιος
Keywords: Μη-Κανονικές Εφαρμογές
Irregular Applications
Συγχρονισμός
Synchronization
Βελτιστοποιημένες Τεχνικές Πρόσβασης στα Δεδομένα
Efficient Data Access Techniques
Πολυπύρηνα Συστήματα
Multicore Systems
Αρχιτεκτονικές με Επεξεργασία Κοντά στη Μνήμη
Processing-In-Memory Architectures
Issue Date: 27-Sep-2022
Abstract: Irregular applications comprise an increasingly important workload domain for many fields, including bioinformatics, chemistry, graph analytics, physics, social sciences and machine learning. Therefore, achieving high performance and energy efficiency in the execution of emerging irregular applications is of vital importance. While there is abundant research on accelerating irregular applications, in this thesis, we identify two critical challenges. First, irregular applications are hard to scale to a high number of parallel threads due to high synchronization overheads. Second, irregular applications have complex memory access patterns and exhibit low operational intensity, and thus they are bottlenecked by expensive data access costs. This doctoral thesis studies the root causes of inefficiency of irregular applications in modern computing systems, and aims to fundamentally address such inefficiencies, by 1) proposing low-overhead synchronization techniques among parallel threads in cooperation with 2) well-crafted data access policies. Our approach leads to high system performance and energy efficiency on the execution of irregular applications in modern computing platforms, both processor-centric CPU systems and memory-centric Processing-In-Memory (PIM) systems. We make four major contributions to accelerating irregular applications in different contexts including CPU and Near-Data-Processing (NDP) (or Processing-In-Memory (PIM)) systems. First, we design ColorTM, a novel parallel graph coloring algorithm for CPU systems that trades off using synchronization with lower data access costs. ColorTM proposes an efficient data management technique co-designed with a speculative synchronization scheme implemented on Hardware Transactional Memory, and significantly outperforms prior state-of-the-art graph coloring algorithms across a wide range of real-world graphs. Second, we propose SmartPQ, an adaptive priority queue that achieves high performance under all various contention scenarios in Non-Uniform Memory Access (NUMA) CPU systems. SmartPQ tunes itself by dynamically switching between a NUMA-oblivious and a NUMA-aware algorithmic mode, thus providing low data access costs in high contention scenarios, and high levels of parallelism in low contention scenarios. Our evaluations show that SmartPQ achieves the highest throughput over prior state-of-the-art NUMA-aware and NUMA-oblivious concurrent priority queues under various contention scenarios and even when contention varies during runtime. Third, we introduce SynCron, the first practical and lightweight hardware synchronization mechanism tailored for NDP systems. SynCron minimizes synchronization overheads in NDP systems by (i) adding low-cost hardware support near memory for synchronization acceleration, (ii) directly buffering the synchronization variables in a specialized cache memory structure, (ii) implementing a hierarchical message-passing communication scheme, and (iv) integrating a hardware-only overflow management scheme to avoid performance degradation when hardware resources for synchronization tracking are exceeded. We demonstrate that SynCron outperforms prior state-of-the-art approaches both in performance and energy consumption using a wide range of irregular applications, and has low hardware area and power overheads. Fourth, we design SparseP, the first library for high-performance Sparse Matrix Vector Multiplication (SpMV) on real Processing-In-Memory (PIM) systems. SparseP is publicly-available and includes a wide range of data partitioning, load balancing, compression and synchronization techniques to accelerate this irregular kernel in current and future PIM systems. We also xtensively characterize the widely used SpMV kernel on a real PIM architecture, and provide recommendations for software, system and hardware designers of future PIM systems. Overall, we demonstrate that the execution of irregular applications in CPU and NDP/PIM architectures can be significantly accelerated by co-designing lightweight synchronization approaches along with well-crafted data access policies. This dissertation shows that efficient synchronization and data access techniques can provide a high amount of parallelism, low-overhead inter-thread communication and low data access and data movement costs in emerging irregular applications, thus significantly improving system performance and system energy. This doctoral thesis also bridges the gap between processor-centric CPU systems and memory-centric PIM systems in the critically-important area of irregular applications. We hope that this dissertation inspires future work in co-designing software algorithms with cutting-edge computing platforms to significantly accelerate emerging irregular applications.
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18475
Appears in Collections:Διδακτορικές Διατριβές - Ph.D. Theses

Files in This Item:
File Description SizeFormat 
cgiannoula_phd_thesis_final_new.pdf7.31 MBAdobe PDFView/Open


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.