Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19265
Title: | Model-assisted optimization of Linear Algebra routines on multi-GPU computing systems |
Authors: | Anastasiadis, Petros; Γκούμας, Γεώργιος |
Keywords: | Linear algebra; Graphics processing units (GPUs); Matrix-matrix multiplication; Modeling; Autotuning; Multi-GPU systems; Software libraries; Communication routing; BLAS routines; Overlap optimization |
Issue Date: | 9-Sep-2024 |
Abstract: | Dense linear algebra operations appear frequently in high-performance computing (HPC) applications, so their performance is crucial to achieving good scalability. As many modern HPC clusters contain multi-GPU nodes, BLAS operations are frequently offloaded to GPUs, which requires optimized libraries to ensure good performance. However, optimizing BLAS for multi-GPU systems introduces numerous challenges similar to those of distributed computing, such as data decomposition, task scheduling, and communication across GPUs with distinct memory spaces. This complexity makes BLAS optimization difficult, leading either to sub-optimal performance or to system-specific solutions with reduced portability. To address these issues, we propose a model-based autotuning approach: we introduce several performance models for BLAS and integrate them into PARALiA, an end-to-end BLAS library. PARALiA uses model-driven insights to autotune BLAS execution dynamically, tailoring performance-critical parameters to each specific problem and system at runtime. This autotuning is coupled with an optimized task scheduler, leading to near-optimal data distribution and performance-aware resource utilization. PARALiA provides state-of-the-art performance and energy efficiency and can adapt to heterogeneous systems and scenarios via model-based decisions. Finally, we focus on the GEMM kernel, extending PARALiA with a custom static scheduler that integrates model-driven algorithmic, communication, and autotuning optimizations (PARALiA-GEMMex) and delivers significantly higher performance than the state of the art. |
URI: | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19265 |
Appears in Collections: | Διδακτορικές Διατριβές - Ph.D. Theses |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
PhD_thesis_updated_final.pdf | Main file (thesis) | 3.28 MB | Adobe PDF |
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.