Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19265
Title: | Model-assisted optimization of Linear Algebra routines on multi-GPU computing systems |
Authors: | Anastasiadis, Petros; Γκούμας Γεώργιος |
Keywords: | Linear algebra; Graphics processing units (GPUs); Matrix-matrix multiplication; Modeling; Autotuning; Multi-GPU systems; Software libraries; Communication routing; BLAS routines; Overlap optimization |
Issue date: | 9-Sep-2024 |
Abstract: | Dense linear algebra operations appear frequently in high-performance computing (HPC) applications, making their performance crucial to achieving optimal scalability. As many modern HPC clusters contain multi-GPU nodes, BLAS operations are frequently offloaded to GPUs, which requires optimized libraries to ensure good performance. However, optimizing BLAS for multi-GPU systems introduces numerous challenges similar to those of distributed computing, such as data decomposition, task scheduling, and communication across GPUs with distinct memory spaces. This complexity makes multi-GPU BLAS optimization difficult, leading to sub-optimal performance or to system-specific solutions with reduced portability. To address these issues, we propose a model-based autotuning approach: we introduce several performance models for BLAS and integrate them into PARALiA, an end-to-end BLAS library. PARALiA uses model-driven insights to autotune BLAS execution dynamically, tailoring performance-critical parameters to each specific problem and system at runtime. This autotuning is coupled with an optimized task scheduler, leading to near-optimal data distribution and performance-aware resource utilization. PARALiA provides state-of-the-art performance and energy efficiency and can adapt to heterogeneous systems and scenarios via model-based decisions. Finally, we focus on the GEMM kernel, extending PARALiA with a custom static scheduler that integrates model-driven algorithmic, communication, and autotuning optimizations (PARALiA-GEMMex) and delivers significantly higher performance than the state of the art. |
URI: | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19265 |
Appears in collections: | Doctoral Dissertations - Ph.D. Theses |
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
PhD_thesis_updated_final.pdf | Main file (thesis) | 3.28 MB | Adobe PDF | View/Open |
All items on this site are protected by copyright.