Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19265
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Anastasiadis, Petros | - |
dc.date.accessioned | 2024-09-19T14:12:16Z | - |
dc.date.available | 2024-09-19T14:12:16Z | - |
dc.date.issued | 2024-09-09 | - |
dc.identifier.uri | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19265 | - |
dc.description.abstract | Dense linear algebra operations appear frequently in high-performance computing (HPC) applications, making their performance crucial to achieving optimal scalability. As many modern HPC clusters contain multi-GPU nodes, BLAS operations are frequently offloaded to GPUs, which requires optimized libraries to ensure good performance. However, optimizing BLAS for multi-GPU systems introduces numerous challenges similar to those of distributed computing, such as data decomposition, task scheduling, and communication across GPUs with distinct memory spaces. This complexity makes multi-GPU BLAS optimization difficult, leading to sub-optimal performance or to system-specific solutions with reduced portability. To address these issues, we propose a model-based autotuning approach: we introduce several performance models for BLAS and integrate them into PARALiA, an end-to-end BLAS library. PARALiA uses model-driven insights to dynamically autotune BLAS execution, tailoring performance-critical parameters to each specific problem and system at runtime. This autotuning is coupled with an optimized task scheduler, leading to near-optimal data distribution and performance-aware resource utilization. PARALiA provides state-of-the-art performance and energy efficiency and can adapt to heterogeneous systems and scenarios via model-based decisions. Finally, we focus on the GEMM kernel, extending PARALiA with a custom static scheduler that integrates model-driven algorithmic, communication, and autotuning optimizations (PARALiA-GEMMex), which delivers significantly better performance than the state of the art. | en_US |
dc.language | en | en_US |
dc.subject | Linear algebra | en_US |
dc.subject | Graphics processing units (GPUs) | en_US |
dc.subject | Matrix-matrix multiplication | en_US |
dc.subject | Modeling | en_US |
dc.subject | Autotuning | en_US |
dc.subject | Multi-GPU systems | en_US |
dc.subject | Software libraries | en_US |
dc.subject | Communication routing | en_US |
dc.subject | BLAS routines | en_US |
dc.subject | Overlap optimization | en_US |
dc.title | Model-assisted optimization of Linear Algebra routines on multi-GPU computing systems | en_US |
dc.description.pages | 183 | en_US |
dc.contributor.supervisor | Γκούμας Γεώργιος | en_US |
dc.department | Division of Computer Science (Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών) | en_US |
Appears in Collections: | Doctoral Dissertations - Ph.D. Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
PhD_thesis_updated_final.pdf | Main file (thesis) | 3.28 MB | Adobe PDF | View/Open |
All items on this site are protected by copyright.