Παρακαλώ χρησιμοποιήστε αυτό το αναγνωριστικό για να παραπέμψετε ή να δημιουργήσετε σύνδεσμο προς αυτό το τεκμήριο: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19265
Πλήρες αρχείο μεταδεδομένων
Πεδίο DC ΤιμήΓλώσσα
dc.contributor.authorAnastasiadis, Petros-
dc.date.accessioned2024-09-19T14:12:16Z-
dc.date.available2024-09-19T14:12:16Z-
dc.date.issued2024-09-09-
dc.identifier.urihttp://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19265-
dc.description.abstractDense linear algebra operations appear frequently in high-performance computing (HPC) applications, rendering their performance crucial to achieving optimal scalability. As many modern HPC clusters contain multi-GPU nodes, BLAS operations are frequently offloaded on GPUs, necessitating optimized libraries to ensure good performance. However, optimizing BLAS for multi-GPU introduces numerous challenges similar to distributed computing, like data decomposition, task scheduling, and communication across GPUs with distinct memory spaces. This complexity of multi-GPU makes BLAS optimization very complex, leading to sub-optimal performance or system-specific solutions with reduced portability. To address these issues, we suggest a model-based autotuning approach: we introduce several performance models for BLAS and integrate them into PARALiA, an end-to-end BLAS library. PARALiA uses model-driven insights to dynamically autotune BLAS execution, tailoring performance-critical parameters for each specific problem and system during runtime. This autotuning is coupled with an optimized task scheduler, leading to near-optimal data distribution and performance-aware resource utilization. PARALiA provides state-of-the-art performance and energy efficiency and incorporates the ability to adapt to heterogeneous systems and scenarios via model-based decisions. Finally, we focus on the GEMM kernel, extending PARALiA with a custom static scheduler that integrates model-driven algorithmic, communication, and autotuning optimizations (PARALiA-GEMMex), which delivers significantly superior performance compared to the state-of-the-art.en_US
dc.languageenen_US
dc.subjectLinear algebraen_US
dc.subjectGraphics processing units (GPUs)en_US
dc.subjectMatrix-matrix multiplicationen_US
dc.subjectModelingen_US
dc.subjectAutotuningen_US
dc.subjectMulti-GPU systemsen_US
dc.subjectSoftware librariesen_US
dc.subjectCommunication routingen_US
dc.subjectBLAS routinesen_US
dc.subjectOverlap optimizationen_US
dc.titleModel-assisted optimization of Linear Algebra routines on multi-GPU computing systemsen_US
dc.description.pages183en_US
dc.contributor.supervisorΓκούμας Γεώργιοςen_US
dc.departmentΤομέας Τεχνολογίας Πληροφορικής και Υπολογιστώνen_US
Εμφανίζεται στις συλλογές:Διδακτορικές Διατριβές - Ph.D. Theses

Αρχεία σε αυτό το τεκμήριο:
Αρχείο Περιγραφή ΜέγεθοςΜορφότυπος 
PhD_thesis_updated_final.pdfMain file (thesis)3.28 MBAdobe PDFΕμφάνιση/Άνοιγμα


Όλα τα τεκμήρια του δικτυακού τόπου προστατεύονται από πνευματικά δικαιώματα.