Model-assisted optimization of Linear Algebra routines on multi-GPU computing systems

Anastasiadis, Petros

Εθνικό Μετσόβιο Πολυτεχνείο

Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών

Καλώς ήρθατε στο Άρτεμις

Σκοπός του Άρτεμις είναι η συστηματική αρχειοθέτηση και διαδοση της πνευματικής παραγωγής της Σχολής Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών του Εθνικού Μετσόβιου Πολυτεχνείου, με τη βοήθεια της τεχνολογίας των ψηφιακών βιβλιοθηκών.

Παρακαλώ χρησιμοποιήστε αυτό το αναγνωριστικό για να παραπέμψετε ή να δημιουργήσετε σύνδεσμο προς αυτό το τεκμήριο: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19265

Πλήρες αρχείο μεταδεδομένων

Πεδίο DC	Τιμή	Γλώσσα
dc.contributor.author	Anastasiadis, Petros	-
dc.date.accessioned	2024-09-19T14:12:16Z	-
dc.date.available	2024-09-19T14:12:16Z	-
dc.date.issued	2024-09-09	-
dc.identifier.uri	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19265	-
dc.description.abstract	Dense linear algebra operations appear frequently in high-performance computing (HPC) applications, rendering their performance crucial to achieving optimal scalability. As many modern HPC clusters contain multi-GPU nodes, BLAS operations are frequently offloaded on GPUs, necessitating optimized libraries to ensure good performance. However, optimizing BLAS for multi-GPU introduces numerous challenges similar to distributed computing, like data decomposition, task scheduling, and communication across GPUs with distinct memory spaces. This complexity of multi-GPU makes BLAS optimization very complex, leading to sub-optimal performance or system-specific solutions with reduced portability. To address these issues, we suggest a model-based autotuning approach: we introduce several performance models for BLAS and integrate them into PARALiA, an end-to-end BLAS library. PARALiA uses model-driven insights to dynamically autotune BLAS execution, tailoring performance-critical parameters for each specific problem and system during runtime. This autotuning is coupled with an optimized task scheduler, leading to near-optimal data distribution and performance-aware resource utilization. PARALiA provides state-of-the-art performance and energy efficiency and incorporates the ability to adapt to heterogeneous systems and scenarios via model-based decisions. Finally, we focus on the GEMM kernel, extending PARALiA with a custom static scheduler that integrates model-driven algorithmic, communication, and autotuning optimizations (PARALiA-GEMMex), which delivers significantly superior performance compared to the state-of-the-art.	en_US
dc.language	en	en_US
dc.subject	Linear algebra	en_US
dc.subject	Graphics processing units (GPUs)	en_US
dc.subject	Matrix-matrix multiplication	en_US
dc.subject	Modeling	en_US
dc.subject	Autotuning	en_US
dc.subject	Multi-GPU systems	en_US
dc.subject	Software libraries	en_US
dc.subject	Communication routing	en_US
dc.subject	BLAS routines	en_US
dc.subject	Overlap optimization	en_US
dc.title	Model-assisted optimization of Linear Algebra routines on multi-GPU computing systems	en_US
dc.description.pages	183	en_US
dc.contributor.supervisor	Γκούμας Γεώργιος	en_US
dc.department	Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών	en_US
Εμφανίζεται στις συλλογές:	Διδακτορικές Διατριβές - Ph.D. Theses

Αρχεία σε αυτό το τεκμήριο:

Αρχείο	Περιγραφή	Μέγεθος	Μορφότυπος
PhD_thesis_updated_final.pdf	Main file (thesis)	3.28 MB	Adobe PDF	Εμφάνιση/Άνοιγμα

Δείξε τη σύντομη περιγραφή του τεκμηρίου

Όλα τα τεκμήρια του δικτυακού τόπου προστατεύονται από πνευματικά δικαιώματα.