Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19900
| Title: | Accelerating SpMM for Multi-head Self-Attention: Kernel-Level Design and Performance Analysis |
| Authors: | Λαγός, Αναστάσιος; Γκούμας Γεώργιος |
| Keywords: | Neural Networks Deep Learning Reinforcement Learning Large Language Models Transformers Multi-head Self-Attention Sparse Attention Sparse Matrix-Matrix Multiplication Parallel Computing Systems GPU Programming Kernels CUDA Optimization |
| Issue Date: | 6-Nov-2025 |
| Abstract: | Large Language Models (LLMs) rely on the Transformer architecture to capture dependencies among input tokens. As model sizes and input contexts continue to grow, the ability to handle long sequences efficiently becomes increasingly critical to maintaining performance and scalability. However, the computational and memory demands of the standard attention mechanism grow as O(n²) with respect to sequence length n, preventing models from scaling further. A proposed solution, the sparse approximation of attention, aims to reduce the overall complexity while maintaining model quality. However, when implemented on accelerator platforms such as GPUs, sparse attention implementations often exhibit performance degradation caused by the emergent properties of sparsity. This work analyzes common bottlenecks in the implementation of the Sparse Matrix-Matrix Multiplication (SpMM) kernel and optimizes it accordingly, comparing performance to NVIDIA’s cuSPARSE library. The best kernel at low sparsity achieves a 57% speedup but, at high sparsity, shows a 49% slowdown. A different kernel design achieves a 64% speedup but, at low sparsity, a 250% slowdown. Thus, the study concludes that tailoring kernel designs to the specific parameters of each problem achieves significantly better results than applying a catch-all approach. |
| URI: | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19900 |
| Appears in Collections: | Διπλωματικές Εργασίες - Theses |
Files in this item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| main.pdf | | 2.08 MB | Adobe PDF | View/Open |
All items on this site are protected by copyright.