Παρακαλώ χρησιμοποιήστε αυτό το αναγνωριστικό για να παραπέμψετε ή να δημιουργήσετε σύνδεσμο προς αυτό το τεκμήριο:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19793
Τίτλος: | LUMAX: A LUT-Based Mixed-Precision Accelerator for LLM Inference on the Edge |
Συγγραφείς: | Ιωάννου, Κωνσταντίνος Σούντρης Δημήτριος |
Λέξεις κλειδιά: | LUT Mixed-Precision Accelerator GEMM Low- bit LLM |
Ημερομηνία έκδοσης: | 29-Σεπ-2025 |
Περίληψη: | In recent years, the rapid growth of large language models (LLMs) has increased demand for efficient inference on both datacenter and edge platforms. While quantization reduces computation and memory costs, mixed-precision operations, where activations remain in higher precision while weights are quan- tized to lower bitwidths, remain inefficient on general-purpose hardware. Lookup Table (LUT)-based methods offer a promising alternative, yet achieving an optimal balance of memory usage, flexibility, and workload adaptability remains challenging. We propose LUMAX, a fully integrated LUT-based mixed-precision GeMM accelerator for energy-efficient LLM inference. LUMAX features a reconfigurable hardware design, allowing for efficient support of different activation and weight bitwidths. To reduce LUT overhead, we employ a quarter-size LUT (¼-LUT) with efficient indexing and data packaging, minimizing storage and data transfer. LUMAX has been implemented as a tightly cou- pled RocketChip Co-processor (RoCC), thus enabling seamless processor integration with RISC-V cores. By extending key ideas from recent LUT-based designs and combining them with full processor integration and reconfigurable hardware, LUMAX provides a flexible, power-efficient accelerator for quantized LLM inference, blending hardware adaptability, software usability, and architectural efficiency. Evaluation results show that LUMAX, prototyped on a ZCU106 FPGA, reduces LUT and DSP usage by up to 33% and 96%, achieves 79% fewer cycles, and delivers up to 4.7× speedup on LLaMA2, with up to 70% improved energy efficiency over prior GeMM accelerators such as Gemmini |
URI: | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19793 |
Εμφανίζεται στις συλλογές: | Διπλωματικές Εργασίες - Theses |
Αρχεία σε αυτό το τεκμήριο:
Αρχείο | Περιγραφή | Μέγεθος | Μορφότυπος | |
---|---|---|---|---|
kostis_ioannou_thesis.pdf | 13.74 MB | Adobe PDF | Εμφάνιση/Άνοιγμα |
Όλα τα τεκμήρια του δικτυακού τόπου προστατεύονται από πνευματικά δικαιώματα.