Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19793
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Ιωάννου, Κωνσταντίνος | - |
dc.date.accessioned | 2025-10-14T08:33:20Z | - |
dc.date.available | 2025-10-14T08:33:20Z | - |
dc.date.issued | 2025-09-29 | - |
dc.identifier.uri | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19793 | - |
dc.description.abstract | In recent years, the rapid growth of large language models (LLMs) has increased demand for efficient inference on both datacenter and edge platforms. While quantization reduces computation and memory costs, mixed-precision operations, where activations remain in higher precision while weights are quantized to lower bitwidths, remain inefficient on general-purpose hardware. Lookup Table (LUT)-based methods offer a promising alternative, yet achieving an optimal balance of memory usage, flexibility, and workload adaptability remains challenging. We propose LUMAX, a fully integrated LUT-based mixed-precision GeMM accelerator for energy-efficient LLM inference. LUMAX features a reconfigurable hardware design, allowing for efficient support of different activation and weight bitwidths. To reduce LUT overhead, we employ a quarter-size LUT (¼-LUT) with efficient indexing and data packaging, minimizing storage and data transfer. LUMAX has been implemented as a tightly coupled RocketChip Co-processor (RoCC), thus enabling seamless processor integration with RISC-V cores. By extending key ideas from recent LUT-based designs and combining them with full processor integration and reconfigurable hardware, LUMAX provides a flexible, power-efficient accelerator for quantized LLM inference, blending hardware adaptability, software usability, and architectural efficiency. Evaluation results show that LUMAX, prototyped on a ZCU106 FPGA, reduces LUT and DSP usage by up to 33% and 96%, achieves 79% fewer cycles, and delivers up to 4.7× speedup on LLaMA2, with up to 70% improved energy efficiency over prior GeMM accelerators such as Gemmini. | en_US |
dc.language | en | en_US |
dc.subject | LUT | en_US |
dc.subject | Mixed-Precision | en_US |
dc.subject | Accelerator | en_US |
dc.subject | GEMM | en_US |
dc.subject | Low-bit LLM | en_US |
dc.title | LUMAX: A LUT-Based Mixed-Precision Accelerator for LLM Inference on the Edge | en_US |
dc.description.pages | 131 | en_US |
dc.contributor.supervisor | Σούντρης Δημήτριος | en_US |
dc.department | Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών | en_US |
Appears in Collections: | Διπλωματικές Εργασίες - Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
kostis_ioannou_thesis.pdf | | 13.74 MB | Adobe PDF | View/Open |
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.
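The abstract describes a LUT-based mixed-precision GeMM in which high-precision activations are combined with low-bit weights via table lookups, and a quarter-size LUT reduces storage. This page does not give the actual ¼-LUT indexing or packing scheme, so the sketch below is only an illustration of the general idea behind such designs: precompute the partial sums of a small activation group over all binary weight-sign patterns, then resolve each weight group with one lookup, using sign symmetry (`T[mask] = -T[~mask]`) to store half the table. The group size `K`, the function names, and the half-size construction are all assumptions for illustration, not the thesis's method.

```python
# Illustrative sketch of LUT-based mixed-precision GeMM (NOT the LUMAX
# ¼-LUT scheme, whose indexing/packing details are not on this page).
# Idea: for K activations and binary weights in {-1, +1}, precompute the
# 2^K possible signed sums; sign symmetry lets us store only 2^(K-1).
import numpy as np

K = 4  # activations per LUT group (assumed for illustration)

def build_half_lut(act_group):
    """Precompute sums for all patterns whose top bit is 0 (weight -1);
    patterns with the top bit set follow by negating the complement."""
    assert len(act_group) == K
    half = np.empty(2 ** (K - 1))
    for m in range(2 ** (K - 1)):
        # bit i of m: 1 -> weight +1, 0 -> weight -1
        signs = [1 if (m >> i) & 1 else -1 for i in range(K)]
        half[m] = sum(s * a for s, a in zip(signs, act_group))
    return half

def lookup(half, mask):
    """Resolve a K-bit weight pattern with one table access."""
    if mask < 2 ** (K - 1):
        return half[mask]
    # Flipping every weight bit negates every sign, hence the sum.
    return -half[mask ^ (2 ** K - 1)]

acts = np.array([0.5, -1.25, 2.0, 0.75])
half = build_half_lut(acts)
```

A full GeMM would tile the activation vector into such groups and accumulate one lookup per group per output, which is the usual way LUT-based designs replace low-bit multiply-accumulates with memory reads.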