Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19793
Full metadata record
DC Field | Value | Language
dc.contributor.author | Ιωάννου, Κωνσταντίνος | -
dc.date.accessioned | 2025-10-14T08:33:20Z | -
dc.date.available | 2025-10-14T08:33:20Z | -
dc.date.issued | 2025-09-29 | -
dc.identifier.uri | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19793 | -
dc.description.abstract | In recent years, the rapid growth of large language models (LLMs) has increased demand for efficient inference on both datacenter and edge platforms. While quantization reduces computation and memory costs, mixed-precision operations, where activations remain in higher precision while weights are quantized to lower bitwidths, remain inefficient on general-purpose hardware. Lookup Table (LUT)-based methods offer a promising alternative, yet achieving an optimal balance of memory usage, flexibility, and workload adaptability remains challenging. We propose LUMAX, a fully integrated LUT-based mixed-precision GeMM accelerator for energy-efficient LLM inference. LUMAX features a reconfigurable hardware design, allowing for efficient support of different activation and weight bitwidths. To reduce LUT overhead, we employ a quarter-size LUT (¼-LUT) with efficient indexing and data packaging, minimizing storage and data transfer. LUMAX has been implemented as a tightly coupled RocketChip Co-processor (RoCC), thus enabling seamless processor integration with RISC-V cores. By extending key ideas from recent LUT-based designs and combining them with full processor integration and reconfigurable hardware, LUMAX provides a flexible, power-efficient accelerator for quantized LLM inference, blending hardware adaptability, software usability, and architectural efficiency. Evaluation results show that LUMAX, prototyped on a ZCU106 FPGA, reduces LUT and DSP usage by up to 33% and 96%, respectively, achieves 79% fewer cycles, and delivers up to 4.7× speedup on LLaMA2, with up to 70% improved energy efficiency over prior GeMM accelerators such as Gemmini. | en_US
dc.language | en | en_US
dc.subject | LUT | en_US
dc.subject | Mixed-Precision | en_US
dc.subject | Accelerator | en_US
dc.subject | GEMM | en_US
dc.subject | Low-bit LLM | en_US
dc.title | LUMAX: A LUT-Based Mixed-Precision Accelerator for LLM Inference on the Edge | en_US
dc.description.pages | 131 | en_US
dc.contributor.supervisor | Σούντρης Δημήτριος | en_US
dc.department | Division of Computer Science (Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών) | en_US
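
Illustrative note on the abstract: the general idea behind LUT-based mixed-precision GeMM can be sketched in a few lines of Python. This is a minimal software sketch of the generic technique, not the LUMAX design: the function names `build_lut` and `lut_dot`, the group size `g`, and the assumption of unsigned integer weights split into bit planes are all hypothetical choices made for illustration. It uses a full-size table per activation group and does not reproduce the quarter-size LUT (¼-LUT), indexing, or data packaging described in the thesis.

```python
def build_lut(acts):
    """Return all 2**g subset sums of one group of g activations.

    Entry idx holds the sum of acts[j] for every bit j set in idx.
    """
    g = len(acts)
    lut = [0] * (1 << g)
    for idx in range(1, 1 << g):
        low = idx & -idx                 # lowest set bit of idx
        j = low.bit_length() - 1         # position of that bit
        lut[idx] = lut[idx ^ low] + acts[j]  # reuse an earlier entry
    return lut


def lut_dot(acts, weights, w_bits=2, g=4):
    """Dot product of activations with unsigned w_bits-bit weights.

    Assumption (illustrative only): weights are unsigned integers that are
    split into bit planes; each plane indexes the per-group lookup table,
    and the results are combined with powers of two.
    """
    assert len(acts) == len(weights) and len(acts) % g == 0
    total = 0
    for start in range(0, len(acts), g):
        lut = build_lut(acts[start:start + g])
        group = weights[start:start + g]
        for b in range(w_bits):
            idx = 0
            for j, w in enumerate(group):
                idx |= ((w >> b) & 1) << j   # gather bit b of each weight
            total += lut[idx] * (1 << b)     # weight the plane by 2**b
    return total


acts = [3, -1, 4, 1, -5, 9, 2, -6]           # higher-precision activations
weights = [0, 1, 2, 3, 3, 2, 1, 0]           # 2-bit quantized weights
reference = sum(a * w for a, w in zip(acts, weights))
assert lut_dot(acts, weights) == reference
```

In this sketch the multiplications between activations and weights are replaced by table lookups plus shifts, and the table for a group of activations can be reused across every weight row that multiplies the same activations, which is where LUT-based schemes of this kind save work.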
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File | Description | Size | Format
kostis_ioannou_thesis.pdf | | 13.74 MB | Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.