Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19793
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Ιωάννου, Κωνσταντίνος | - |
dc.date.accessioned | 2025-10-14T08:33:20Z | - |
dc.date.available | 2025-10-14T08:33:20Z | - |
dc.date.issued | 2025-09-29 | - |
dc.identifier.uri | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19793 | - |
dc.description.abstract | In recent years, the rapid growth of large language models (LLMs) has increased demand for efficient inference on both datacenter and edge platforms. While quantization reduces computation and memory costs, mixed-precision operations, where activations remain in higher precision while weights are quantized to lower bitwidths, remain inefficient on general-purpose hardware. Lookup Table (LUT)-based methods offer a promising alternative, yet achieving an optimal balance of memory usage, flexibility, and workload adaptability remains challenging. We propose LUMAX, a fully integrated LUT-based mixed-precision GeMM accelerator for energy-efficient LLM inference. LUMAX features a reconfigurable hardware design, allowing for efficient support of different activation and weight bitwidths. To reduce LUT overhead, we employ a quarter-size LUT (¼-LUT) with efficient indexing and data packaging, minimizing storage and data transfer. LUMAX has been implemented as a tightly coupled RocketChip Co-processor (RoCC), thus enabling seamless processor integration with RISC-V cores. By extending key ideas from recent LUT-based designs and combining them with full processor integration and reconfigurable hardware, LUMAX provides a flexible, power-efficient accelerator for quantized LLM inference, blending hardware adaptability, software usability, and architectural efficiency. Evaluation results show that LUMAX, prototyped on a ZCU106 FPGA, reduces LUT and DSP usage by up to 33% and 96%, achieves 79% fewer cycles, and delivers up to 4.7× speedup on LLaMA2, with up to 70% improved energy efficiency over prior GeMM accelerators such as Gemmini. | en_US |
dc.language | en | en_US |
dc.subject | LUT | en_US |
dc.subject | Mixed-Precision | en_US |
dc.subject | Accelerator | en_US |
dc.subject | GEMM | en_US |
dc.subject | Low-bit LLM | en_US |
dc.title | LUMAX: A LUT-Based Mixed-Precision Accelerator for LLM Inference on the Edge | en_US |
dc.description.pages | 131 | en_US |
dc.contributor.supervisor | Σούντρης Δημήτριος | en_US |
dc.department | Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών | en_US |
Appears in Collections: | Διπλωματικές Εργασίες - Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
kostis_ioannou_thesis.pdf | | 13.74 MB | Adobe PDF | View/Open |
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.
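The abstract describes a LUT-based mixed-precision GeMM in which high-precision activations are combined with low-bit weights via table lookups, and a quarter-size LUT reduces storage. This page does not give the actual ¼-LUT indexing or packing scheme, so the sketch below is only an illustration of the general idea behind such designs: precompute the partial sums of a small activation group over all binary weight-sign patterns, then resolve each weight group with one lookup, using sign symmetry (`T[mask] = -T[~mask]`) to store half the table. The group size `K`, the function names, and the half-size construction are all assumptions for illustration, not the thesis's method.

```python
# Illustrative sketch of LUT-based mixed-precision GeMM (NOT the LUMAX
# ¼-LUT scheme, whose indexing/packing details are not on this page).
# Idea: for K activations and binary weights in {-1, +1}, precompute the
# 2^K possible signed sums; sign symmetry lets us store only 2^(K-1).
import numpy as np

K = 4  # activations per LUT group (assumed for illustration)

def build_half_lut(act_group):
    """Precompute sums for all patterns whose top bit is 0 (weight -1);
    patterns with the top bit set follow by negating the complement."""
    assert len(act_group) == K
    half = np.empty(2 ** (K - 1))
    for m in range(2 ** (K - 1)):
        # bit i of m: 1 -> weight +1, 0 -> weight -1
        signs = [1 if (m >> i) & 1 else -1 for i in range(K)]
        half[m] = sum(s * a for s, a in zip(signs, act_group))
    return half

def lookup(half, mask):
    """Resolve a K-bit weight pattern with one table access."""
    if mask < 2 ** (K - 1):
        return half[mask]
    # Flipping every weight bit negates every sign, hence the sum.
    return -half[mask ^ (2 ** K - 1)]

acts = np.array([0.5, -1.25, 2.0, 0.75])
half = build_half_lut(acts)
```

A full GeMM would tile the activation vector into such groups and accumulate one lookup per group per output, which is the usual way LUT-based designs replace low-bit multiply-accumulates with memory reads.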