Optimizing FPGA-Based Acceleration for Efficient Large Language Model Inference
This paper introduces an analytical framework for characterizing the potential and limitations of FPGA-based spatial acceleration for large language model (LLM) inference. It also provides a suite of modular, reusable HLS kernels for building high-performance FPGA-based LLM accelerators.
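As a rough illustration of what a modular HLS kernel in such a library might look like (this is a hedged sketch, not the paper's actual code; the function name `matmul_tile` and the tile sizes `TM`, `TN`, `K` are hypothetical), here is a minimal tiled matrix-multiply unit with HLS pragmas. A standard C++ compiler ignores `#pragma HLS` directives, so the kernel remains testable on a CPU:

```cpp
#include <cassert>

// Hypothetical tile sizes; real designs tune these to the FPGA's resource budget.
constexpr int TM = 4;   // rows per output tile
constexpr int TN = 4;   // columns per output tile
constexpr int K  = 8;   // reduction dimension

// Minimal HLS-style tile kernel: C = A * B for one tile.
// The pragmas request fully partitioned on-chip buffers, a pipelined
// inner loop (one result per cycle), and an unrolled reduction.
void matmul_tile(const float A[TM][K], const float B[K][TN], float C[TM][TN]) {
#pragma HLS ARRAY_PARTITION variable=A complete dim=2
#pragma HLS ARRAY_PARTITION variable=B complete dim=1
    for (int i = 0; i < TM; ++i) {
        for (int j = 0; j < TN; ++j) {
#pragma HLS PIPELINE II=1
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
#pragma HLS UNROLL
                acc += A[i][k] * B[k][j];
            }
            C[i][j] = acc;
        }
    }
}
```

Kernels of this shape can be composed spatially: each tile unit is instantiated as its own hardware block, and tiles stream activations between blocks rather than round-tripping through off-chip memory, which is the key idea behind spatial acceleration.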