This paper introduces an analytical framework for characterizing the potential and limitations of FPGA-based spatial acceleration for efficient large language model (LLM) inference. It also provides a suite of modular, reusable HLS kernels for building high-performance FPGA-based LLM accelerators.
This paper proposes an FPGA-based accelerator that efficiently supports the convolution-Transformer hybrid architecture of EfficientViT, a state-of-the-art efficient Vision Transformer, by leveraging a reconfigurable architecture and a novel time-multiplexed, pipelined dataflow.