LUT-GEMM, an efficient kernel for quantized matrix multiplication, eliminates the resource-intensive dequantization step required by previous weight-only quantization kernels, reducing computational cost and substantially lowering token-generation latency in large-scale generative language models.
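The core idea of lookup-table-based GEMM can be illustrated with a minimal NumPy sketch. This is an assumption-laden simplification, not the paper's CUDA kernel: it handles a single binary weight plane (entries in {-1, +1}) with one per-row scale, whereas the actual method targets multi-bit binary-coded quantization on GPU. The function name `lut_gemm_1bit` and the sub-vector length `mu` are illustrative choices. Instead of dequantizing the weights, it precomputes all `2**mu` signed partial sums of each length-`mu` input slice once, then reduces each output row via table lookups:

```python
import numpy as np

def lut_gemm_1bit(x, B, alpha, mu=4):
    """Sketch of a LUT-based matvec y = alpha * (B @ x) without dequantization.

    x:     (in_dim,) input activations.
    B:     (out_dim, in_dim) binary weight plane with entries in {-1, +1}.
    alpha: (out_dim,) per-row scale factors.
    mu:    length of the input sub-vectors indexed by the lookup table.
    """
    out_dim, in_dim = B.shape
    assert in_dim % mu == 0
    n_groups = in_dim // mu

    # All 2**mu sign patterns of length mu, entries in {-1, +1}.
    patterns = ((np.arange(2 ** mu)[:, None] >> np.arange(mu)) & 1) * 2 - 1

    # LUT: for each input group, the partial sum under every sign pattern.
    # Shape (n_groups, 2**mu); built once, shared by all output rows.
    lut = x.reshape(n_groups, mu) @ patterns.T

    # Encode each row of B as one table index per group (bit j set <=> +1).
    bits = ((B.reshape(out_dim, n_groups, mu) + 1) // 2).astype(int)
    idx = (bits * (1 << np.arange(mu))).sum(axis=2)  # (out_dim, n_groups)

    # Gather partial sums and accumulate; the binary weights are never
    # expanded back to full precision.
    y = lut[np.arange(n_groups), idx].sum(axis=1)
    return alpha * y
```

The benefit mirrors the summary above: the per-row work becomes `in_dim / mu` table lookups and adds rather than `in_dim` multiply-accumulates on dequantized weights, and the LUT cost is amortized across all output rows.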
F-SAM was proposed to improve the generalization performance of SAM.