Key Concepts
SDSAT (Speculative Decoding with Semantic Adaptive Tokens) enhances LLM inference speed while maintaining accuracy.
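The general draft-then-verify loop that SDSAT builds on can be sketched as below. This is a minimal greedy speculative-decoding toy, not the paper's actual method: the `draft_next` and `target_next` functions are hypothetical stand-ins for a cheap draft pass and the full model, and the token arithmetic is invented purely so the example runs.

```python
def target_next(prefix):
    # Hypothetical "target model" (greedy): the true next token is last + 1.
    return prefix[-1] + 1 if prefix else 0

def draft_next(prefix):
    # Hypothetical cheap "draft model": usually agrees with the target,
    # but deliberately errs whenever the context length is a multiple of 4.
    guess = target_next(prefix)
    return guess + 7 if prefix and len(prefix) % 4 == 0 else guess

def speculative_decode(prompt, n_tokens, k=4):
    """Generate n_tokens tokens, checking k draft tokens per target pass."""
    seq = list(prompt)
    target_calls = 0
    while len(seq) - len(prompt) < n_tokens:
        # 1) Draft proposes k tokens autoregressively (cheap).
        ctx = list(seq)
        proposal = []
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Target verifies all k proposals in one (parallelizable) pass.
        target_calls += 1
        ctx = list(seq)
        accepted = []
        for t in proposal:
            if t != target_next(ctx):
                break  # first mismatch: reject the rest of the draft
            accepted.append(t)
            ctx.append(t)
        # 3) Keep the accepted prefix plus one corrected token from the target,
        #    so each target pass always makes progress.
        seq.extend(accepted)
        seq.append(target_next(seq))
    return seq[len(prompt):len(prompt) + n_tokens], target_calls
```

With this toy setup, `speculative_decode([0], 8, k=4)` produces 8 tokens using only 2 target passes instead of 8, which is the source of the speedups reported below.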
Statistics
Speedups: over 3.5X on CodeLlama-13B and over 3.0X on CodeLlama-7B.
Model Size: 7B, HumanEval: 33.5%, MBPP: 49.8%
Model Size: 13B, HumanEval: 36.0%, MBPP: 51.0%
Quotes
"We propose an acceleration scheme for large language models (LLMs) through Speculative Decoding with Semantic Adaptive Tokens (SDSAT)."
"Experiments conducted on the CodeLlama-13B and 7B models have yielded speed increases of over 3.5X and 3.0X, respectively."