Topics: KV Cache, int quantization, PagedAttention, GQA, Speculative Decoding

References:
- Accelerating Generative AI with PyTorch II: GPT, Fast
- Fast Inference from Transformers via Speculative Decoding
- "PyTorch builds an 'acceleration kit' for large models: 10x speedup in under 1,000 lines of code! NVIDIA scientist: one of the best tutorial-style repos since minGPT"
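Of the topics listed above, speculative decoding is the least self-explanatory, so here is a minimal toy sketch of the greedy-verification variant: a cheap draft model proposes several tokens at once, and the expensive target model checks them in what would be a single forward pass, accepting the longest matching prefix. Both "models" here are hypothetical deterministic functions standing in for real language models, and the greedy accept rule is a simplification of the rejection-sampling scheme in the referenced paper.

```python
def draft_model(context):
    # Hypothetical cheap model: guesses the next token as (last + 1) mod 10,
    # but drifts wrong at every 3rd position to simulate draft errors.
    nxt = (context[-1] + 1) % 10
    return nxt if len(context) % 3 else (nxt + 5) % 10

def target_model(context):
    # Hypothetical expensive "ground-truth" model: next token is (last + 1) mod 10.
    return (context[-1] + 1) % 10

def speculative_decode(context, num_tokens, k=4):
    out = list(context)
    while len(out) - len(context) < num_tokens:
        # 1) Draft k tokens autoregressively with the cheap model.
        draft = []
        for _ in range(k):
            draft.append(draft_model(out + draft))
        # 2) Verify: score all k drafted positions with the target model
        #    (one batched forward pass in a real implementation) and
        #    accept the longest prefix the target model agrees with.
        accepted = 0
        for i in range(k):
            if target_model(out + draft[:i]) != draft[i]:
                break
            accepted += 1
        out += draft[:accepted]
        # 3) Always emit one target-model token, so each round makes progress
        #    even when the very first drafted token is rejected.
        out.append(target_model(out))
    return out[len(context):][:num_tokens]

print(speculative_decode([0], 8))  # -> [1, 2, 3, 4, 5, 6, 7, 8]
```

The output is identical to decoding with the target model alone; the speedup comes from the target model verifying up to k draft tokens per forward pass instead of producing one token per pass.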