Kog AI Releases Inference Engine: 3000 tokens/s Single-Request Speed on Standard GPUs

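Kog AI recently released a technical preview of the Kog Inference Engine (KIE), which reaches a single-request generation speed of 3000 tokens/s on 8× AMD MI300X GPUs and 2100 tokens/s on 8× NVIDIA H200 GPUs. More importantly, these results were achieved without quantization, speculative decoding, pruning, or KV cache compression.

Why single-request speed suddenly matters

Traditional inference benchmarks typically focus on aggregate throughput and...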

Open Source · 开源中国 (OSCHINA)
