Models | Jun 15, 2026
Kimi K2.7 Code Released: Enhanced Code and Agent Capabilities with 30% Token Reduction
Moonshot AI has released Kimi K2.7 Code, the first code-specialized model in the K2 series, with open weights available on Hugging Face. The model brings upgrades in code generation, agent execution, and long-horizon tasks, and its average token consumption is approximately 30% lower than that of its predecessor, K2.6. However, some users report that strict quota limits undermine day-to-day use.
Core Capability Improvements
- Code Benchmarks: Relative to K2.6, the Kimi Code Bench v2 score rose by 21.8% (50.9→62.0), ProgramBench by 11% (48.3→53.6), and MLS Bench Lite by 31.5% (26.7→35.1), the last of these approaching GPT-5.5's 35.5.
- Agent Benchmarks: Kimi Claw 24/7 Bench improved by 9.3% (42.9→46.9), MCP Atlas by 9.5% (69.4→76.0), and MCP Mark Verified by 11.4% (72.8→81.1), surpassing Claude Opus 4.8 (76.4) in some tool-calling scenarios.
- Long-Horizon Task Optimization: The "overthinking" issue has been mitigated, cutting average token consumption by approximately 30% and improving success rates on long-horizon complex tasks.
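The percentages quoted above are relative gains over K2.6, not absolute point differences. The short sketch below recomputes them from the before/after scores reported in this article; the dictionary and function names are illustrative only.

```python
# Before/after scores (K2.6 -> K2.7 Code) as reported in this article.
SCORES = {
    "Kimi Code Bench v2": (50.9, 62.0),
    "ProgramBench": (48.3, 53.6),
    "MLS Bench Lite": (26.7, 35.1),
    "Kimi Claw 24/7 Bench": (42.9, 46.9),
    "MCP Atlas": (69.4, 76.0),
    "MCP Mark Verified": (72.8, 81.1),
}


def relative_gain(before: float, after: float) -> float:
    """Relative improvement in percent: (after - before) / before * 100."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for name, (before, after) in SCORES.items():
        points = after - before
        gain = relative_gain(before, after)
        print(f"{name}: +{points:.1f} points, +{gain:.1f}% relative")
```

For example, the 11.1-point rise on Kimi Code Bench v2 corresponds to the 21.8% figure because it is measured against the K2.6 baseline of 50.9.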
Real-World Performance
- Physics Simulation: In scenarios such as black hole and water wave rendering, K2.7 Code produces realistic outputs, and its water wave rendering outperforms that of GPT-5.5 and Claude Opus 4.8.
- Game Development: When asked to generate an HTML version of Super Mario, K2.7 Code produces a playable first level, but character and map details remain abstract, leaving a noticeable gap compared to Claude Fable 5.
- Frontend Tasks: Across nine frontend examples, results are good when each example is run on its own, but running them as a batch leads to lazy behavior and inconsistent output quality.
Pricing and Quota Controversy
- Pricing: Standard input at 6.5 RMB/1M tokens, output at 27 RMB/1M tokens, cache input at 1.3 RMB/1M tokens, consistent with K2.6.
- Quota Limits: Multiple users report that the weekly quota for the basic Code plan is exhausted after only a few tests, producing numerous HTTP 429 (rate limit) and 402 (insufficient quota) errors that severely disrupt development workflows. One user noted, "Running a single example used 63% of my weekly quota."
- Usage Requirement: Thinking mode must be enabled; disabling it causes API errors or fallback to K2.6.
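As a rough aid for budgeting against these prices and errors, the sketch below estimates the cost of a single request and decides whether a failed call is worth retrying. The per-token prices come from this article. The function names, the assumption that cached tokens are a subset of the input tokens, and the backoff parameters are illustrative, not part of any Moonshot AI SDK.

```python
from typing import Optional

# Prices in RMB per 1M tokens, as listed in this article.
PRICE_INPUT = 6.5
PRICE_CACHED_INPUT = 1.3
PRICE_OUTPUT = 27.0


def estimate_cost_rmb(input_tokens: int, output_tokens: int,
                      cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in RMB.

    Assumption: cached_tokens is the part of input_tokens billed at the
    cache rate; the remainder is billed at the standard input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * PRICE_INPUT
             + cached_tokens * PRICE_CACHED_INPUT
             + output_tokens * PRICE_OUTPUT)
    return total / 1_000_000


def retry_delay(status: int, attempt: int,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Seconds to wait before retrying, or None if a retry will not help.

    429 (rate limit) is treated as transient and retried with capped
    exponential backoff. 402 (insufficient quota) is not retried, since
    waiting a few seconds does not restore an exhausted quota.
    """
    if status == 429:
        return min(cap, base * 2 ** attempt)
    return None


if __name__ == "__main__":
    cost = estimate_cost_rmb(100_000, 20_000, cached_tokens=80_000)
    print(f"Estimated cost: {cost:.3f} RMB")
    print("Delay after third 429:", retry_delay(429, attempt=2))
    print("Delay after 402:", retry_delay(402, attempt=0))
```

Because output tokens cost roughly four times as much as uncached input tokens, the reported reduction in token consumption matters most on the output side of the bill.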
Architecture and Deployment
- Model Architecture: Continues the Mixture-of-Experts (MoE) design with 1T total parameters, 32B activated parameters, 384 experts of which 8 are selected per token, 1 shared expert, and a context length of 256K tokens. The vision component uses a MoonViT encoder (400M parameters) and supports image and video input.
- Open Source and Deployment: Released under a Modified MIT License, supports deployment via vLLM, SGLang, and KTransformers, with native INT4 quantization.
- High-Speed Version Preview: A 6x faster version will launch on June 15, with output speeds of approximately 180 tokens/s (typical scenarios) and up to 260 tokens/s for short contexts, priced at 2x the standard version.
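The architecture and speed figures above lend themselves to some back-of-the-envelope arithmetic. The sketch below is only an estimate under stated assumptions: the memory figure counts weights alone and treats every parameter as stored at 4 bits, ignoring the KV cache, activations, and any layers kept at higher precision. The helper names are illustrative.

```python
# Figures taken from this article.
TOTAL_PARAMS = 1e12           # 1T total parameters
ACTIVE_PARAMS = 32e9          # 32B activated parameters
NUM_EXPERTS = 384
EXPERTS_PER_TOKEN = 8
FAST_TOKENS_PER_S = 180       # high-speed version, typical scenarios


def weight_memory_gb(params: float, bits_per_weight: int) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes).

    Assumption: all parameters are stored at bits_per_weight bits.
    """
    return params * bits_per_weight / 8 / 1e9


def generation_seconds(output_tokens: int, tokens_per_s: float) -> float:
    """Time to stream output_tokens at a steady decoding speed."""
    return output_tokens / tokens_per_s


if __name__ == "__main__":
    print(f"Active parameter share: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
    print(f"Routed experts used per token: {EXPERTS_PER_TOKEN / NUM_EXPERTS:.1%}")
    print(f"INT4 weights: ~{weight_memory_gb(TOTAL_PARAMS, 4):.0f} GB")
    print(f"BF16 weights: ~{weight_memory_gb(TOTAL_PARAMS, 16):.0f} GB")
    secs = generation_seconds(9_000, FAST_TOKENS_PER_S)
    print(f"9,000 output tokens at {FAST_TOKENS_PER_S} tokens/s: {secs:.0f} s")
```

Under these assumptions, only a small share of the parameters is active for any one token, yet the full set of weights still has to be held in memory. That is why the native INT4 quantization matters for self-hosted deployment.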
Industry Rankings
- Second overall in ErdosBench, trailing only Claude Fable 5 max.
- First among open-source models in SWE-bench and Terminal-Bench 2.1, third in Vibe Code Bench, and second in ProgramBench.
- In Weco's independent research task evaluation, K2.7 Code ranked fifth overall with a score of 0.747, and first in the machine learning engineering category.