Models | Jun 15, 2026

Kimi K2.7 Code Released: Enhanced Code and Agent Capabilities with 30% Token Reduction

Moonshot AI has released Kimi K2.7 Code, the first code-specialized model in the K2 series, and open-sourced it on HuggingFace. The model brings upgrades in code generation, agent execution, and long-horizon tasks, and its average token consumption is approximately 30% lower than that of its predecessor, K2.6. However, some users report that tight quota limits undermine the model's usability in practice.

Core Capability Improvements

  • Code Benchmarks: Kimi Code Bench v2 score increased by 21.8% (50.9→62.0), Program-Bench by 11% (48.3→53.6), and MLS Bench Lite by 31.5% (26.7→35.1), the latter approaching GPT-5.5's 35.5.
  • Agent Benchmarks: Kimi Claw 24/7 Bench improved by 9.3% (42.9→46.9), MCP Atlas by 9.5% (69.4→76.0), and MCP Mark Verified by 11.4% (72.8→81.1), partially surpassing Claude Opus 4.8 (76.4) in tool-calling scenarios.
  • Long-Horizon Task Optimization: Mitigated the "overthinking" issue, reducing average token consumption by approximately 30% relative to K2.6 and improving success rates on long-running complex tasks.
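
The percentages in the benchmark bullets above are relative gains over the K2.6 score, not differences in points. The following minimal sketch recomputes them from the reported before and after scores; the names `SCORES` and `relative_gain` are illustrative and do not come from Moonshot AI.

```python
# Benchmark scores (before, after) exactly as reported in the list above.
SCORES = {
    "Kimi Code Bench v2": (50.9, 62.0),
    "Program-Bench": (48.3, 53.6),
    "MLS Bench Lite": (26.7, 35.1),
    "Kimi Claw 24/7 Bench": (42.9, 46.9),
    "MCP Atlas": (69.4, 76.0),
    "MCP Mark Verified": (72.8, 81.1),
}


def relative_gain(before: float, after: float) -> float:
    """Relative improvement in percent: (after - before) / before * 100."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for name, (before, after) in SCORES.items():
        points = after - before
        gain = relative_gain(before, after)
        print(f"{name}: {before} -> {after} (+{points:.1f} points, +{gain:.1f}%)")
```

For example, MLS Bench Lite gains 8.4 points, which is a 31.5% increase over the earlier score of 26.7.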

Real-World Performance

  • Physics Simulation: In scenarios like black holes and water wave rendering, K2.7 Code produces realistic outputs, with water wave rendering outperforming GPT-5.5 and Claude Opus 4.8.
  • Game Development: When generating an HTML version of Super Mario, K2.7 Code can produce a playable first level, but character and map details remain abstract, with a noticeable gap compared to Claude Fable 5.
  • Frontend Tasks: Across 9 frontend examples, the model produces good results when each example is run on its own, but running them as a batch leads to lazy behavior and inconsistent output quality.

Pricing and Quota Controversy

  • Pricing: Standard input at 6.5 RMB/1M tokens, output at 27 RMB/1M tokens, cache input at 1.3 RMB/1M tokens, consistent with K2.6.
  • Quota Limits: Multiple users report that the weekly quota for the basic Code plan is exhausted after a few tests, resulting in numerous 429 (rate limit) and 402 (insufficient quota) errors, severely impacting development workflows. One user noted, "Running a single example used 63% of my weekly quota."
  • Usage Requirement: Thinking mode must be enabled; disabling it causes API errors or a fallback to K2.6.
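
To make the pricing and the two error codes above concrete, here is a rough sketch of a per-request cost estimate and a client-side retry decision. The prices are the ones listed above; everything else is an assumption for illustration. The function names are hypothetical. The cached portion of the input is assumed to be billed at the cache rate instead of the standard input rate. A 429 is treated as transient while a 402 is treated as non-retryable. None of this is taken from Moonshot AI's API documentation.

```python
from typing import Optional

# Prices in RMB per 1M tokens, taken from the pricing bullet above.
PRICE_INPUT = 6.5
PRICE_CACHED_INPUT = 1.3
PRICE_OUTPUT = 27.0


def estimate_cost_rmb(input_tokens: int, output_tokens: int,
                      cached_input_tokens: int = 0) -> float:
    """Estimate the cost of one request in RMB.

    Assumption: cached_input_tokens is the part of input_tokens served
    from cache, billed at the cache rate instead of the standard rate.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_input_tokens
    total = (uncached * PRICE_INPUT
             + cached_input_tokens * PRICE_CACHED_INPUT
             + output_tokens * PRICE_OUTPUT)
    return total / 1_000_000


def retry_delay_seconds(status_code: int, attempt: int,
                        base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Return a backoff delay for a retryable status, or None to give up.

    Assumption: 429 (rate limit) may clear after waiting, so back off
    exponentially; 402 (insufficient quota) will not clear by waiting.
    """
    if status_code == 429:
        return min(cap, base * 2 ** attempt)
    return None


if __name__ == "__main__":
    cost = estimate_cost_rmb(200_000, 20_000, cached_input_tokens=150_000)
    print(f"Estimated cost: {cost:.3f} RMB")
    print("Delay after third 429:", retry_delay_seconds(429, attempt=3))
    print("Delay after 402:", retry_delay_seconds(402, attempt=0))
```

If the 429 responses users describe come from an exhausted weekly quota rather than a short-term rate limit, backing off will not help, so a real client would likely also cap the number of attempts.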

Architecture and Deployment

  • Model Architecture: Continues the MoE design with 1T total parameters, 32B activated parameters, 384 experts with 8 selected per step, 1 shared expert, and a context length of 256K tokens. The vision component uses a MoonViT encoder (400M parameters), supporting image and video input.
  • Open Source and Deployment: Released under a Modified MIT License, supports deployment via vLLM, SGLang, and KTransformers, with native INT4 quantization.
  • High-Speed Version Preview: A 6x faster version will launch on June 15, with output speeds of approximately 180 tokens/s (typical scenarios) and up to 260 tokens/s for short contexts, priced at 2x the standard version.
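
The "384 experts with 8 selected per step" figure above describes sparse routing: for each token, a router scores all experts and only the top 8 are run, alongside the shared expert. The sketch below shows generic top-k routing in plain Python. The article does not describe K2.7 Code's actual gating function, so the softmax over the selected logits is an assumption, and the name `route_token` is hypothetical.

```python
import math
import random

NUM_EXPERTS = 384  # routed experts, from the article
TOP_K = 8          # experts selected per token, from the article


def route_token(router_logits: list[float], top_k: int = TOP_K) -> list[tuple[int, float]]:
    """Pick the top_k experts for one token.

    Returns (expert_index, weight) pairs sorted by descending logit.
    Assumption: weights are a softmax over the selected logits only,
    so they sum to 1. The real model may normalize differently.
    """
    if top_k < 1 or top_k > len(router_logits):
        raise ValueError("top_k must be between 1 and the number of experts")
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    selected = order[:top_k]
    peak = router_logits[selected[0]]  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in selected]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(selected, exps)]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    for expert, weight in route_token(logits):
        print(f"expert {expert:3d} weight {weight:.3f}")
    # The shared expert runs for every token in addition to the routed ones.
    print(f"Routed experts active per token: {TOP_K / NUM_EXPERTS:.1%}")
    print(f"Parameters active per token: {32e9 / 1e12:.1%}")
```

This sparsity is why only 32B of the 1T parameters, or 3.2%, are active for any given token.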

Industry Rankings

  • Second overall in ErdosBench, trailing only Claude Fable 5 max.
  • First among open-source models in SWE-bench and Terminal-Bench 2.1, third in Vibe Code Bench, and second in ProgramBench.
  • In Weco's independent evaluation of research tasks, K2.7 Code ranked fifth overall with a score of 0.747 and first in the machine learning engineering category.
