Zhipu Open-Sources GLM-5.2, the Top Open-Source Model on Coding Leaderboards, with Major Frontend Improvements

Zhipu AI officially released the GLM-5.2 model on June 13, 2025. It ranks second globally on the LMSYS Code Arena: Frontend leaderboard, behind only Claude Fable 5, which makes it the highest-ranked open-source model there. It also took first place globally in Design Arena. GLM-5.2 supports a 1M-token context window, excels at long-horizon tasks, and is open-sourced under the MIT license.

Leaderboard Performance and Evaluation

Code Arena (Frontend): GLM-5.2 ranks second overall, 29 points ahead of Claude Opus 4.8 Thinking. It places second on the React sub-leaderboard and fourth on the HTML sub-leaderboard, and first in subcategories including Brand & Marketing, Reference-Based Design, Data & Analytics, Consumer Goods, and Games & Simulations.
Design Arena: First place globally, reflecting the model's aesthetic and design capabilities.
Eight authoritative benchmarks: Strong performance, although the report does not give specific scores.

Frontend Capability Improvements

Multiple independent evaluators reported that GLM-5.2's frontend capabilities represent a qualitative leap over its predecessors, GLM-5.0 and GLM-5.1. Typical test cases include:

Cyberpunk version of Along the River During the Qingming Festival: GLM-5.0's output was rough, while GLM-5.2 rendered buildings and objects completely. The cyberpunk atmosphere was strong, but the character of the original Qingming scroll came through slightly less well than in Opus 4.8's version.
Infinite text adventure game: GLM-5.0's layout collapsed, while GLM-5.2's layout rendered correctly and featured striking animation effects.
Gomoku (Five-in-a-Row) game: GLM-5.0's design was weak; GLM-5.2 refined the board, background, and color scheme.
Neon Runner: GLM-5.2's scene had a strong sense of depth, and the game supported double jumps and explosion effects.
3D Solar System: GLM-5.2 chose an abstract line-art style that was not realistic but showed clear design intent.

Evaluators believe GLM-5.2 received targeted training for frontend tasks. Its strong design sense may cause its outputs to converge on a similar style, but overall its results come close to Opus 4.8 and surpass it in some cases.

Long Context and Engineering Capabilities

GLM-5.2 supports a 1M-token context window and performs strongly on real engineering tasks:

Full codebase understanding: On the Appsmith project, GLM-5.2 accurately outlined the architecture, identified coupling points, and proposed a refactoring roadmap, with deeper coverage than Codex.
Cross-file bug tracking: On the OpenWebUI project, GLM-5.2 traced a problem along the chain from SSE (Server-Sent Events) stream fragmentation to backend parsing, and provided fixes for both the frontend and the backend.
New feature addition: Asked to add a "session summary export to Markdown" feature to OpenWebUI, GLM-5.2 split the implementation into five layers, and all 38 backend tests passed.
Multi-task delivery: Asked to build a research package on the UK student accommodation industry, GLM-5.2 produced a complete folder in a single pass, including charts, reports, and scripts.

Evaluators note that the 1M-token context suits complex tasks such as full codebase understanding, cross-file bug tracking, and long-running refactoring, but can lead to over-engineering on simple tasks.

Ecosystem and Tools

Zhipu also launched ZCode (zcode.z.ai), an agentic coding tool similar to Claude Code and OpenAI Codex, available for Windows and macOS. In evaluations, GLM-5.2 produced noticeably better UIs when run in ZCode than in Claude Code, likely thanks to ZCode's engineering optimizations. New ZCode users get a 5-day free trial, and subscribers receive a 150% usage quota.

Industry Impact

GLM-5.2's open-source release and leaderboard results mark the first time a Chinese model has entered the "Big Three" of AI programming (Claude, OpenAI, Zhipu), pushing Google Gemini out of the top ranks. With access to Claude Fable 5 restricted amid safety controversies, Zhipu stresses that "frontier intelligence should belong to everyone" and is promoting the open-source ecosystem.

Limitations and Outlook

Despite its significant frontend improvements, GLM-5.2 still trails Opus 4.8 in overall capability, particularly in processing speed, depth of reasoning, and first-attempt accuracy. Evaluators recommend avoiding the 1M-token context on simple tasks to keep them efficient. The model's API is scheduled to launch next week, and the open-source release is licensed under MIT.