Head-to-head · 5 hard one-shot tests

GLM-5.2 vs Kimi K2.7 vs Opus 4.8.

GLM-5.2 · z.ai coding plan | Kimi K2.7 · Moonshot | Opus 4.8 · Anthropic

I gave all three frontier coders the exact same five prompts — one shot, single HTML file, no follow-ups — straight through Agent OS. Then I screenshotted every build and put them side by side. Same test, three models, you decide.

The five tests: a Temple-Run voxel runner (Three.js) · an inner-solar-system / near-Earth-objects orbit visualizer · a liquid-in-a-bowl fluid sim · an "Introducing Nova 1" landing page · a juicy neon arcade game. Each model got the identical prompt.
The scoreboard

How they stacked up. My honest scoring.

Each build was scored out of 10 on whether it ran, how closely it hit the brief, and how good it looked. The overall figure is the average across all five builds.

Overall — Agent OS one-shot average

Real scores from these 5 head-to-head builds (not a public benchmark).
GLM-5.2: 8.5
Opus 4.8: 8.4
Kimi K2.7: 6.3

The short version: GLM-5.2 edges it on raw visual flair, Opus 4.8 is just 0.1 behind overall but ahead on accuracy and game-feel, and Kimi K2.7 ships working builds every time, just the plainest-looking in these one-shots.
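If you want to check the arithmetic, here is a minimal sketch that recomputes the overall averages from the per-test scores listed in this write-up. The names `SCORES`, `overall_average` and `scoreboard` are made up for illustration, and the one-decimal rounding is an assumption about how the headline numbers were produced.

```python
# Per-test scores as listed in this article, in test order 1-5
# (voxel runner, orbit visualizer, fluid sim, landing page, arcade game).
SCORES = {
    "GLM-5.2": [9, 7.5, 9, 9, 8],
    "Opus 4.8": [8.5, 9, 7, 9, 8.5],
    "Kimi K2.7": [6, 6, 5, 6.5, 8],
}


def overall_average(scores):
    """Return the mean of the scores, rounded to one decimal place (assumed rounding)."""
    return round(sum(scores) / len(scores), 1)


def scoreboard(all_scores):
    """Return (model, average) pairs sorted from highest to lowest average."""
    rows = [(model, overall_average(s)) for model, s in all_scores.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for model, avg in scoreboard(SCORES):
        print(f"{model}: {avg}")
```

Swap in your own per-test numbers to see how much a single test moves the ranking; with only five tests, one result shifts an average by up to 0.2 per point.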

Test 1 · voxel runner

Temple-Run voxel runner (Three.js).

One shot: an endless third-person voxel runner through a procedural city — dodge blocks, grab coins, speed ramps up.

[Live side-by-side builds: Kimi K2.7 · GLM-5.2 · Opus 4.8]

GLM-5.2: 9 (winner · flair)
OPUS 4.8: 8.5
KIMI K2.7: 6

GLM built the densest, most detailed city: windowed skyscrapers plus a speed and coins HUD. Opus ran the furthest with the cleanest motion (in-game score: 303). Kimi's runner plays fine but is unforgiving: the run ends in a crash within seconds.

Test 2 · orbit visualizer

Inner-system orbit map.

One shot: animate the inner solar system + a few hundred near-Earth-object orbits, with play/pause, speed, and a data HUD.

[Live side-by-side builds: Kimi K2.7 · GLM-5.2 · Opus 4.8]

OPUS 4.8: 9 (winner · accuracy)
GLM-5.2: 7.5
KIMI K2.7: 6

Opus nailed the brief — labelled planet orbits, a real NEO / close-pass panel, a sim clock. GLM went for drama: a glowing nebula swirl that's gorgeous but reads more galaxy than orbit map. Kimi's is accurate but dim and sparse.
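For a sense of what "accurate" means on this test, here is a rough sketch of the math an orbit map needs at its core: turning a sim clock into planet positions. This is not code from any of the three builds. It assumes circular, coplanar orbits and rounded textbook semi-major axes, and the function names are made up for illustration.

```python
import math

# Semi-major axes in astronomical units (rounded textbook values).
SEMI_MAJOR_AXIS_AU = {
    "Mercury": 0.387,
    "Venus": 0.723,
    "Earth": 1.000,
    "Mars": 1.524,
}
DAYS_PER_YEAR = 365.25


def orbital_period_days(a_au):
    """Kepler's third law for bodies orbiting the Sun: T[years]^2 = a[AU]^3."""
    return DAYS_PER_YEAR * a_au ** 1.5


def position_au(a_au, sim_days):
    """Return (x, y) in AU on a circular orbit, starting on the +x axis at day 0."""
    angle = 2 * math.pi * sim_days / orbital_period_days(a_au)
    return a_au * math.cos(angle), a_au * math.sin(angle)


if __name__ == "__main__":
    sim_days = 100.0  # hypothetical sim-clock reading
    for name, a in SEMI_MAJOR_AXIS_AU.items():
        x, y = position_au(a, sim_days)
        print(f"{name}: period {orbital_period_days(a):.0f} d, x={x:+.3f} AU, y={y:+.3f} AU")
```

A build that gets this right shows Mercury lapping the Sun roughly every 88 days while Mars takes about 687, which is the kind of detail that separates an orbit map from a pretty swirl.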

Test 3 · fluid sim

Liquid in a bowl.

One shot: thousands of particles sloshing in a round bowl you tilt with the mouse, soft glowing metaball look.

[Live side-by-side builds: Kimi K2.7 · GLM-5.2 · Opus 4.8]

GLM-5.2: 9 (winner · best liquid)
OPUS 4.8: 7
KIMI K2.7: 5

GLM filled the bowl with glowing liquid that actually sloshes — the most convincing "liquid in a bowl". Opus's particles glowed but clumped to the centre. Kimi's collapsed into a tiny blob.

Test 4 · landing page

"Introducing Nova 1" landing page.

One shot: a premium Apple-keynote-style launch page for a fictional AI model — hero, features, pricing, scroll reveals.

[Live side-by-side builds: Kimi K2.7 · GLM-5.2 · Opus 4.8]

GLM-5.2: 9 (tie · top)
OPUS 4.8: 9 (tie · top)
KIMI K2.7: 6.5

Funniest result of the lot: GLM and Opus independently produced near-identical premium "Introducing Nova 1 — Intelligence, reimagined / distilled" keynote pages — gradient hero, full nav, pricing tiers. A dead heat. Kimi's was a plainer set of feature cards.

Test 5 · arcade game

Juicy neon arcade game.

One shot: a neon arcade game with screen shake, particle explosions, a combo multiplier, sound, and a start + game-over screen.

[Live side-by-side builds: Kimi K2.7 · GLM-5.2 · Opus 4.8]

OPUS 4.8: 8.5 (winner · game-feel)
GLM-5.2: 8
KIMI K2.7: 8

All three shipped a genuinely juicy game. Opus's breakout had the most game-feel — particle bursts and a live combo. Kimi's breakout was clean and solid. GLM went its own way with fullscreen neon asteroids. The closest test of the five.

The benchmark question

What about official benchmarks?

Straight answer: there aren't clean ones for these exact versions yet. z.ai shipped GLM-5.2 with no scorecard (open weights + standalone API came after), and Moonshot has only published its own proprietary benchmarks for K2.7. So the head-to-head above is the most honest GLM-5.2-vs-K2.7 data going.

Latest verifiable public numbers — SWE-bench Verified

Previous-generation, vendor-reported, directional only. The current 5.2 / 2.7 / 4.8 coding scores are not yet independently published.
GLM-5 (prev gen): 77.8%
Kimi K2.6 (prev gen): 80.2%
Read this before you quote any of it ↓
Run your own

Want all three coders in one place?

Every model here runs inside the Agent OS — Kimi, GLM-5.2 and Claude in one dashboard, one workspace, builds previewing live. Set it up, run your own head-to-heads, keep the winners.

Get the Agent OS →