LongCat-2.0 · open 1.6T · benchmarked on GoldieBench · 30 Jun 2026

The Open Frontier Engine + LongCat-2.0.

Meituan just open-sourced a 1.6-trillion-parameter model, trained on non-Nvidia chips. So I did what I always do: I gave it four one-shot game prompts. It turned all four into playable 3D worlds. Here's every real build, with the screenshots.

A glowing emerald-green machine engine on a dark frontier plain, beams of green and gold light constructing four small low-poly game worlds around it — a snowy mountain realm, a torch-lit dungeon, a green highland with a watchtower, and a floating island of cube blocks
I put LongCat-2.0 on GoldieBench · 4 one-shot builds · my 0–10 rubric · render-verified
Dragon Realm (snow open world) · 8.5
Skyrim (open-world explorer) · 8.5
Crypt (torch-lit dungeon) · 8.0
Voxel Craft (Minecraft-style) · 7.5
Four for four. Average 8.12/10 — a first appearance that lands it near the top of the board, above Opus 4.8's average on these tasks. Every score is a build I watched render AND drove with WASD + mouse.
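The 8.12 is a plain mean of the four card scores. As a quick check, here is that arithmetic in Python, assuming each build carries equal weight (the post doesn't describe any weighting). The exact mean is 8.125; formatted to two decimals, Python rounds this exact tie to even and prints 8.12.

```python
# Scores from the four GoldieBench cards (0-10 rubric), equal weight assumed.
scores = {
    "Dragon Realm": 8.5,
    "Skyrim": 8.5,
    "Crypt": 8.0,
    "Voxel Craft": 7.5,
}

average = sum(scores.values()) / len(scores)

print(f"exact mean:   {average}")
print(f"two decimals: {average:.2f}")
```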
1.6T parameters (~48B active per token)
4 / 4 playable builds, one-shot
$0 · open weights, free to run
Straight from the source — read it + run it yourself ↓

"We are introducing and open sourcing LongCat-2.0, a large-scale MoE language model with 1.6 trillion total parameters and ~48 billion activated per token… Both the full training run and the large-scale deployment are built entirely on AI ASIC superpods."

— Meituan, LongCat-2.0 announcement, 30 Jun 2026

I · the proof

I gave it four game prompts. It built all four.

Every model launch comes with a wall of benchmark bars. I don't trust bars. I trust builds.

So the moment LongCat-2.0 dropped, I ran it through GoldieBench — my open leaderboard where a model gets one prompt to build a real, playable 3D thing, and I score what actually renders.

Four prompts. A frozen open world. A torch-lit dungeon. An open-world explorer. A Minecraft-style sandbox.

LongCat built all four — playable, on the first shot. I didn't just screenshot them. I drove each one: WASD to walk, mouse to look. They all move. Here they are.

LongCat's Dragon Realm build — a snowy open world with snow-capped mountains, low-poly pines, falling snow and a glowing first-person sword
Dragon Realm · 8.5
Snow open world · snow-capped mountains, 30 pines, 3000 snow particles, a glowing first-person sword. Flawless first try.
LongCat's Skyrim build — a rolling green highland with snow mountains, a stone watchtower, conifers and boulders
Skyrim · 8.5
Open-world explorer · rolling terrain, snow mountains, a stone watchtower, conifers, boulders, clouds. Terrain-follow camera. The richest of the four.
LongCat's Crypt build — a torch-lit stone dungeon corridor with pillars, barrels and a chest
Crypt · 8.0
Torch-lit dungeon · stone corridor, pillars, barrels, a chest, 6+ flickering torch lights. Lit and moody, playable first try.
LongCat's Voxel Craft build — a Minecraft-style world of grass-topped cube blocks with voxel trees under a blue sky
Voxel Craft · 7.5
Minecraft-style sandbox · grass/dirt/stone cubes, voxel trees, day/night, break + place blocks. Built one-shot; needed a one-line camera fix.

Honest note: three of the four rendered perfectly on the first prompt. The voxel world built completely too — but it loaded facing away from the terrain (all sky), so I patched one line to point the camera at the world. I'd rather tell you that than pretend it was flawless. Every score above is a build I watched move.

Figure: one prompt into LongCat-2.0 (open · 1.6T · runs free) → Dragon Realm 8.5 · Skyrim 8.5 · Crypt 8.0 · Voxel 7.5.
One prompt into the open engine, four playable worlds out. That's the test that matters, not the benchmark bar.
II · what it is

An open 1.6-trillion-parameter model — built on the wrong chips.

Here's the launch, plain.

LongCat-2.0 is a Mixture-of-Experts model from Meituan. It has 1.6 trillion total parameters, but only about 48 billion are activated per token, so it's huge but cheap per token to run.
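To put "huge but cheap per token" in numbers, here is a minimal sketch of the active-parameter arithmetic. It uses the ~48B and 1.6T figures from the announcement plus a common rule of thumb of about 2 FLOPs per parameter used per token (an approximation, not a Meituan figure). The saving is in compute per token; all 1.6T weights still have to be held in memory to serve the model.

```python
TOTAL_PARAMS = 1.6e12   # total parameters (from the announcement)
ACTIVE_PARAMS = 48e9    # ~48B activated per token (approximate)

# Share of the model that actually runs for any one token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Rule of thumb (assumption): ~2 FLOPs per parameter used, per token.
moe_flops_per_token = 2 * ACTIVE_PARAMS
dense_flops_per_token = 2 * TOTAL_PARAMS   # a dense model of the same total size

print(f"active fraction: {active_fraction:.1%}")
print(f"MoE FLOPs/token:   {moe_flops_per_token:.2e}")
print(f"dense FLOPs/token: {dense_flops_per_token:.2e}")
```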

The part that made engineers sit up: it was trained entirely on AI ASIC superpods — not the usual Nvidia GPUs. Over 50,000 accelerators, more than 35 trillion training tokens, no crashes, no do-overs. They proved you can train a frontier model on alternative hardware.

Two architectural tricks stand out: LongCat Sparse Attention, a lighter way to handle a 1-million-token context, and an N-gram Embedding that squeezes more out of every parameter. Together they make it strong on the long, multi-step, agentic work that eats normal models alive.
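Why sparse attention matters at 1M tokens, sketched as arithmetic. The post doesn't describe LongCat Sparse Attention's actual pattern, so this uses a plain sliding window as a hypothetical stand-in, with a made-up budget of k = 4,096 keys per query. The point is generic: capping the keys each query looks at turns dense attention's roughly n²/2 query-key scores into roughly n·k.

```python
def full_causal_pairs(n: int) -> int:
    """Query-key scores for dense causal attention: token i sees tokens 0..i."""
    return n * (n + 1) // 2


def windowed_pairs(n: int, k: int) -> int:
    """Query-key scores when each token sees at most its k most recent tokens.

    A sliding window is used here only as a stand-in for 'sparse';
    it is not LongCat's documented attention pattern.
    """
    if n <= k:
        return full_causal_pairs(n)
    # The first k tokens see 1..k keys; every later token sees exactly k.
    return full_causal_pairs(k) + (n - k) * k


n = 1_000_000   # the 1M-token context from the text
k = 4_096       # hypothetical per-query key budget

dense = full_causal_pairs(n)
sparse = windowed_pairs(n, k)
print(f"dense:  {dense:,} scores")
print(f"sparse: {sparse:,} scores")
print(f"about {dense / sparse:.0f}x fewer")
```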

And it's open. Weights on Hugging Face. Wired into Claude Code, OpenClaw, and Hermes out of the box. You can run it for free.

the reaction · the story

A food-delivery company, no Nvidia, beating GPT-5.5

Robin Delta lays out why this one's a big deal in plain terms. LongCat-2.0 comes from Meituan — China's DoorDash — whose AI team is barely two years old, and it was trained on roughly 50,000 domestic chips with zero Nvidia. The headline he pulls: on SWE-bench Pro it scores 59.5, edging out GPT-5.5's 58.6. And it's open-weight, MIT-licensed and self-hostable — not a locked lab model, one you can actually run. (His thread also notes it's the model behind "Owl Alpha" on OpenRouter — another free way in.)


On the official benchmarks it's competitive — genuinely frontier-adjacent on agentic and foundational tasks, a step behind Opus 4.8 on the hardest pure-code ones. Here's where it lands, from Meituan's own numbers:

LongCat-2.0 vs Opus 4.8 · official scores (Meituan blog) · 0–100
Benchmark · LongCat-2.0 · Opus 4.8
IFEval (instruction following) · 90.0 · not listed
GPQA-diamond (hard science) · 88.9 · not listed
BrowseComp (web agent) · 79.9 · not listed
SWE-bench Multilingual · 77.3 · 84.8
Terminal-Bench 2.1 · 70.8 · 78.9
LongCat's strongest showings are instruction following (90.0) and hard science (88.9). On the toughest code benchmarks it trails Opus 4.8 by 7.5 points (SWE-bench Multilingual) and 8.1 points (Terminal-Bench 2.1). But Opus you rent by the token; LongCat you own. And on my build test, LongCat matched the frontier where it counts: shipping a real, playable thing first try.
Diagram: your prompt → MoE router picks the experts (~48B of 1.6T fire) + 1M-context sparse attention → a built thing (page, game, code).
Huge but efficient: the router fires only ~48B of the 1.6T parameters per token, and LongCat Sparse Attention keeps a 1M-token context cheap, so it stays sharp across long, multi-step builds.
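The router in the diagram can be pictured as a toy top-k Mixture-of-Experts layer. This is a generic illustration under stated assumptions (tiny vectors, 8 toy "experts" that just scale their input, k = 2, a softmax over the chosen experts' scores), not LongCat's actual router, whose expert count and gating details aren't given here. What it shows is the key property: only the selected experts are evaluated for a given token.

```python
import math


def top_k_gates(router_logits, k):
    """Keep the k highest-scoring experts and softmax their scores into gates."""
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract max for numerical stability
    weights = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


def moe_forward(x, experts, router_rows, k):
    """One toy MoE layer: route token vector x to k experts and mix their outputs."""
    # Router score per expert = dot product of x with that expert's router row.
    logits = [sum(r * xi for r, xi in zip(row, x)) for row in router_rows]
    gates = top_k_gates(logits, k)
    out = [0.0] * len(x)
    for i, gate in gates.items():  # only the chosen experts are evaluated
        y = experts[i](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, sorted(gates)


if __name__ == "__main__":
    import random

    random.seed(0)
    n_experts, dim, k = 8, 4, 2  # toy sizes, nothing like a real 1.6T config
    experts = [lambda v, s=s: [s * vi for vi in v] for s in range(1, n_experts + 1)]
    router_rows = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
    out, used = moe_forward([1.0, 0.5, -0.5, 2.0], experts, router_rows, k)
    print("experts used:", used, "of", n_experts)
```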
my story · why this matters

I stopped waiting for the perfect model. I test what I can run.

Before

Every launch, I'd read the benchmark bars and feel behind.

The best scores were always on a model I couldn't run — gated, locked, or priced by the token.

I'd bookmark the blog post and get back to work with whatever I had.

The frontier felt like something that happened to other people.

Then open models started shipping that actually build.

After

Now a model like LongCat-2.0 drops and I don't just read about it — I run it that hour.

Four prompts, four playable worlds, on a free open model, scored live on my own leaderboard.

It slots straight into my Agent OS next to every other model I use.

The frontier isn't something I wait for anymore. It's something I plug in.

You can do this too. Open model, one prompt, real thing built.

3,600+ founders inside AIPB
400k YouTube subscribers
38 countries · live members
163k X / Twitter followers
Real people · real wins

I'm not going to paste invented quotes here. The wins are real and written by the members themselves — agency owners, ecom founders, course creators, solo operators across 38 countries. Read them in their own words.

Read the 158-page wins doc →
before you scroll on —

Commit to running one open model today.

You've seen it. An open, free model just built four playable worlds one-shot.

So here's the deal. Before you sleep tonight, open one model you don't already use — LongCat, or any of the free ones — and give it ONE real prompt. Build something. Watch it run.

Because the moment you stop waiting for the perfect gated model and start running the good-enough open ones, the frontier stops being a spectator sport.

The people still reading benchmark bars are getting passed. The people running the models and shipping are pulling ahead every week.

Commit to running one open model tonight. Build one real thing. Start now.

the framework

The Open Frontier Engine™.

Here's the idea: you don't need the single best model. You need an engine that runs the best model you can actually get — and turns it into finished work. Four parts make it an engine, not just a chat window.

i.

Run open, run free

Plug in an open frontier model like LongCat-2.0 — free weights, no per-token meter. The everyday work runs at $0, and nobody can gate it away from you.

ii.

One prompt, one finished thing

Feed the model a real task and get a real artifact back — a working page, a playable build, a shipped output. Not a snippet you finish yourself.

iii.

Verify, don't trust

The engine checks the output — does it render, does it run, does it move — and fixes the one line that's off. You get proof, not a promise.

iv.

Swap the engine, keep the machine

When a better open model ships next month, you drop it in the same socket. Your memory, agents and workflows never change. The model is fuel; the engine is yours.

That's exactly what I did with LongCat: ran it free, one prompt per build, verified every one, and slotted it into the same Agent OS that runs all my other models.
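Parts iii and iv can be sketched together as one small loop: a swappable model "socket" plus a generate, check, retry cycle. Everything here is hypothetical: `ModelBackend`, `Attempt`, `run_until_verified`, and the `check` callback are illustrative names, not Agent OS's API. The `check` function stands in for whatever "does it render, does it run" test you define.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Protocol


class ModelBackend(Protocol):
    """Hypothetical 'socket': any model that turns a prompt into text."""
    name: str

    def generate(self, prompt: str) -> str: ...


@dataclass
class Attempt:
    backend: str
    output: str
    passed: bool


def run_until_verified(
    backend: ModelBackend,
    prompt: str,
    check: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> List[Attempt]:
    """Generate, verify, and feed the failure back until the check passes.

    `check` returns None when the output is acceptable, otherwise a short
    description of what is wrong (e.g. "camera faces the sky").
    """
    attempts: List[Attempt] = []
    current_prompt = prompt
    for _ in range(max_attempts):
        output = backend.generate(current_prompt)
        problem = check(output)
        attempts.append(Attempt(backend.name, output, problem is None))
        if problem is None:
            break
        # Keep the original task and append what failed, so the next try can fix it.
        current_prompt = f"{prompt}\n\nPrevious attempt failed: {problem}. Fix it."
    return attempts
```

Swapping models means passing a different object with the same `generate` method; the loop and the check stay the same, which is the "keep the machine" idea in code form.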

III · what this means for you

You don't need Opus. You need an engine.

Here's the freeing part.

You were never going to win by having the single highest benchmark score. That model is always gated or metered.

But an open model that builds a real, playable thing on the first prompt? That you can run today, for free. LongCat-2.0 is one. There'll be a better one next month.

What turns any of them into output for your business is the engine around it — the Agent OS. One dashboard that plugs in whatever model you want, wraps it in memory, agents and workflows, and ships the result.

"Doesn't running Agent OS burn a fortune in tokens?"

No. That's the biggest myth about it. Agent OS runs the everyday 90% on a free local model on your own machine (nothing leaves it, $0). Free open models like LongCat slot in when you need more. For frontier work, it drives the CLIs you already pay for: your Claude subscription already includes the Claude CLI, and Agent OS plugs straight into it, so you're not paying twice. It's a layer on top of what you already own, not a new meter. And inside the AI Profit Boardroom there are full token-optimisation tutorials, so you cut usage to the bone and never think about it again.
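The tiering described above can be pictured as a simple routing rule. A minimal sketch, assuming a 0-1 difficulty estimate per task and made-up thresholds; `Tier` and `pick_tier` are hypothetical names, not how Agent OS actually decides where a task goes.

```python
from enum import Enum


class Tier(Enum):
    LOCAL = "free local model"
    OPEN = "free open model, e.g. LongCat-2.0"
    FRONTIER_CLI = "CLI you already pay for"


def pick_tier(difficulty: float, needs_frontier: bool = False) -> Tier:
    """Hypothetical rule: route a task by a 0-1 difficulty estimate.

    The thresholds are illustrative, not tuned values.
    """
    if needs_frontier or difficulty >= 0.9:
        return Tier.FRONTIER_CLI
    if difficulty >= 0.6:
        return Tier.OPEN
    return Tier.LOCAL


if __name__ == "__main__":
    for d in (0.2, 0.7, 0.95):
        print(d, "->", pick_tier(d).value)
```

Most tasks fall through to the cheapest tier, and only flagged or hard tasks reach the paid CLI.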

old way vs new way

Two ways to meet a new model.

Reading the benchmarks
always behind
  • Scroll the launch post's benchmark bars
  • Feel behind because the best score is gated or metered
  • Bookmark it, get back to work with the old model
  • Never actually run the new one
  • Repeat next launch, still spectating
Running the model
shipping same day
  • Open the free model the hour it drops
  • Give it one real prompt — build a real thing
  • Verify it renders, runs, and moves
  • Slot it into your engine next to the rest
  • Every new open model is a free upgrade
get the engine

Get the Agent OS — the engine every model plugs into.

LongCat-2.0 is one open model that builds. The Agent OS is the engine that runs it — and every model after it — and turns it into finished work. Here's everything inside.

  • Plug in any model: open like LongCat, free local, or the CLIs you already pay for, one dashboard
  • The Local Hermes Engine: a free offline model that builds while you watch, $0
  • GoldieBench: my live leaderboard so you always know which model actually builds
  • Agent Kanban: Planner → Builder → Reviewer, agents that do the work
  • The Loop Engineer: define "done", it verifies and loops until it passes, no QC from you
  • Every CLI you already pay for: Claude, Codex, Gemini, Kimi, GLM, Grok in one place
  • The Claude Workspace: every output saved and previewed, nothing lost
  • Memory + vault that knows your business cold, every session
  • Token-optimisation playbooks so it runs cheap, on tools you already own
  • 3,600+ founders + me shipping daily, every new model tested the week it lands

You're not buying a tool. You're getting the whole operating system I run a seven-figure business on — the one that turns whatever open model wins this month into your business's output.

Get the Agent OS → Inside the AI Profit Boardroom · skool.com/ai-profit-lab
three beliefs to drop

What's holding you back.

Wrong: "Open models are toys. Only the closed frontier ones can build real things."

Right: An open 1.6T model just built four playable 3D worlds one-shot on my bench, averaging 8.12/10 — above Opus 4.8's average on the same tasks. Open caught up.

Wrong: "I need the model with the highest benchmark score to compete."

Right: The highest score is always gated or metered. The model that ships a real thing on the first prompt — and runs free — beats a benchmark you can't touch.

Wrong: "This is for engineers. I couldn't run a new model or test it myself."

Right: I ran LongCat through a free web chat, one prompt per build. The engine does the verifying. You just give it the task and watch it build.

Don't take my word for it

158 pages of members already running open + frontier models inside one Agent OS — agency owners, ecom founders, course creators, solo operators across 38 countries. Real businesses, real wins, in their own words.

Read the 158-page wins doc →
where this leaves you

Stop reading bars. Start running models.

You can read this week's launch like every other — a wall of benchmark bars on a model someone else gets to use.

Or you can read it as the proof that the open models are here, they build real things, and they run free.

The people who run the new models the day they drop — and slot them into one engine — are going to be miles ahead of the people still reading launch posts. Every model you test compounds.

The model is fuel. The engine is yours. Go run one.

the recap

The whole thing in five lines.

i.

LongCat-2.0 is open. A 1.6T MoE, ~48B active, trained on non-Nvidia ASIC superpods — free weights.

ii.

It actually builds. 4 one-shot GoldieBench game prompts → 4 playable worlds, avg 8.12/10.

iii.

Open caught the frontier. Its best scores are instruction following + science; a notch behind Opus on hard code, but free.

iv.

Run models, don't read bars. The launch post is a spectator sport; the prompt is the real test.

v.

Build the Open Frontier Engine. One Agent OS that runs any open model and turns it into finished work.

The model is fuel. The engine is yours.

your move

Run the open frontier. Own the engine.

LongCat built four playable worlds for me on the first prompt, for free. The only question is whether you're running the open models — or still reading about them.

Inside the AI Profit Boardroom you get the full Agent OS — the Open Frontier Engine, built. The same dashboard that plugs open models like LongCat, free local models, and every CLI you already pay for into one system with shared memory, agents and workflows, plus GoldieBench so you always know which model actually builds. You get the zip file, every prompt, the memory setup, and coaching calls where we wire it in together, step by step. 3,600+ founders across 38 countries are building inside it right now, and every new open model — LongCat and whatever beats it next month — just slots in and makes the whole thing stronger. Stop reading launch posts. Start shipping.

Get the Agent OS → Inside the AI Profit Boardroom · skool.com/ai-profit-lab

Set up in an afternoon · used in 38 countries · every new model tested the week it ships. I'll see you inside.