Meituan just open-sourced a 1.6-trillion-parameter model, trained on non-Nvidia chips. So I did what I always do — I gave it four one-shot game prompts. It built all four playable 3D worlds. Here's every real build, with the screenshots.
"We are introducing and open sourcing LongCat-2.0, a large-scale MoE language model with 1.6 trillion total parameters and ~48 billion activated per token… Both the full training run and the large-scale deployment are built entirely on AI ASIC superpods."
— Meituan, LongCat-2.0 announcement, 30 Jun 2026
Every model launch comes with a wall of benchmark bars. I don't trust bars. I trust builds.
So the moment LongCat-2.0 dropped, I ran it through GoldieBench — my open leaderboard where a model gets one prompt to build a real, playable 3D thing, and I score what actually renders.
Four prompts. A frozen open world. A torch-lit dungeon. An open-world explorer. A Minecraft-style sandbox.
LongCat built all four — playable, on the first shot. I didn't just screenshot them. I drove each one: WASD to walk, mouse to look. They all move. Here they are.
Honest note: three of the four rendered perfectly on the first prompt. The voxel world built completely too — but it loaded facing away from the terrain (all sky), so I patched one line to point the camera at the world. I'd rather tell you that than pretend it was flawless. Every score above is a build I watched move.
Here's the launch, plain.
LongCat-2.0 is a Mixture-of-Experts model from Meituan. It has 1.6 trillion total parameters, but only about 48 billion fire per token — so it's huge but efficient to run.
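To put those two numbers side by side, here's a quick back-of-the-envelope check in Python. It uses only the figures quoted above (1.6 trillion total, roughly 48 billion active). The function name `active_fraction` is mine, not Meituan's, and "~48 billion" is approximate, so treat the result as a rough ratio.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the model's parameters used for a single token."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    return active_params / total_params


TOTAL_PARAMS = 1.6e12   # 1.6 trillion total (from the announcement)
ACTIVE_PARAMS = 48e9    # ~48 billion activated per token (approximate)

fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"{fraction:.1%} of the parameters fire per token")
```

So roughly 3% of the weights do the work on any given token. That's the whole "huge but efficient" trade in one number.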
The part that made engineers sit up: it was trained entirely on AI ASIC superpods — not the usual Nvidia GPUs. Over 50,000 accelerators, more than 35 trillion training tokens, no crashes, no do-overs. They proved you can train a frontier model on alternative hardware.
Two clever tricks make it fast on long inputs: LongCat Sparse Attention (a lighter way to handle a 1-million-token context) and an N-gram Embedding that squeezes more out of every parameter. Together they make it strong on the long, multi-step, agentic work that eats normal models alive.
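To get a feel for why a 1-million-token context needs a lighter attention scheme, here's a rough count in Python. Standard dense attention scores every token against every token, so the work grows with the square of the length. The sparse version below assumes each token only looks at a fixed budget of `keys_per_query` others. That budget (2,048) is a number I picked for illustration, and the function names are mine. None of this describes how LongCat Sparse Attention actually works.

```python
def dense_pairs(context_len: int) -> int:
    """Query-key pairs scored by full attention (per head, per layer)."""
    return context_len * context_len


def sparse_pairs(context_len: int, keys_per_query: int) -> int:
    """Pairs scored if every query attends to at most keys_per_query keys."""
    return context_len * min(keys_per_query, context_len)


CONTEXT_LEN = 1_000_000   # 1-million-token context (from the launch)
KEYS_PER_QUERY = 2_048    # hypothetical budget, for illustration only

dense = dense_pairs(CONTEXT_LEN)
sparse = sparse_pairs(CONTEXT_LEN, KEYS_PER_QUERY)
print(f"dense:  {dense:.2e} pairs")
print(f"sparse: {sparse:.2e} pairs ({dense / sparse:.0f}x fewer)")
```

Under that assumed budget the sparse count is hundreds of times smaller. The exact saving depends entirely on the real budget, which I don't know.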
And it's open. Weights on Hugging Face. Wired into Claude Code, OpenClaw, and Hermes out of the box. You can run it, for free.
Robin Delta lays out why this one's a big deal in plain terms. LongCat-2.0 comes from Meituan — China's DoorDash — whose AI team is barely two years old, and it was trained on roughly 50,000 domestic chips with zero Nvidia. The headline he pulls: on SWE-bench Pro it scores 59.5, edging out GPT-5.5's 58.6. And it's open-weight, MIT-licensed and self-hostable — not a locked lab model, one you can actually run. (His thread also notes it's the model behind "Owl Alpha" on OpenRouter — another free way in.)
On the official benchmarks it's competitive — genuinely frontier-adjacent on agentic and foundational tasks, a step behind Opus 4.8 on the hardest pure-code ones. Here's where it lands, from Meituan's own numbers:
Before
Every launch, I'd read the benchmark bars and feel behind.
The best scores were always on a model I couldn't run — gated, locked, or priced by the token.
I'd bookmark the blog post and get back to work with whatever I had.
The frontier felt like something that happened to other people.
Then open models started shipping that actually build.
After
Now a model like LongCat-2.0 drops and I don't just read about it — I run it that hour.
Four prompts, four playable worlds, on a free open model, scored live on my own leaderboard.
It slots straight into my Agent OS next to every other model I use.
The frontier isn't something I wait for anymore. It's something I plug in.
You can do this too. Open model, one prompt, real thing built.
I'm not going to paste invented quotes here. The wins are real and written by the members themselves — agency owners, ecom founders, course creators, solo operators across 38 countries. Read them in their own words.
Read the 158-page wins doc →
You've seen it. An open, free model just built four playable worlds one-shot.
So here's the deal. Before you sleep tonight, open one model you don't already use — LongCat, or any of the free ones — and give it ONE real prompt. Build something. Watch it run.
Because the moment you stop waiting for the perfect gated model and start running the good-enough open ones, the frontier stops being a spectator sport.
The people still reading benchmark bars are getting passed. The people running the models and shipping are pulling ahead every week.
Commit to running one open model tonight. Build one real thing. Start now.
Here's the idea: you don't need the single best model. You need an engine that runs the best model you can actually get — and turns it into finished work. Four parts make it an engine, not just a chat window.
Plug in an open frontier model like LongCat-2.0 — free weights, no per-token meter. The everyday work runs at $0, and nobody can gate it away from you.
Feed the model a real task and get a real artifact back — a working page, a playable build, a shipped output. Not a snippet you finish yourself.
The engine checks the output — does it render, does it run, does it move — and fixes the one line that's off. You get proof, not a promise.
When a better open model ships next month, you drop it in the same socket. Your memory, agents and workflows never change. The model is fuel; the engine is yours.
That's exactly what I did with LongCat: ran it free, one prompt per build, verified every one, and slotted it into the same Agent OS that runs all my other models.
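If it helps to see the shape of that loop, here's a minimal sketch in Python. Everything in it is hypothetical: `Model`, `Engine`, `EchoModel` and `has_canvas` are names I made up for illustration, and this is not the actual Agent OS code. It only shows the idea of the four parts: plug in a model, give it a task, verify the output, and swap the model without touching anything else.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Model(Protocol):
    """Anything that turns a prompt into text can be plugged in."""
    def generate(self, prompt: str) -> str: ...


@dataclass
class Result:
    output: str
    passed: bool
    attempts: int


class Engine:
    """Hypothetical build-verify-swap loop (illustration only)."""

    def __init__(self, model: Model, verify: Callable[[str], bool],
                 max_attempts: int = 2):
        self.model = model
        self.verify = verify
        self.max_attempts = max_attempts

    def swap_model(self, model: Model) -> None:
        # The checker and retry settings stay; only the model changes.
        self.model = model

    def run(self, task: str) -> Result:
        prompt = task
        output = ""
        for attempt in range(1, self.max_attempts + 1):
            output = self.model.generate(prompt)
            if self.verify(output):
                return Result(output, True, attempt)
            # Ask for a fix to the failed build rather than starting over.
            prompt = f"{task}\n\nThe last build failed verification. Fix it:\n{output}"
        return Result(output, False, self.max_attempts)


class EchoModel:
    """Stand-in model; a real one would call an API or local weights."""

    def __init__(self, reply: str):
        self.reply = reply

    def generate(self, prompt: str) -> str:
        return self.reply


def has_canvas(html: str) -> bool:
    # Toy check: a real verifier would load the page and confirm it renders.
    return "<canvas" in html


engine = Engine(EchoModel("<p>no game here</p>"), verify=has_canvas)
first = engine.run("Build a playable 3D world")

engine.swap_model(EchoModel("<canvas id='game'></canvas>"))
second = engine.run("Build a playable 3D world")

print(first.passed, first.attempts, second.passed, second.attempts)
```

The design choice that matters is that `Engine` only depends on a `generate` method. Any model that has one drops into the same socket, and the verify step stays the same.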
Here's the freeing part.
You were never going to win by having the single highest benchmark score. That model is always gated or metered.
But an open model that builds a real, playable thing on the first prompt? That you can run today, for free. LongCat-2.0 is one. There'll be a better one next month.
What turns any of them into output for your business is the engine around it — the Agent OS. One dashboard that plugs in whatever model you want, wraps it in memory, agents and workflows, and ships the result.
No — that's the biggest myth about it. Agent OS runs the everyday 90% on a free local model on your own machine (nothing leaving it, $0), free open models like LongCat slot in for more, and for the frontier work it drives the CLIs you already pay for — your Claude subscription already includes the Claude CLI, and Agent OS plugs straight into it, so you're not paying twice. It's a layer on top of what you already own, not a new meter. And inside the AI Profit Boardroom there are full token-optimisation tutorials, so you cut usage to the bone and never think about it again.
LongCat-2.0 is one open model that builds. The Agent OS is the engine that runs it — and every model after it — and turns it into finished work. Here's everything inside.
You're not buying a tool. You're getting the whole operating system I run a seven-figure business on — the one that turns whatever open model wins this month into your business's output.
Get the Agent OS → Inside the AI Profit Boardroom · skool.com/ai-profit-lab
Wrong: "Open models are toys. Only the closed frontier ones can build real things."
Right: An open 1.6T model just built four playable 3D worlds one-shot on my bench, averaging 8.12/10 — above Opus 4.8's average on the same tasks. Open caught up.
Wrong: "I need the model with the highest benchmark score to compete."
Right: The highest score is always gated or metered. The model that ships a real thing on the first prompt — and runs free — beats a benchmark you can't touch.
Wrong: "This is for engineers. I couldn't run a new model or test it myself."
Right: I ran LongCat through a free web chat, one prompt per build. The engine does the verifying. You just give it the task and watch it build.
158 pages of members already running open + frontier models inside one Agent OS — agency owners, ecom founders, course creators, solo operators across 38 countries. Real businesses, real wins, in their own words.
Read the 158-page wins doc →
You can read this week's launch like every other: a wall of benchmark bars on a model someone else gets to use.
Or you can read it as the proof that the open models are here, they build real things, and they run free.
The people who run the new models the day they drop — and slot them into one engine — are going to be miles ahead of the people still reading launch posts. Every model you test compounds.
The model is fuel. The engine is yours. Go run one.
LongCat-2.0 is open. A 1.6T MoE, ~48B active, trained on non-Nvidia ASIC superpods — free weights.
It actually builds. 4 one-shot GoldieBench game prompts → 4 playable worlds, avg 8.12/10.
Open caught the frontier. Top on instruction-following + science; a notch behind Opus on hard code — but free.
Run models, don't read bars. The launch post is a spectator sport; the prompt is the real test.
Build the Open Frontier Engine. One Agent OS that runs any open model and turns it into finished work.
The model is fuel. The engine is yours.
LongCat built four playable worlds for me on the first prompt, for free. The only question is whether you're running the open models — or still reading about them.
Inside the AI Profit Boardroom you get the full Agent OS — the Open Frontier Engine, built. The same dashboard that plugs open models like LongCat, free local models, and every CLI you already pay for into one system with shared memory, agents and workflows, plus GoldieBench so you always know which model actually builds. You get the zip file, every prompt, the memory setup, and coaching calls where we wire it in together, step by step. 3,600+ founders across 38 countries are building inside it right now, and every new open model — LongCat and whatever beats it next month — just slots in and makes the whole thing stronger. Stop reading launch posts. Start shipping.
Get the Agent OS → Inside the AI Profit Boardroom · skool.com/ai-profit-lab
Set up in an afternoon · used in 38 countries · every new model tested the week it ships. I'll see you inside.