Local AI · tested + benchmarked on my Mac

I tested the viral "Qwable 27B Coder" locally. Here's the truth.

It's the model its own author calls a hype cautionary tale — 10 training examples, three minutes of training. Ollama can't even run it. So I ran it on Apple MLX, benchmarked it, and had it build games. What came out surprised me.

A glowing emerald low-poly dragon coiled around a luminous silicon chip with a neon synthwave racetrack and retro sun — representing Qwable running locally on Apple silicon
27Bparameters (Qwen3.6)
~18tokens/sec · MLX · M4 Max
8.3top GoldieBench build
$0100% local
Straight from the source

Everything here is real — read it + run it yourself.

I'm not repeating hype. I pulled the model, ran it locally, and scored what it built. Here are the actual sources:

"Providing impressive-looking numbers without methodology is part of what this release is criticizing… it is not recommended as a production coding model." — DJLougen, the Qwable-5-27B-Coder model card (the author deliberately withheld all benchmarks)
My story · why this matters

I kept chasing hyped local models. Then I built a way to just test them.

Before

Every week a new "insane" local AI drops with a wild name.

I'd get excited, try to run it, and hit a wall — half of them won't even load in Ollama.

The ones that did run, I had no real way to tell if they were actually any good.

So I was either believing the hype or ignoring it. Both felt wrong.

Then I stopped guessing and built a system that runs + scores any model.

After

Now a hyped model drops and I just pull it, run it locally, and score what it builds.

If Ollama can't load it, my Agent OS Local engine runs it on Apple MLX instead.

I had this "joke" 27B build two real 3D games in an afternoon, then ranked it on my own leaderboard.

No hype. No faith. Just the receipts.

You can do this too. Same tools. Same path.

The receipts

Real people. Real wins. Inside the Boardroom right now.

Here's what's happening for the operators already running this stack — agency owners, ecom founders, course creators, solo builders. Different businesses, same result.

3,600+Founders inside AIPB
400kYouTube subscribers
38Countries · live members
163kX / Twitter followers

I'm not going to paste invented quotes here. The wins are real and written by the members themselves — across 38 countries — so read them in their own words.

Read the 158-page wins doc →
Before you scroll on —

Commit to testing, not believing.

You've seen the hype cycle. A model drops, everyone calls it insane, nobody actually checks.

The next 10 minutes show you how I pull any model, run it locally for free, and score it myself.

So here's the deal. Promise yourself one thing right now: the next time a local AI goes viral, you're going to RUN it before you trust it.

The people who test get the real answer. The people who repeat the hype get burned.

Be one of the people who tests.

Commit to testing today. This changes how you pick every tool from now on.

What it actually is

A 27B model trained in three minutes on ten examples.

Let's be straight about what Qwable-5-27B-Coder is, because the name oversells it.

It's Qwen3.6-27B — a strong open base model from Alibaba — with a full fine-tune on just 10 traces (5 from Fable 5, 5 from Kimi 2.7 Coder), trained in about three minutes on a single machine.

The author didn't hide this. They put it front and centre and even refused to publish benchmarks — on purpose — to make a point about AI hype. Their own words: it's "not recommended as a production coding model."

So this is basically Qwen3.6-27B wearing a flashy "Qwable Coder" jacket. That matters for one big reason: anything good it does is the base model being good — not the branding.

So is it a scam?

No — and that's the interesting part. It's an honest experiment about marketing vs substance. The 10-trace fine-tune barely changed the model (the quality drift is near zero). Which means you're really testing whether a free, open 27B base can do real work on your own Mac. Spoiler: it can — it's just dressed up in hype it doesn't need.

The catch nobody mentions: where it can actually run
Qwen3.6 is so new that the usual local runtimes reject it — only Apple MLX loads it
Qwable GGUF Qwen3.6 · qwen35 arch Ollama ✗ unknown arch llama.cpp ✗ same error Apple MLX ✓ runs it natively mlx_lm.server

This is the real lesson for running local AI in 2026: for brand-new models, MLX (Apple's own framework) often works when Ollama and llama.cpp can't yet. I wired MLX straight into my Agent OS Local engine so it just works.

The benchmarks · I ran them myself

I gave it the exact same build tasks as the frontier models. It held its own.

I run a public leaderboard, GoldieBench, where every model gets the same one-prompt build challenges and a 0–10 score for what it actually ships. So I put Qwable through it on my Mac.

For a "10-trace, 3-minute, don't-use-this" model, the scores are genuinely good — because that 27B base is no joke:

Qwable on GoldieBench — what it scored
same prompts, same 0–10 scale as Opus, GLM, Fusion · my own scoring
Synthwave racer
8.3
Dragon flight
8.0
Landing page
7.2

8.3 is task-winner territory — the same band the frontier cloud models land in. From a free model running on a laptop.

But there's a flip side, and I won't hide it: it's slow, and it over-thinks. It's a 27B "thinking" model, so it reasons for a long time before answering. I watched it burn 900 tokens of reasoning on a one-sentence question. Each game took about five minutes to build.

Speed — the honest trade-off
generation speed on my M4 Max · bigger model = slower
Qwable 27B (MLX)
~18/s
Qwythos 9B (Ollama)
~52/s

A 9B local model runs ~3× faster. Qwable trades speed for the smarts of a much bigger brain. Pick the right tool for the job.

Qwable, by the numbers
measured on an Apple M4 Max via mlx-lm · MLX 4-bit
0parameters
0generation speed
0top build score
0dollars, ever
What it built · open them live

Two real 3D games + a real website. One prompt each.

This is the proof. No editing, no second prompt, no fixing its code. I asked once, and these came out — running, playable, on my own machine. Click any of them.

Three real things. One model. $0. Nothing left my Mac. All ranked live on GoldieBench →

"But abstract 3D stuff always looks broken from local models."

Often true — and Qwable proves it cuts both ways. I asked it for an abstract particle "wormhole" and got a black screen. But ask it for a concrete scene — a dragon, a road, a car, a landing page — and the strong base shines. The lesson: give a local model a clear, recognisable target, not vibes.

Old way vs new way

Chasing the hype vs just testing it.

The old way
~ forever
  • See a viral "insane new local AI" post
  • Try to run it — half won't load in Ollama
  • Give up, or run it with no idea if it's good
  • Believe the hype, or ignore it — both blind
  • Pay for a cloud model anyway, just in case
  • Result: noise, no real answer, FOMO
The new way
~ an afternoon
  • Pull any model — even brand-new ones
  • If Ollama can't load it, MLX runs it for free
  • Wire it into the Agent OS Local engine in one line
  • Give it the same build tasks as the frontier models
  • Score what it actually ships — no faith required
  • Result: the real answer, the receipts, $0 spent
The real lesson

The model name doesn't matter. The system does.

Here's what this whole experiment proves, and it's the thing I keep repeating.

A "Qwable Coder" that's really just Qwen3.6-27B can build an 8/10 game. A 9B "Qwythos" can barely manage 3/10 on the same tasks. The fancy names told you nothing — only running them did.

So stop shopping for the magic model. There isn't one. The edge isn't the model you pick this week — it's the system you plug every model into: one that runs any of them, remembers your business, gives them your tools, and ships the output. That's the Agent OS.

One honest caveat. Qwable is a fun, capable local model for concrete builds — but it's slow, it over-thinks, and its branding promises things the 10-trace fine-tune didn't deliver. Treat it as "Qwen3.6-27B you can run free on MLX," not a miracle coder. Use it for what it's good at. The base model + the author's experiment are credited on the model card.
Doesn't running the Agent OS burn a fortune in tokens?

No — that's the biggest myth about it. Agent OS runs the everyday 90% on a free local model (on your own machine, $0, nothing leaving it — exactly like Qwable here), free APIs slot in for more, and for the frontier work it drives the CLIs you already pay for — your Claude subscription already includes the Claude CLI, and Agent OS plugs straight into it, so you're not paying twice. It's a layer on top of what you already own, not a new meter. And inside the AI Profit Boardroom there are full token-optimisation tutorials, so you learn to cut usage to the bone.

The shortcut

Get the system, not just the model.

Qwable is one model. The Agent OS is the operating system that runs any model — local or frontier — from one dashboard. Here's what's inside:

The Local Hermes Engine — run free local models (and MLX models like Qwable) offline, $0
Every CLI you already pay for — Claude, Codex, Gemini, Kimi, GLM wired into one place
Agent Kanban — Planner → Builder → Reviewer agents that ship work for you
The Workspace — every build saved + previewed live, nothing lost
GoldieBench-style testing — so you pick tools on receipts, not hype
Memory of your business — your goals, clients + voice, every session
Token-efficiency playbooks — run the whole thing for next to nothing
3,600+ founders + me — shipping daily, new tools added the week they drop

You're not buying a tool. You're getting the whole operating system I run a seven-figure business on.

Get the Agent OS →
Inside the AI Profit Boardroom · skool.com/ai-profit-lab
Three beliefs to drop

What's actually holding you back.

Wrong: "I need to find the one best AI model and stick with it."

Right: There is no one best model — they leapfrog weekly. The win is a system that swaps any of them in and tests which is best for the job.

Wrong: "If a model is hyped and has a cool name, it must be good."

Right: A "Qwable Coder" is just Qwen3.6 with a jacket on. The only way to know if a model is good is to run it on real work and score it. Names lie. Receipts don't.

Wrong: "Running powerful AI locally costs a fortune."

Right: Qwable built two 3D games on my laptop for $0. Free local models do the everyday 90%, and your existing CLIs cover the rest. The cost myth is just a myth.

Don't take my word for it

158 pages of members who stopped chasing hype and started shipping — real businesses, real wins.

Read the 158-page testimonials doc →
The recap

What you just learned.

i.
You stopped trusting names.

Qwable is Qwen3.6-27B with hype branding — the base does the work.

ii.
You learned MLX.

When Ollama + llama.cpp can't load a new model, Apple MLX runs it on your Mac.

iii.
You saw the receipts.

Two 3D games (8.3 + 8.0) and a landing page (7.2), one prompt each, $0.

iv.
You know the trade-off.

A 27B is slow (~18 tok/s) and over-thinks — great smarts, not great speed.

v.
You got the real lesson.

The model doesn't matter. The system you plug it into does.

vi.
You can test, not believe.

Pull any model, run it free, score it — the Agent OS way.

Stop chasing the magic model. Build the system that runs them all.
Your move

Run any model. Test everything. Pay almost nothing.

You watched a free 27B build two real games on a laptop. Imagine that power wired into one system that knows your business, runs your tools, and ships the work — that's the Agent Operating System inside the AI Profit Boardroom.

You get the full Local engine (run Qwable + any local or MLX model offline), every CLI you already pay for in one dashboard, the Planner→Builder→Reviewer Kanban, the live-preview Workspace, the memory vault, token-efficiency playbooks, and 3,600+ founders building alongside you — with new tools added the week they drop.

Set it up in an afternoon. Then never chase hype again.

Get the Agent OS →
Inside the AI Profit Boardroom · used in 38 countries · new tools added every week