New framework · a free AI coder on your Mac
The framework · pick the right local model

The Local Code Engine.

A free AI that builds apps, tools and games for you — running 100% on your own Mac. The trick is picking the right model, not the biggest.

SPEED ON MY MAC · WORDS PER SECOND (bigger bar = faster). Watch the heaviest model come dead last.
  • North Mini Code · 19 GB · MoE · 92 ⚡
  • gpt-oss:20b · 12 GB · MoE · 75
  • Qwen-Coder 14B · 9 GB · dense · 36
  • Qwen-Coder 32B · 28 GB · dense · 16 🐢
The 28 GB model is the slowest. The 19 GB model is the fastest.
Real numbers from my Mac. Size is the RAM it uses — not the speed. Speed comes from how much of the model fires per word.
3,600+ Founders inside AIPB
400k YouTube subscribers
163k X / Twitter followers
38 Countries · live members

For months I paid for every single AI message.

So I tried running a model on my own Mac, for free.

The first one felt slow, so I grabbed a bigger one.

The bigger one was worse — it ate all my memory and crawled.

I figured local AI just wasn't fast enough, and gave up.

Then I learned the one thing nobody tells you: the biggest model is not the fastest.

I picked the right-shaped model instead — light, but smart.

Now there's a free AI coder living inside my Mac.

It builds apps, tools and games for me — offline, instant, no bill.

I just say what I want, and watch it appear. That changed everything.

The framework · the Local Code Engine

The Local Code Engine™.

A free AI coder that lives inside your own Mac — fast, private, always on. Five things make it work.

i.

The Right Model

Not the biggest — the right shape. A light model that only fires a small part of itself per word, so it's quick AND smart on your machine.

ii.

Always Warm

Keep it sitting in your Mac's memory so it never reloads. The lag people blame on "slow models" is just the cold start: load the model once, keep it loaded, and it's instant.

iii.

Nothing Leaves

It listens only on the loopback address inside your Mac. Your code, your ideas, your prompts: none of it ever touches the internet. Truly private, truly free.

iv.

Build By Voice

Say "build me a snake game" and it writes the whole thing — then shows it to you running, live. Talk, watch it appear.

v.

It All Saves

Every app, tool and game it builds lands in one workspace on your Mac. Open it, play it, download it — nothing gets lost.

Old way vs new way

Renting an AI vs owning one.

Here's the shift. Most people rent AI by the month and send every keystroke to someone else's server. This is the other way.

Old way · renting the cloud
$20–200/mo
  • Pay a monthly AI bill, forever
  • Every word you type leaves your machine
  • Hit rate limits right when you're in flow
  • Try a local model, grab the biggest one, watch it crawl
  • Give up and decide local AI is "too slow"
  • Result: you rent your tools and hand over your data
New way · the Local Code Engine
$0/mo
  • Pick a light, fast model — it builds at ~75–92 words a second
  • Pin it warm once, so it never reloads or lags
  • No bill, no per-token charges, no limits: build all day
  • Nothing you type ever leaves the Mac
  • Say the word and a whole app appears, live
  • Result: a fast, private AI coder you actually own

The one lesson · size isn't speed

Why the smaller model can be the fast one.

This is the bit that took me weeks to figure out.

Everyone assumes a bigger model is a slower model. Not necessarily.

The size of a model is just how much memory it takes up.

The speed is something else entirely: how much of the model has to think for every single word.

An old-style "dense" model wakes up its whole brain for every word. Big and heavy means slow.

A new "MoE" model is a giant brain split into experts — and only a few experts wake up per word. Big brain, tiny effort. Fast.

WHAT FIRES PER WORD
  • DENSE model · every part fires · heavy + slow · "16 words / sec"
  • MoE model · only a few fire · light + fast · "92 words / sec"
Same idea both sides: a brain doing work. The dense one wakes all of it per word. The MoE wakes a few experts. That's the whole speed difference.
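
To make that picture concrete, here's a toy sketch in Python. It assumes a made-up model with 32 experts where a router picks the top 4 per word. Both numbers are hypothetical and are not the specs of any model named on this page. It also ignores the shared parts of a real MoE model that run for every word, so treat the result as rough intuition, not a benchmark.

```python
from typing import List

NUM_EXPERTS = 32  # hypothetical, for illustration only
TOP_K = 4         # hypothetical: how many experts "wake up" per word


def active_experts(router_scores: List[float], top_k: int) -> List[int]:
    """Return the indices of the top_k highest-scoring experts, in index order."""
    ranked = sorted(range(len(router_scores)), key=lambda i: router_scores[i], reverse=True)
    return sorted(ranked[:top_k])


def active_fraction(num_experts: int, top_k: int) -> float:
    """Share of the experts that do work for one word."""
    return top_k / num_experts


# A dense model is the special case where every "expert" fires every time.
dense = active_fraction(NUM_EXPERTS, NUM_EXPERTS)
moe = active_fraction(NUM_EXPERTS, TOP_K)
print(f"dense fires {dense:.0%} of its experts per word, this toy MoE fires {moe:.1%}")
```

The model still has to hold every expert in memory, which is why size and speed come apart: memory follows the total, speed follows the part that fires.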

So the rule is simple: on a Mac, pick a small-but-mighty MoE model that fits in memory with room to spare. Mine is gpt-oss:20b: 12 GB, ~75 words a second, free. The new North Mini Code (Cohere) is even quicker at ~92. Skip the giant dense models. They push your Mac into swap and crawl.
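
That rule can be sketched as a tiny picker. The four models and their numbers come from the speed chart at the top of this page. The 4 GB of headroom is an assumption standing in for "room to spare", so adjust it to however much memory your other apps need.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Model:
    name: str
    ram_gb: float         # memory the model uses when loaded
    kind: str             # "MoE" or "dense"
    words_per_sec: float  # measured speed from the chart on this page


MODELS = [
    Model("North Mini Code", 19, "MoE", 92),
    Model("gpt-oss:20b", 12, "MoE", 75),
    Model("Qwen-Coder 14B", 9, "dense", 36),
    Model("Qwen-Coder 32B", 28, "dense", 16),
]


def pick_model(total_ram_gb: float, headroom_gb: float = 4.0,
               models: List[Model] = MODELS) -> Optional[Model]:
    """Return the fastest model that fits in RAM after leaving headroom, or None."""
    budget = total_ram_gb - headroom_gb  # headroom_gb is an assumed safety margin
    fitting = [m for m in models if m.ram_gb <= budget]
    return max(fitting, key=lambda m: m.words_per_sec) if fitting else None


for ram in (16, 32):
    choice = pick_model(ram)
    print(ram, "GB Mac ->", choice.name if choice else "nothing fits")
```

Note that the picker never looks at which model is biggest. It filters on fit, then sorts on speed.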

The other trick · keep it warm

It was never slow. It was cold.

Even the right model feels slow if you do one thing wrong.

You let it fall asleep between messages.

Then your first question has to drag the whole model back off the disk before it can answer.

That few-second lag is what makes people quit. It's not the thinking — it's the loading.

One setting tells the model to stay in memory. After that, it answers the instant you hit enter.

# confirm the model is loaded and staying warm (this only checks; the keep-warm setting is in the setup below)
$ ollama ps
NAME SIZE PROCESSOR UNTIL
gpt-oss:20b 12 GB 100% GPU warm ← ready · answers instantly
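
Ollama's local HTTP API also accepts a keep_alive field on a request, which is another way to hold a model in memory. Here's a minimal sketch, assuming Ollama is running on its default address (127.0.0.1:11434). The function names are mine, for illustration. Sending a request with a model name and no prompt asks Ollama to load that model, and keep_alive set to -1 asks it to stay loaded until you unload it or quit Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # assumes Ollama's default local address


def build_warm_request(model: str, keep_alive=-1) -> urllib.request.Request:
    """Build a load-only request: no prompt, just the model and how long to keep it.

    keep_alive=-1 means stay loaded indefinitely; a string like "30m" sets an idle timeout.
    """
    body = json.dumps({"model": model, "keep_alive": keep_alive}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def warm(model: str) -> int:
    """Send the request to the local server and return the HTTP status (needs Ollama running)."""
    with urllib.request.urlopen(build_warm_request(model), timeout=300) as resp:
        return resp.status


# Nothing is sent here; this only shows what would go over the loopback address.
print(build_warm_request("gpt-oss:20b").data.decode("utf-8"))
```

To actually pin the model, call warm("gpt-oss:20b") once with Ollama running, then check with ollama ps.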
The proof · nothing leaves

I didn't take its word for it. I checked.

"Offline" is easy to say. So I tested it.

The model only listens on one address: the loopback address inside your own Mac (127.0.0.1).

Then I watched every connection it made while it built a whole app.

There were none. Not one byte went out.

THE SEALED MACHINE · YOUR MAC
  • "build me X" · what you say
  • The Code Engine · runs on your GPU
  • A built app · back to you
  • 🌐 The internet · 0 connections · ever
Your prompt, the model, the app: all inside the box. The line out is cut. 🔒
Everything happens on your Mac. The internet is never touched.

The actual test — where it listens, and every connection it made while building:

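# where does it listen? (-a = match BOTH filters, -nP = show raw IPs and ports, not names)
$ lsof -a -nP -iTCP -c ollama
ollama … TCP 127.0.0.1:11434 (LISTEN) ← loopback only · your machine

# while it built a full app, every connection that ISN'T loopback (header row dropped too):
$ lsof -a -nP -iTCP -c ollama | grep -v -e 127.0.0.1 -e '^COMMAND'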
(nothing) ← zero. not one byte left the Mac.
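
If you want to repeat the check without eyeballing grep output, here's a rough Python sketch that reads lsof text and flags any row whose address is not loopback. It assumes the output came from lsof run with -nP so addresses are numeric, and the sample rows below are made up for illustration. Keep in mind that one lsof run is a snapshot of that moment, so run it several times during a build.

```python
import ipaddress
from typing import List


def is_loopback(endpoint: str) -> bool:
    """True if an lsof endpoint like '127.0.0.1:11434' or '[::1]:11434' is a loopback address."""
    host = endpoint.rsplit(":", 1)[0].strip("[]")
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # '*' (all interfaces) or anything unparseable counts as not loopback


def non_loopback_rows(lsof_output: str) -> List[str]:
    """Return every TCP row where either end of the connection is off the loopback address."""
    flagged = []
    for line in lsof_output.splitlines():
        fields = line.split()
        if "TCP" not in fields:
            continue  # header row or a non-TCP row
        idx = fields.index("TCP")
        if idx + 1 >= len(fields):
            continue
        endpoints = fields[idx + 1].split("->")  # local->remote, or just local when listening
        if not all(is_loopback(e) for e in endpoints):
            flagged.append(line)
    return flagged


# Made-up sample rows in the shape of `lsof -a -nP -iTCP -c ollama` output.
SAMPLE = """COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
ollama 4242 me 3u IPv4 0xabc 0t0 TCP 127.0.0.1:11434 (LISTEN)
ollama 4242 me 9u IPv4 0xdef 0t0 TCP 127.0.0.1:11434->127.0.0.1:52044 (ESTABLISHED)
"""
print(len(non_loopback_rows(SAMPLE)), "rows leave the machine")
```

A row listening on *:11434 gets flagged as well, since that means other machines on your network could reach the model.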
Build by voice · live

Say it. Watch it appear.

This is the part that feels like the future.

You don't even have to type.

Tap the mic and say "build me a snake game."

The engine writes the whole thing — all the code, in one shot.

It shows up live in a preview, right there, running.

And it's saved in your workspace, next to everything else it's built.

🎙️ Speak it → It writes the app (one complete file) → Previews live (running, right there) → Saved in workspace ✓ (open it anytime)
Speak → it writes → it previews live → it saves. All on your Mac.

A real run — one sentence in, a working game out:

you › build a neon snake game
engine › Here's a self-contained Snake ↓
<!DOCTYPE html> … arrow keys · food · score · game over · restart
→ previewed live · saved to your workspace · 75 words/sec
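
The "writes one complete file, then saves it" step can be sketched in a few lines of Python. This is not the Agent OS code, just an illustration of the idea: find the HTML document in the model's reply and write it into a workspace folder. The function names, the file naming scheme and the sample reply are all hypothetical.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

HTML_DOC = re.compile(r"<!DOCTYPE html>.*?</html>", re.DOTALL | re.IGNORECASE)


def extract_html(reply: str) -> Optional[str]:
    """Pull the first complete HTML document out of a model reply, or None if there isn't one."""
    match = HTML_DOC.search(reply)
    return match.group(0) if match else None


def slugify(prompt: str) -> str:
    """Turn a spoken prompt into a safe file name stem."""
    return re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-") or "app"


def save_app(reply: str, prompt: str, workspace: Path) -> Optional[Path]:
    """Save the HTML found in reply under workspace; return the path, or None if no HTML was found."""
    html = extract_html(reply)
    if html is None:
        return None
    workspace.mkdir(parents=True, exist_ok=True)
    path = workspace / f"{slugify(prompt)}.html"
    path.write_text(html, encoding="utf-8")
    return path


# Demo with a stand-in reply (not real model output), saved to a throwaway folder.
reply = "Here's a self-contained Snake\n<!DOCTYPE html><html><body>snake</body></html>\nEnjoy!"
with tempfile.TemporaryDirectory() as tmp:
    saved = save_app(reply, "build a neon snake game", Path(tmp))
    print(saved.name)
```

Because the game is a single self-contained HTML file, opening the saved file in a browser is all the "preview" needs.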
Set it up with us

Want your own Local Code Engine?

I wired this into the Agent OS — the dashboard where all my agents live on one screen. The Local Code Engine sits right next to the cloud ones, as the free, instant one for quick builds.

Get the Agent OS →

link in the description ↗

Do it yourself · the setup

Wire it in — about ten minutes.

You need a Mac with a decent chunk of memory. 16 GB is enough for the model below, which sits around 12 GB warm, though more leaves room for your other apps. Then four steps.

Install Ollama.

It's a free app that runs AI models on your own machine. One download from ollama.com.

Pull the right model — once.

A light, fast MoE coder. This downloads it to your disk; after that it's yours forever, offline.

Keep it warm.

One setting tells the model to stay in memory instead of reloading every time. This is what turns "slow" into "instant."

Wake it once.

Send it a quick "hello" so it loads into memory. Check it's warm — and you're live.

# 1. install Ollama from https://ollama.com (free)

# 2. pull a light, fast MoE coder (then it's offline forever)
ollama pull gpt-oss:20b

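# 3. the trick: keep it warm (30m = stay loaded for 30 minutes after the last message; use -1 to never unload)
launchctl setenv OLLAMA_KEEP_ALIVE 30m
# then quit and reopen the Ollama app so it picks up the setting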

# 4. wake it once so it loads into RAM
ollama run gpt-oss:20b "hello"

# check it's warm:
ollama ps
gpt-oss:20b 12 GB 100% GPU warm ← done · ~75 words/sec
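
The same warm check can be scripted. Ollama's local API has an /api/ps endpoint that lists the models currently loaded, and this Python sketch reads that list. It assumes Ollama is on its default address; the function names are mine and the sample payload uses made-up values in the shape of that response.

```python
import json
import urllib.request
from typing import Dict

PS_URL = "http://127.0.0.1:11434/api/ps"  # assumes Ollama's default local address


def loaded_models(ps_payload: dict) -> Dict[str, float]:
    """Map each loaded model's name to its size in GB (1 GB = 1e9 bytes)."""
    return {m["name"]: m["size"] / 1e9 for m in ps_payload.get("models", [])}


def is_warm(ps_payload: dict, model: str) -> bool:
    """True if the model is currently loaded in memory."""
    return model in loaded_models(ps_payload)


def fetch_ps(url: str = PS_URL) -> dict:
    """Ask the local Ollama server which models are loaded (needs Ollama running)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


# Stand-in payload with made-up values, so this runs without Ollama installed.
sample = {"models": [{"name": "gpt-oss:20b", "size": 12_000_000_000}]}
print(is_warm(sample, "gpt-oss:20b"))
```

With Ollama running, is_warm(fetch_ps(), "gpt-oss:20b") gives the live answer.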

Want the fastest coder out right now? Try Cohere's new north-mini-code-1.0 — ~92 words a second, made for building (needs the latest Ollama). Got less memory? llama3.2:3b (~5 GB) is tiny and quick for the simple stuff.

Should you do this · what holds people back

The three things people get backwards.

Belief: "Local AI is too slow to be useful."

Truth: Only if you pick the wrong model or let it go cold. The right MoE model builds at 75–92 words a second on a normal Mac — faster than you can read. I proved it: the lag was the cold start, never the thinking.

Belief: "I need the biggest model to get good results."

Truth: Backwards. On my Mac the 28 GB model was the slowest of all. It pushed the Mac into swap and crawled at 16 words a second. The 12 GB MoE ran at 75, nearly 5× faster, AND built better. Right shape beats big size, every time.

Belief: "Setting up a local model is too technical for me."

Truth: It's one app install and three short commands. If you can copy and paste a line, you can run this. The hard part — making it fast — is a single setting. Ten minutes, start to finish.
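
You can check the speed figures on your own Mac. When Ollama's local API finishes a response, it reports eval_count (tokens generated) and eval_duration (in nanoseconds), and this Python sketch turns those into a rate. One caveat: Ollama counts tokens, which are close to words but not the same thing, and I'm assuming rather than asserting that the "words per second" numbers on this page were measured that way. The sample values are made up.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from the timing fields of a finished Ollama response.

    eval_count is the number of tokens generated; eval_duration is in nanoseconds.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def speedup(fast: float, slow: float) -> float:
    """How many times faster one rate is than another."""
    return fast / slow


# Made-up timing fields: 150 tokens generated in 2 seconds.
sample = {"eval_count": 150, "eval_duration": 2_000_000_000}
print(tokens_per_second(sample))

# The chart's numbers: the 12 GB MoE (75) and the 19 GB MoE (92) against the 28 GB dense model (16).
print(round(speedup(75, 16), 1), round(speedup(92, 16), 2))
```

Measure after the model is warm. A first request that includes the load time will look much slower than the model really is.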

Don't take my word for it

3,600+ founders are wiring fast local agents like this inside the Boardroom right now. Their wins — real businesses, real results — are documented here.

Read the member wins →

Recap · what you walk away with

What you gain.

i.

You stopped paying. A free AI coder runs on the Mac you already own — that monthly AI cost is gone.

ii.

You stopped guessing on models. Pick a light MoE, not the biggest — fast AND smart, no swapping.

iii.

You stopped waiting. Keep it warm and it answers the instant you hit enter — ~75–92 words a second.

iv.

You stopped leaking. Nothing you type ever leaves your Mac — proven, zero connections out.

v.

You started building by voice. Say it, watch a whole app appear live, and it saves itself.

vi.

You own it. No bill, no limits, no one watching — your own coder, on your own machine.

Your turn

Put a free, fast AI coder inside your own Mac.

You've seen it. The right model, kept warm, runs offline and builds whole apps from a sentence. If you want it set up with you — step by step, right next to your cloud agents — it's all inside the AI Profit Boardroom.

Get the Agent OS →

I'll see you inside ↗