Qwythos-9B — a free, private Claude-Mythos AI running on your own Mac

Straight from the source

Pull it yourself — it's one line.

Qwythos-9B is free and open on Ollama. Here's the model page, the base it's built on, and the abliteration library behind it:

Official sources ↓

the model · pull itOllama · qwythos-9b-abliterated → the runnerOllama · run models locally → the base modelempero-ai · Qwythos / Claude-Mythos → the abliterationHeretic by p-e-w → more local modelsOllama model library →

"A Claude-style creative & reasoning model (Qwen3.5-9B base) … thinking model, function-calling capable, 1M-token context." — Qwythos-9B model card, ollama.com/richardyoung/qwythos-9b-abliterated

I tested it · what it built

I didn't just install it. I put it to work.

I asked Qwythos to build real things — using nothing but my own Mac. No cloud. No API. No internet needed. Here's exactly what came out, and how fast it ran. Click either one — they're live, single HTML files the model wrote start to finish.

A glowing neon digital clock Qwythos built, running 100% locally

Built by Qwythos · 45.9 tok/s · 100% local A live neon clock One prompt → a complete animated clock in a single 3.8 KB file. Built offline, on my Mac. open it live →

A dopamine to-do app Qwythos built, with progress bar and confetti

Built by Qwythos · 43.2 tok/s · 100% local A dopamine to-do app Add tasks, check them off with confetti, a live progress bar, saved on your machine. One prompt, one file. open it live →

A neon Snake game Qwythos built on a canvas

Built by Qwythos · 52.4 tok/s · 100% local A neon Snake game A full canvas game — arrow keys, food, score, collisions, game-over + restart. Written in one go. play it live →

A polished glowing calculator Qwythos built

Built by Qwythos · 52.5 tok/s · 100% local A working calculator Glowing keypad, full math, keyboard support, AC/DEL — the kind of thing people pay for. Free, offline. open it live →

A SaaS landing page Qwythos built for a fictional AI app

Built by Qwythos · 51.0 tok/s · 100% local A SaaS landing page Hero, nav, feature cards, glowing CTA, fade-in animations — a full marketing page from one sentence. open it live →

An animated particle galaxy Qwythos built on a canvas

Built by Qwythos · ~52 tok/s · 100% local A particle galaxy Generative canvas art — hundreds of glowing stars spiralling on their own. Pure maths, no assets. open it live →

Six apps. One model. Zero dollars. Nothing left my Mac. Each one is a single self-contained HTML file Qwythos wrote start to finish — click any of them above.

And it's not just me typing prompts at a terminal. This is the model wired straight into my Agent OS — the Local engine drove every one of these builds from the dashboard, reporting its own speed as it worked:

the raw test — run it yourself

$ ollama run richardyoung/qwythos-9b-abliterated
# prompt: "In two sentences, what is an AI Agent OS?"
An AI Agent OS refers to an autonomous operating system where the core
intelligence consists of capable AI agents rather than passive processes.
These agents perceive their environment and act on it without constant
human supervision, performing tasks end to end.
→ 51.9 tokens/sec · 0 dollars · nothing left the Mac

$ curl localhost:3737/api/local/model
{"model":"richardyoung/qwythos-9b-abliterated:latest","warm":true}
# ↑ the Agent OS Local engine is running Qwythos

$ curl localhost:3737/api/local/chat   # the dashboard, end to end
{"t":"d","c":"..."}  ...streaming...
{"t":"stats","tps":51,"tokens":95,"model":"richardyoung/qwythos-9b-abliterated:latest"}

Measured on an Apple M4 Max via Ollama (Q4_K_M build). I ran these with a second model also loaded, so the first load was slow — but generation held ~44–53 tok/s the whole way. Your speed varies by machine. Every number here is reproducible: pull the model and run the same commands.

Head to head · the test you asked for

Qwythos vs Ornith-9B — same prompt, same Mac.

Ornith-9B is the other strong local model I run. So I gave both the exact same job — "build a neon Snake game" — on the same machine, and measured everything. Both shipped a working game. The difference is speed and weight.

Qwythos-9B52.4 tok/s · 5.6 GB

Clean in-game render, auto-starts, tight grid. Built in ~72s end to end.

Ornith-9B24.4 tok/s · 9.5 GB

Also a working game with a start screen. Same task took ~189s — over 2× longer.

On the same Mac (M4 Max)	Qwythos-9B	Ornith-9B
Build speed (Snake, same prompt)	52.4 tok/s	24.4 tok/s
Short-answer speed (same prompt)	29.9 tok/s	21.5 tok/s
On-disk size	5.6 GB	9.5 GB
Built a working app	Yes ✓	Yes ✓
Context window	1M tokens	standard
Cost · privacy	$0 · 100% local	$0 · 100% local

Both are great free local models. But for the same work, Qwythos came back about twice as fast while taking up 40% less space on the drive — which is exactly why it's the one I pinned as the default in the Agent OS Local engine. The lesson is the same one this whole site runs on: the model is only half of it — the system you wire it into is what makes it useful.

Under the hood · why this one's special

Why Qwythos punches way above a 9B.

Most local models are just a base model, shrunk. Qwythos is different — it's a stack of four deliberate moves layered on top of an open Qwen3.5-9B base. That's why a model small enough to live on a laptop writes and reasons like something far bigger. Here's the exact recipe, in plain English.

01 · base

Qwen3.5-9BA strong, open 9-billion-parameter base model. The raw engine.

→

02 · style

Claude Mythos + Fable tracesPost-trained on Claude-style reasoning & creative traces. This is where it learns to "sound like Claude."

→

03 · memory

YaRN → 1M contextContext stretched to a 1M-token ceiling so it can hold huge inputs.

→

04 · unlock

Heretic abliterationResidual refusals trimmed — without retraining or hurting quality.

→

05 · ship

GGUF → OllamaQuantised with llama.cpp so it runs fast on your own Mac.

✍️It thinks like Claude — locally

It was trained on Claude Mythos & Fable reasoning traces, so its writing and problem-solving carry that style. You get a Claude-flavoured brain that runs on your machine for $0.

🧠A 1M-token memory ceiling

The architecture supports up to a million tokens via YaRN — whole codebases or books in one shot. (We run it at a 16k window in the Agent OS to keep RAM light; you can dial it up.)

🛠️It reasons and calls tools

It's a "thinking" model with a <think> step and native function-calling — so it can actually drive agents, not just chat. That's what makes it useful inside an OS.

🔓Abliterated — near losslessly

The "Heretic" pass surgically removes the model's reflex to refuse, so it stops bailing on legitimate work — and it does it with almost zero quality drift (see the KL number below).

The benchmarks — measured + published

Two kinds of numbers here: the speed/quality I clocked myself on an M4 Max, and the abliteration metrics published on the model card.

Metric	Value	What it means
Generation speed (builds, M4 Max)	~52 tok/s	faster than you can read — measured across 6 real builds
vs Ornith-9B (same Snake build)	~2× faster	52.4 vs 24.4 tok/s, at 40% less disk
Real apps built end-to-end	6 / 6 ✓	clock, to-do, Snake, calculator, landing, galaxy
Refusals after abliteration	53 / 100	on a harmful-prompt eval — far fewer needless "I can't help with that"
KL divergence (quality drift)	0.0066	≈ negligible — the unlock barely changed the model's brain
Runtime context (Agent OS)	16,384 tok	raised from the 8k default; model ceiling is 1M
Cost · privacy	$0 · 100% local	nothing leaves your machine, ever

About that "1M context" — read this before you get excited. The 1M is the model's ceiling, not what it runs at out of the box. Ollama loads it with a much smaller live window (8,192 tokens by default) because a true 1M window would need far more memory than most Macs have — the longer the window, the bigger the memory cost. So in the real world you pick a window that fits your RAM. I hit this exactly: a big build started getting cut off mid-code because it ran past the 8k default. The fix was one line — I bumped the Agent OS Local engine to a 16,384-token window (plenty for any single-file build, and it only added ~0.2 GB of RAM). Want to feed it a giant document? You can push the window higher — you'll just trade memory for it. Bottom line: "1M" is real, but treat it as headroom you rent with RAM, not a free default.

Which version should you download?

It ships in a ladder of sizes (quantisations). Smaller = lighter + faster but slightly less sharp; bigger = closer to the original. For most Macs, the recommended Q4_K_M (the default) is the sweet spot.

Tag	Size	Best for
IQ3_M	4.4 GB	smallest — older / tighter-RAM Macs
IQ4_XS	5.2 GB	great quality-to-size ratio
Q4_K_M (latest)	5.6 GB	recommended — what I run
Q5_K_M	6.5 GB	higher quality, a bit heavier
Q8_0	9.5 GB	near-lossless — if you've got the RAM

The honest pros & cons

No model is magic. Here's where Qwythos genuinely shines — and where you should keep your expectations in check.

👍 What's great

Free & 100% private — runs entirely on your Mac, nothing leaves the machine, no per-token bill ever.
Fast for its size — ~52 tok/s on my M4 Max, about 2× quicker than Ornith-9B on the same job.
Light footprint — 5.6 GB, runs on a normal modern Mac with no graphics card.
Claude-style brain — trained on Claude Mythos & Fable traces, so it writes and reasons with that flavour.
Agent-ready — a thinking model with native function-calling, not just a chatbot.
Fewer pointless refusals — abliterated with almost no quality drift (KL 0.0066).
It actually builds — 6 working single-file apps, start to finish, in this guide alone.

👎 Where to keep expectations real

It's a 9B, not a frontier model — for the hardest reasoning or huge codebases, Opus / GPT-class cloud models still win. Use the right tool for the job.
"1M context" costs RAM — the real window you run is limited by memory (we use 16k). True long-context isn't free.
First load is slow — ~25s cold while it reads 5.6 GB off disk; it also thrashes if you keep another big model warm at the same time.
Reduced guardrails — abliterated means the responsibility is on you (see below).
It's just the model — no built-in tools, memory, or web. On its own it's an engine; it needs a system around it to be genuinely useful.
Occasionally messy output — once in a while it writes a plan instead of clean code, or trails off. A good harness re-runs it.

Notice the biggest con: it's just the model. That's the whole point of this site — a great model is only half the equation. Drop Qwythos into a system that gives it tools, memory, and a place to ship, and a free 9B starts doing real work. That system is the Agent OS.

One honest caveat. This is an abliterated build — its safety guardrails are deliberately reduced, so it'll engage with a wider range of prompts than a stock model. That's a feature for serious local work, but it means the responsibility is on you. Use it within the law, and keep it for the legitimate building this guide is about. The base + abliteration are the work of empero-ai (Qwythos / Claude Mythos series), Heretic by p-e-w, and Richard Young / DeepNeuro.

What it is

A Claude-Mythos brain, without the cloud.

Qwythos-9B is a 9-billion-parameter model built on a Qwen3.5-9B base and post-trained on Claude Mythos & Fable traces. So it writes and reasons with that Claude-style voice — but it runs entirely on your own machine through Ollama.

Three things make it special for everyday work:

▲ It's a thinking model. It reasons step-by-step before it answers, and it can call functions — so it works as a real agent, not just a chatbot.
▲ It has a 1-million-token context. You can hand it a whole book, a long transcript, or a pile of notes and it holds the lot.
▲ It's abliterated — fewer refusals. It won't bail halfway through an edgy creative brief. (Reduced guardrails — use it responsibly.)

Thinking it?

"A 9B model can't be any good."

I thought the same. Then I ran it on my Mac: it answered at ~53 tokens a second — faster than you can read. I asked it to define an Agent OS and it gave a clean, correct answer, then wrote a sharp one-line take on local AI. For a model that costs nothing and never leaves your machine, that's the win.

Who's telling you this

Why I test every local model that drops.

I run an AI-first SEO agency and teach thousands of operators how to wire AI into real businesses. A free model that runs on your own machine — fast, private, no bill — is one of the biggest unlocks there is. So when one lands, I pull it, clock it on my own Mac, and report back plainly.

3,600+ founders inside AIPB
400k YouTube subscribers
38 countries · live members
163k X / Twitter followers

I'm not going to paste invented quotes here. The wins are real and written by the members themselves — agency owners, ecom founders, course creators, solo operators across 38 countries. Read them in their own words.

Read the 158-page wins doc →

Before you scroll on —

Commit to owning your AI today.

You've seen the numbers. Free, private, fast, on the Mac you already have.

Here's the deal I want you to make with yourself.

Before you sleep tonight, you're going to run one command and have a Claude-style model living on your own machine.

Just one command. No card, no signup, no cloud.

Because the moment you stop renting your brain and start owning it, everything about how you work with AI changes — your privacy, your bill, your freedom to work offline.

The people who set this up now will be running local agents while everyone else is still watching the meter tick.

Be one of those people.

Commit to running it tonight. One command. It's yours after that.

Run it yourself

One line. Then it's yours.

Install Ollama (free), then pull the model. That's the whole setup:

# 1 · get Ollama (free) from ollama.com, then:
ollama run richardyoung/qwythos-9b-abliterated
    

The recommended build is Q4_K_M (5.6 GB) — the sweet spot of quality and speed. If you're tight on memory there's a 4.4 GB build; if you want maximum quality there's a near-lossless 9.5 GB one. Pick your size:

Pick your build — smaller = lighter, bigger = sharper

download size per quant · longer bar = more disk + RAM · all run on a modern Mac

IQ3_M · smallest

4.4 GB

IQ4_XS

5.2 GB

Q4_K_M · pick this

5.6 GB

Q5_K_M

6.5 GB

Q8_0 · near-lossless

9.5 GB

Thinking it?

"I'll need some monster GPU rig for this."

You won't. The recommended build is 5.6 GB — it runs on a normal modern Mac, no separate graphics card, no cloud. I ran it on an M4 Max at ~53 tokens a second. The first load reads the 5.6 GB off disk so give it a moment; after that it stays warm and answers instantly. If you've got a recent MacBook, you can run this.

The shift

The old way vs the new way.

Here's the contrast between renting a cloud model and owning a local one.

The old way

cloud · metered

Every prompt pings a paid cloud API
Your notes + client data fly to someone else's server
Rate limits + outages stop your work cold
No wifi = no AI
The meter never stops ticking
Result: a bill that grows with every draft

The new way

local · free

Every prompt runs on your own chip — ~53 tok/s
Nothing leaves your Mac — fully private
No rate limits, no outages, no waitlist
Works on a plane, in a cafe, offline
1M-token context for whole books + transcripts
Result: $0 per token, forever

Make it the brain of your whole stack

Wire Qwythos into a system, not just a chat box.

Agent OS — Claude, OpenClaw and Hermes connected

A local model is powerful. A local model running as the engine of your whole Agent OS is unstoppable. Inside the AI Profit Boardroom you get the Agent OS that wires Qwythos in as the Local engine — your agents run on it by default, free and private, with Claude, OpenClaw, and Hermes layered on top for the heavy lifting.

The full Agent OS zip — the Local engine wiring, every prompt, the dashboard
Four live coaching calls a week with operators running local + cloud models
Daily tutorials — including how to wire a local model in as your default
Token-optimisation tutorials so your paid usage drops to almost nothing
A community of 3,600+ founders across 38 countries, online 24/7

Get the Agent OS →

Inside the AI Profit Boardroom · skool.com/ai-profit-lab

link in the description ↑

Doesn't running the Agent OS burn a fortune in tokens?

No — that's the biggest myth about it. The Agent OS runs the everyday 90% on a free local model like this Qwythos (on your own Mac, nothing leaving it), free APIs slot in for more, and for the frontier work it drives the CLIs you already pay for — your Claude subscription already includes the Claude CLI, and the Agent OS plugs straight into it, so you're not paying twice. It's a layer on top of what you already own, not a new meter. And inside the AI Profit Boardroom there are full token-optimisation tutorials, so you cut usage to the bone and never think about it again.

Three beliefs to drop

What's been stopping you running local.

Wrong: "Local models are slow and weak — toys compared to the cloud."

Right: Qwythos answered at ~53 tokens a second on my Mac — faster than you read — with a Claude-Mythos voice and a 1M-token memory. For everyday work, it holds its own, free.

Wrong: "I need an expensive GPU rig to run my own AI."

Right: The recommended build is 5.6 GB and runs on a normal modern Mac — no separate graphics card, no cloud. If you've got a recent MacBook, you're ready.

Wrong: "Free always means worse, so why bother."

Right: The model isn't the moat — the system around it is. A free local model wired into a real Agent OS beats a pricey cloud model used as a lonely chat box. Own the brain, build the system.

Don't take my word for it

158 pages of members who stopped renting tools and built systems instead — real businesses, real wins, in their own words.

Read the 158-page wins doc →

The recap

What you now know.

You own a Claude-style brain

Qwythos-9B — Claude-Mythos creative + reasoning, running on your own Mac.

ii.

You stopped paying per token

It's free, forever. No meter, no card, no cloud bill.

iii.

Your data stays yours

100% local — your notes and client docs never leave the machine.

iv.

It's fast + light

~53 tok/s, just 5.6 GB on disk. Runs on a normal modern Mac.

It holds a whole book

1M-token context — long transcripts, big docs, all at once.

vi.

It's wired into the Agent OS

Set as the Local engine, so your agents run on it free by default.

One system, free at the core

Build the system. Run it free.

Qwythos gives you a free, private brain. The Agent OS inside the AI Profit Boardroom turns that brain into a whole operating system — Claude, OpenClaw, and Hermes on one dashboard with shared memory, with this local model as the free engine underneath so most of your work never costs a cent.

The full Agent OS zip — Local-engine wiring, every prompt, the Obsidian memory setup
Weekly coaching calls where we set up local + cloud models together
Daily tutorials + token-optimisation training so it runs near-free
3,600+ founders across 38 countries, someone online 24/7
The 158-page wins doc — read what members actually built

Get the Agent OS →

Inside the AI Profit Boardroom · skool.com/ai-profit-lab

I'll see you in the next one ↓

A free Claude-style AI that runs on your own Mac.

Pull it yourself — it's one line.

I didn't just install it. I put it to work.

Qwythos vs Ornith-9B — same prompt, same Mac.

Why Qwythos punches way above a 9B.

The benchmarks — measured + published

Which version should you download?

The honest pros & cons

👍 What's great

👎 Where to keep expectations real

A Claude-Mythos brain, without the cloud.

I was tired of renting my own brain.

Why I test every local model that drops.

Commit to owning your AI today.

One line. Then it's yours.

The old way vs the new way.

Wire Qwythos into a system, not just a chat box.

What's been stopping you running local.

What you now know.

Build the system. Run it free.