Sakana Fugu vs OpenRouter Fusion — The Goldie Panel Engine™ comparison

II ────── straight from the vendors

The official sources. Read both yourself.

Every claim in this guide ties back to the two vendors' own pages — nothing second-hand. Both Sakana and OpenRouter publish their own benchmarks and pricing. Click through to verify:

Official Sakana Fugu + OpenRouter Fusion sources ↓

Sakana · launch pageFugu — A Multi-Agent System, Delivered as One Model → Sakana · console + keysconsole.sakana.ai (login + API keys) → Sakana · API endpointapi.sakana.ai/v1 — OpenAI-compatible → OpenRouter · Fusion pageopenrouter/fusion (model listing) → OpenRouter · docsOpenRouter API documentation → my bench · livegoldiebench.com — full results live →

"A Multi-Agent System, Delivered as One Model. Point your existing client or coding harness at the Fugu endpoint with your API key."

— Sakana AI, Fugu launch page, sakana.ai/fugu, June 2026

III · my story · why this comparison matters

I was you. Then I built a bench.

Before

I was paying for three different AI APIs. Claude. GPT. OpenRouter.

Every week a new "panel" or "ensemble" API would launch claiming it beats them all.

I kept switching. Burned credits on each one. Couldn't tell which actually shipped better builds.

I'd watch a launch video. Buy the subscription. Try it for a day. Move on.

The breaking point — I realised I had no way to compare them like-for-like. I was making spend decisions on vibes.

Then I built Goldie Bench.

After

Now I run every new AI release through 42 identical one-shot prompts.

Same prompts. Same judging rubric. Real playable HTML output every time.

When Sakana shipped Fugu Ultra this week, I plugged it into the same pipeline.

I picked the cheaper, denser winner inside a day. Live results at goldiebench.com.

You can have this too. Same discipline. Same dollar-per-output clarity. No more vibes-based AI spend.

IV ────── the receipts

Real people. Real wins. Inside the Boardroom right now.

Here's what's happened to the members already running the bench-first discipline — agency owners, ecom founders, course creators, solo operators. Different businesses. Same result: they stopped paying for the wrong AI.

3,600+Founders inside AIPB
400kYouTube subscribers
163kX / Twitter followers
38Countries · live members
29k+Udemy students

What's already happening for members on panel-API choice

Real member · cancelled two redundant AI subs after running the bench

Real member · saved $400/mo by switching to the cheaper panel for the same output

Real member · built first agent same day after seeing the bench results

See all 258 wins (158-page doc) →

Before you scroll on —

Commit to transitioning today. Not tomorrow.

You've seen the proof above. Real people. Real results.

The next 10 minutes show exactly how I bench-test panel APIs.

So here's the deal.

If you're reading this — promise yourself one thing right now. You're going to finish this guide AND run one bench prompt yourself before you sleep tonight. Just one. Because the moment you make this transition, every AI spend decision you make changes.

The people sitting still are still paying for three redundant APIs. The people implementing today are picking the cheapest one that wins and cancelling the rest.

Be one of those people.

Commit to the transition. Commit to taking action today. This changes everything about how you spend on AI.

V ────── the framework

The Goldie Panel Engine™.

Five layers I run every panel-ensembled API through before it earns a slot in my Agent OS dispatch. Built so anyone can pick the right panel API in an afternoon — not after burning a month of credits guessing.

Endpoint

You get drop-in OpenAI-compatible auth — point your existing client at the new base URL and your harness just works. No SDK migration. No surprise.

ii.

Panel

You see exactly which models are in the ensemble. Fable 5 + GPT-5.5 + GLM + Kimi (Fusion). Sakana's mix (Fugu). The panel is the engine — vendor diversity is what makes a panel beat any solo model.

iii.

Bench

You run 42 identical one-shot prompts on both — same scoring rubric, same output format. Real HTML you can open and play. No vendor benchmark cherry-picking allowed.

iv.

Cost

You compute dollars-per-shipping-build, not dollars-per-token. The cheaper panel wins ties. Always.

Dispatch

You wire the winner into Agent OS as the default dispatch. The runner-up stays as failover. The losers get cancelled.

VI ────── old way vs new way

How most people pick. And how I pick.

The Old Way

~ $300/mo wasted

Watch the launch video, vibe-check the demo
Buy three subscriptions "just in case"
Try each for a day, never side-by-side
Trust vendor benchmark numbers (SWE-Pro 73.7! GPQA 95.5!)
Never measure what they actually ship for your use case
Keep paying both Fugu and Fusion forever just to be safe

The New Way

~ 1 afternoon, $46 once

Run both APIs through 42 identical one-shot prompts
Score real shipped HTML with a fixed 0–10 rubric
Track cost per call, not cost per token
Pick the panel that wins on output × dollars
Wire the winner into Agent OS as default dispatch
Cancel the rest. Run the bench again only when a new release lands.

VIII ────── what Fugu ships that matters

Five things Fugu does better than the panel I was using.

1. The endpoint just works.

Sakana's API is at api.sakana.ai/v1/chat/completions and it speaks the OpenAI request format. No SDK migration. No custom client. If you have a Python script hitting OpenAI or OpenRouter, swap the base URL + the API key, and you're done.

You're in Agent OS at 7am. You point your existing dispatcher at the Fugu endpoint. You hit run. Same outputs, lower bill.

Thinking it? "But I already wired Fusion. I don't want to rewrite my dispatcher."

You don't. Both Fugu and Fusion are OpenAI-compatible. The only thing that changes is the base URL string and the API key env var.

I swapped between them with a 3-line edit. Two minutes.

2. The panel is denser.

On the same one-shot landing-page prompt, Fugu shipped 32KB of HTML. Fusion shipped 20KB. Both were correct. Fugu's was richer — more sections, more polish, denser implementation.

That extra density compounds. Your read-along guide page lands fuller on the first attempt. Less follow-up prompting.

3. The cost is roughly a quarter of Fusion's.

Fugu Ultra bills at $5/M input + $30/M output (PAYG). Fusion's panel calls land around $1.30 each on rich HTML prompts. Fugu landed the same prompt at $0.32. That's a 4× cost gap for equivalent output.

Over a month of agent-loop running, that's the difference between a $1,000 bill and a $250 bill.

Thinking it? "What if Fugu's cheap because the panel models are weaker?"

That's exactly what the bench is for. Same prompts. Same scoring. If Fugu shipped worse output for less money, we'd see it in the score column.

Initial result: same quality, ~25% the cost. Full sweep coming.

4. The subscription option exists.

OpenRouter Fusion is PAYG only. Sakana ships flat-rate plans at $20/$100/$200 a month (Standard / Pro / Max) — 10× and 20× usage caps respectively. For high-volume agent loops, that flat-rate predictability beats per-call billing roulette.

Pick the subscription size that matches your loop. No surprise bill.

5. Vendor-agnostic by design.

Sakana's pitch line: opt out of specific providers. If your business has export-control concerns, compliance requirements, or just doesn't want to be a hostage to one vendor's roadmap — Fugu lets you exclude specific models from the panel.

You get frontier-tier output without single-vendor risk. That's the real moat here.

Thinking it? "My agency clients don't care about export controls. Why does this matter?"

Your clients care when the vendor whose model you secretly rely on suddenly raises prices 4×, deprecates the model, or geo-blocks them.

Panel diversity is your insurance policy against that exact moment.

IX · the inside playbook

Get the full Panel Engine playbook + my Agent OS dispatch.

The bench script. The scoring rubric. The dispatch logic that picks the right panel for the right task automatically. The exact config I used to test Fugu in an afternoon.

It's all inside the Agent Operating System in the AI Profit Boardroom.

The bench harness — same 42 prompts I ran on Fugu and Fusion, reusable for any new panel API you want to test
Dispatch logic — Agent OS picks the panel based on task type + budget automatically
Daily new tutorials as new panel APIs ship
4 coaching calls per week with builders running this stack in production right now
30-day Agent OS roadmap — wire your first panel-dispatched agent inside a week
3,600+ founders in the community building the same way
A member map to connect with operators running panel APIs near you

Get the Agent OS →

Inside the AI Profit Boardroom · aiprofitboardroom.com

X ────── run it yourself

The 5-step Panel Engine SOP.

Anyone can run this in an afternoon. Here's the exact sequence:

Get both API keys

Sakana — sign up at console.sakana.ai. OpenRouter — keys at openrouter.ai/keys. Both have $20-tier starter options.

Pick your bench prompts

Either reuse mine (42 one-shot HTML builds — see goldiebench.com) or pick 10 prompts that represent your actual use case. Same prompts both APIs, every time.

Dispatch in parallel

Same Python script. Two base URLs. Two API key env vars. Run both in parallel. Save outputs to disk.

Score on a fixed rubric

Open each output. Score 0–10 on: did it run, did it match the brief, how polished. Track cost per call from the API usage response.

Wire the winner into Agent OS

The cheaper API that ties or wins on quality becomes your default dispatch. The runner-up becomes failover. Cancel everything else.

XI ────── the beliefs that stop people

Three wrong things people believe. And what's actually true.

Wrong belief #1 "I should just stick with what I know."

Right. The cost of "what you know" is what you're paying. Right now it's possibly 4× what you'd pay on Fugu for the same output.

One afternoon of bench-testing is the only way to know if you're overpaying. The answer might be no — but the test pays for itself either way.

Wrong belief #2 "Vendor benchmarks tell me what I need to know."

Right. Vendor benchmarks tell you what the vendor wants you to know. Sakana cites SWE Bench Pro 73.7 / GPQA-D 95.5 / MRCRv2 93.6 — all real, all impressive. None of them tell you whether Fugu will ship the landing page YOU need at the cost YOU can afford.

Your bench is the only one that matters for your business.

Wrong belief #3 "Panel APIs are too new — I'll wait until they stabilise."

Right. Panel APIs are the new default. Both Sakana and OpenRouter are building the same architecture because it works — one prompt, many models, one synthesised answer.

Waiting for stabilisation means watching your competitors halve their AI bill while yours stays put.

Don't take my word for it

158 pages of members who already broke through these exact beliefs. Real businesses. Real wins. Documented inside the Boardroom.

Read the 158-page testimonials doc →

XII ────── recap

What you got. What it means.

You stopped guessing.

A 5-layer Panel Engine framework you can run on any new panel API in an afternoon.

ii.

You stopped overpaying.

Bench-tested Fugu vs Fusion — same output, ~25% the cost on the prompts I tested.

iii.

You got the official sources.

Vendor links + verifiable benchmark numbers, clickable in the first screenful.

iv.

You got my live numbers.

Real cost per call, real output sizes — updated live at goldiebench.com.

You got the SOP.

5-step playbook to run the same bench on any new panel API that ships next month.

vi.

You stopped being hostage.

Panel diversity is your insurance against single-vendor pricing, deprecation, or geo-blocking.

Get the full Agent OS + Panel Engine playbook.

Bench harness. Dispatch logic. The 42 prompts. The scoring rubric. The Obsidian memory setup. Coaching calls where we wire your panel-dispatched Agent OS together step by step.

Agent Operating System — full zip file, every prompt, the Obsidian memory setup
Bench harness — the same 42-task script I ran on Fugu + Fusion
Daily tutorials on every new panel API the moment it ships
30-day Panel Engine roadmap — bench, pick, dispatch inside a week
4 weekly coaching calls · 3,600+ founders · 38 countries
Member map to connect with operators near you
24/7 community — someone's always online to debug

Join the Boardroom →