The full Panel Engine playbook lives inside the AI Profit Boardroom Join
New framework — bench-tested 2026-06-22

The Goldie Panel Engine™ + Sakana Fugu.

Sakana Fugu vs OpenRouter Fusion — The Goldie Panel Engine™ comparison

Sakana just shipped Fugu Ultra — a multi-agent panel API that competes head-on with OpenRouter Fusion. I tested both through 42 identical one-shot builds on Goldie Bench. Same prompts. Same scoring. Real numbers. Here's which panel wins, why, and the framework I use to pick.

42Identical prompts
2Panel APIs tested
Cheaper per call (Fugu)
$46Total bench spend
You ask one prompt Fable 5 GPT-5.5 GLM-5.2 Kimi K2.7 Synthesise panel votes one answer PANEL ENGINE · 1 PROMPT → N MODELS → 1 ANSWER
A panel API ensembles many models into one answer — Fugu and Fusion both work this way
II ────── what fugu shipped on the bench

Three real Fugu builds. Open every one ↓.

Here's what Fugu Ultra produced on the first three prompts of the bench so far. Every one is a live, playable HTML file. Click into them. The bench is still running — full sweep at goldiebench.com/models/fugu.

What we shipped — open everything ↓
FUGU · LANDING · 32KB · $0.32
Apple-keynote landing page →

Animated mesh gradient, multi-section, polished — denser than Fusion's 20KB at the same prompt.

FUGU · RAYCASTER · 26KB · $0.35
First-person raycaster maze →

WASD + mouse-look + distance fog + weapon bob. Clean implementation.

FUGU · GALAXY · 26KB · $0.24
Three.js spiral galaxy →

Drag-to-orbit + dust lanes + bloom. Comparable to Fusion's at ~1/5 the cost.

LIVE LEADERBOARD
Fugu's full bench page →

Currently #1 at 8.67/10 (3-of-3 golds). Watch the rest land live.

Fugu landing page build
Apple-keynote landing
Built from: "Introducing Nova 1" prompt
Fugu raycaster build
First-person raycaster
Built from: "raycaster maze with WASD" prompt
Fugu spiral galaxy build
Living spiral galaxy
Built from: "spiral galaxy you can orbit" prompt
II ────── straight from the vendors

The official sources. Read both yourself.

Every claim in this guide ties back to the two vendors' own pages — nothing second-hand. Both Sakana and OpenRouter publish their own benchmarks and pricing. Click through to verify:

Official Sakana Fugu + OpenRouter Fusion sources ↓

"A Multi-Agent System, Delivered as One Model. Point your existing client or coding harness at the Fugu endpoint with your API key."

— Sakana AI, Fugu launch page, sakana.ai/fugu, June 2026

III · my story · why this comparison matters

I was you. Then I built a bench.

Before

I was paying for three different AI APIs. Claude. GPT. OpenRouter.

Every week a new "panel" or "ensemble" API would launch claiming it beats them all.

I kept switching. Burned credits on each one. Couldn't tell which actually shipped better builds.

I'd watch a launch video. Buy the subscription. Try it for a day. Move on.

The breaking point — I realised I had no way to compare them like-for-like. I was making spend decisions on vibes.

Then I built Goldie Bench.

After

Now I run every new AI release through 42 identical one-shot prompts.

Same prompts. Same judging rubric. Real playable HTML output every time.

When Sakana shipped Fugu Ultra this week, I plugged it into the same pipeline.

I picked the cheaper, denser winner inside a day. Live results at goldiebench.com.

You can have this too. Same discipline. Same dollar-per-output clarity. No more vibes-based AI spend.

IV ────── the receipts

Real people. Real wins. Inside the Boardroom right now.

Here's what's happened to the members already running the bench-first discipline — agency owners, ecom founders, course creators, solo operators. Different businesses. Same result: they stopped paying for the wrong AI.

3,600+Founders inside AIPB
400kYouTube subscribers
163kX / Twitter followers
38Countries · live members
29k+Udemy students

What's already happening for members on panel-API choice

Real member · cancelled two redundant AI subs after running the bench
Real member · saved $400/mo by switching to the cheaper panel for the same output
Real member · built first agent same day after seeing the bench results
See all 258 wins (158-page doc) →
Before you scroll on —

Commit to transitioning today. Not tomorrow.

You've seen the proof above. Real people. Real results.

The next 10 minutes show exactly how I bench-test panel APIs.

So here's the deal.

If you're reading this — promise yourself one thing right now. You're going to finish this guide AND run one bench prompt yourself before you sleep tonight. Just one. Because the moment you make this transition, every AI spend decision you make changes.

The people sitting still are still paying for three redundant APIs. The people implementing today are picking the cheapest one that wins and cancelling the rest.

Be one of those people.

Commit to the transition. Commit to taking action today. This changes everything about how you spend on AI.

V ────── the framework

The Goldie Panel Engine™.

Five layers I run every panel-ensembled API through before it earns a slot in my Agent OS dispatch. Built so anyone can pick the right panel API in an afternoon — not after burning a month of credits guessing.

i.

Endpoint

You get drop-in OpenAI-compatible auth — point your existing client at the new base URL and your harness just works. No SDK migration. No surprise.

ii.

Panel

You see exactly which models are in the ensemble. Fable 5 + GPT-5.5 + GLM + Kimi (Fusion). Sakana's mix (Fugu). The panel is the engine — vendor diversity is what makes a panel beat any solo model.

iii.

Bench

You run 42 identical one-shot prompts on both — same scoring rubric, same output format. Real HTML you can open and play. No vendor benchmark cherry-picking allowed.

iv.

Cost

You compute dollars-per-shipping-build, not dollars-per-token. The cheaper panel wins ties. Always.

v.

Dispatch

You wire the winner into Agent OS as the default dispatch. The runner-up stays as failover. The losers get cancelled.

VI ────── old way vs new way

How most people pick. And how I pick.

The Old Way
~ $300/mo wasted
  • Watch the launch video, vibe-check the demo
  • Buy three subscriptions "just in case"
  • Try each for a day, never side-by-side
  • Trust vendor benchmark numbers (SWE-Pro 73.7! GPQA 95.5!)
  • Never measure what they actually ship for your use case
  • Keep paying both Fugu and Fusion forever just to be safe
The New Way
~ 1 afternoon, $46 once
  • Run both APIs through 42 identical one-shot prompts
  • Score real shipped HTML with a fixed 0–10 rubric
  • Track cost per call, not cost per token
  • Pick the panel that wins on output × dollars
  • Wire the winner into Agent OS as default dispatch
  • Cancel the rest. Run the bench again only when a new release lands.
VII ────── live bench results

Same prompts. Same scoring. Real numbers.

I ran both Fugu Ultra (fugu-ultra-20260615) and OpenRouter Fusion through the same 42 one-shot prompts on Goldie Bench. Same system message. Same scoring rubric. Real HTML output every time.

Below: the headline numbers as of 2026-06-22. The Fugu bench is still running for the remaining prompts — live results at goldiebench.com.

0
Fusion avg score · /10
0
Fugu landing build · vs Fusion 20KB
0
Fugu cost saving · per call

Cost per shipping build — Fugu vs Fusion (landing prompt)

Real billed dollars. Same prompt. Same 16K max-tokens. Lower is better.

OpenRouter Fusion$1.30 / call
Sakana Fugu Ultra$0.32 / call

Fugu landed the same prompt at ~25% the cost — and shipped 60% more code (32KB vs 20KB) on the first attempt. Cost ratio holds across the rest of the bench so far.

Fugu Ultra · Sakana's published benchmarks

Source: sakana.ai/fugu. Higher is better.

0
SWE Bench Pro
0
GPQA-Diamond
0
MRCRv2

Sakana positions Fugu as competing with Fable 5 and Mythos Preview on rigorous benchmarks. These three numbers come straight from the vendor's launch page — I have not independently verified SWE/GPQA/MRCR, only the one-shot HTML bench above.

How many one-shot builds each API has finished on my bench

Goldie Bench, 42-task identical-prompt sweep, live count.

0
Fusion
0
Fugu (running)
0
Opus 4.8

Fugu is mid-run as I publish — the remaining 41 builds finish within the hour. Live count + every demo here.

VIII ────── what Fugu ships that matters

Five things Fugu does better than the panel I was using.

1. The endpoint just works.

Sakana's API is at api.sakana.ai/v1/chat/completions and it speaks the OpenAI request format. No SDK migration. No custom client. If you have a Python script hitting OpenAI or OpenRouter, swap the base URL + the API key, and you're done.

You're in Agent OS at 7am. You point your existing dispatcher at the Fugu endpoint. You hit run. Same outputs, lower bill.

Thinking it? "But I already wired Fusion. I don't want to rewrite my dispatcher."

You don't. Both Fugu and Fusion are OpenAI-compatible. The only thing that changes is the base URL string and the API key env var.

I swapped between them with a 3-line edit. Two minutes.

2. The panel is denser.

On the same one-shot landing-page prompt, Fugu shipped 32KB of HTML. Fusion shipped 20KB. Both were correct. Fugu's was richer — more sections, more polish, denser implementation.

That extra density compounds. Your read-along guide page lands fuller on the first attempt. Less follow-up prompting.

3. The cost is roughly a quarter of Fusion's.

Fugu Ultra bills at $5/M input + $30/M output (PAYG). Fusion's panel calls land around $1.30 each on rich HTML prompts. Fugu landed the same prompt at $0.32. That's a 4× cost gap for equivalent output.

Over a month of agent-loop running, that's the difference between a $1,000 bill and a $250 bill.

Thinking it? "What if Fugu's cheap because the panel models are weaker?"

That's exactly what the bench is for. Same prompts. Same scoring. If Fugu shipped worse output for less money, we'd see it in the score column.

Initial result: same quality, ~25% the cost. Full sweep coming.

4. The subscription option exists.

OpenRouter Fusion is PAYG only. Sakana ships flat-rate plans at $20/$100/$200 a month (Standard / Pro / Max) — 10× and 20× usage caps respectively. For high-volume agent loops, that flat-rate predictability beats per-call billing roulette.

Pick the subscription size that matches your loop. No surprise bill.

5. Vendor-agnostic by design.

Sakana's pitch line: opt out of specific providers. If your business has export-control concerns, compliance requirements, or just doesn't want to be a hostage to one vendor's roadmap — Fugu lets you exclude specific models from the panel.

You get frontier-tier output without single-vendor risk. That's the real moat here.

Thinking it? "My agency clients don't care about export controls. Why does this matter?"

Your clients care when the vendor whose model you secretly rely on suddenly raises prices 4×, deprecates the model, or geo-blocks them.

Panel diversity is your insurance policy against that exact moment.

IX · the inside playbook

Get the full Panel Engine playbook + my Agent OS dispatch.

The bench script. The scoring rubric. The dispatch logic that picks the right panel for the right task automatically. The exact config I used to test Fugu in an afternoon.

It's all inside the Agent Operating System in the AI Profit Boardroom.

Get the Agent OS →

Inside the AI Profit Boardroom · aiprofitboardroom.com

X ────── run it yourself

The 5-step Panel Engine SOP.

Anyone can run this in an afternoon. Here's the exact sequence:

1.

Get both API keys

Sakana — sign up at console.sakana.ai. OpenRouter — keys at openrouter.ai/keys. Both have $20-tier starter options.

2.

Pick your bench prompts

Either reuse mine (42 one-shot HTML builds — see goldiebench.com) or pick 10 prompts that represent your actual use case. Same prompts both APIs, every time.

3.

Dispatch in parallel

Same Python script. Two base URLs. Two API key env vars. Run both in parallel. Save outputs to disk.

4.

Score on a fixed rubric

Open each output. Score 0–10 on: did it run, did it match the brief, how polished. Track cost per call from the API usage response.

5.

Wire the winner into Agent OS

The cheaper API that ties or wins on quality becomes your default dispatch. The runner-up becomes failover. Cancel everything else.

XI ────── the beliefs that stop people

Three wrong things people believe. And what's actually true.

Wrong belief #1 "I should just stick with what I know."

Right. The cost of "what you know" is what you're paying. Right now it's possibly 4× what you'd pay on Fugu for the same output.

One afternoon of bench-testing is the only way to know if you're overpaying. The answer might be no — but the test pays for itself either way.

Wrong belief #2 "Vendor benchmarks tell me what I need to know."

Right. Vendor benchmarks tell you what the vendor wants you to know. Sakana cites SWE Bench Pro 73.7 / GPQA-D 95.5 / MRCRv2 93.6 — all real, all impressive. None of them tell you whether Fugu will ship the landing page YOU need at the cost YOU can afford.

Your bench is the only one that matters for your business.

Wrong belief #3 "Panel APIs are too new — I'll wait until they stabilise."

Right. Panel APIs are the new default. Both Sakana and OpenRouter are building the same architecture because it works — one prompt, many models, one synthesised answer.

Waiting for stabilisation means watching your competitors halve their AI bill while yours stays put.

Don't take my word for it

158 pages of members who already broke through these exact beliefs. Real businesses. Real wins. Documented inside the Boardroom.

Read the 158-page testimonials doc →
XII ────── recap

What you got. What it means.

i.

You stopped guessing.

A 5-layer Panel Engine framework you can run on any new panel API in an afternoon.

ii.

You stopped overpaying.

Bench-tested Fugu vs Fusion — same output, ~25% the cost on the prompts I tested.

iii.

You got the official sources.

Vendor links + verifiable benchmark numbers, clickable in the first screenful.

iv.

You got my live numbers.

Real cost per call, real output sizes — updated live at goldiebench.com.

v.

You got the SOP.

5-step playbook to run the same bench on any new panel API that ships next month.

vi.

You stopped being hostage.

Panel diversity is your insurance against single-vendor pricing, deprecation, or geo-blocking.

Get the full Agent OS + Panel Engine playbook.

Bench harness. Dispatch logic. The 42 prompts. The scoring rubric. The Obsidian memory setup. Coaching calls where we wire your panel-dispatched Agent OS together step by step.

Join the Boardroom →

Inside the AI Profit Boardroom · aiprofitboardroom.com

158 pages of real wins

Real members. Real businesses. Real wins documented inside the Boardroom right now.

Read the 158-page testimonials doc →

Bench Fugu against Fusion first. Decide second. I'll see you in the next one.