How to Build an AI Agent Crew with Hermes

One agent is handy. A crew is a business. Here's how to wire up a team of Hermes agents that research, build, judge and ship — step by step, no fluff.

By Julian Goldie · 10 min read · Updated 14 Jun 2026

A single AI agent is useful. But the real unlock is a crew — a team of specialist agents that each do one job and hand off to the next, like staff that never sleep.

I run a 7-figure agency with agent crews doing the heavy lifting. This guide is exactly how I build them. No theory — just the setup that ships work daily.

What you'll end up with

A researcher, a writer/builder, an editor and a judge, all running on one board and turning a one-line brief into a finished, checked deliverable while you do something else.

The principle: specialists beat generalists

Here's the mistake most people make with AI: they ask one model to do everything. Research, write, edit, review — all in one conversation. It works, sort of. But the quality is mediocre because the model is context-switching constantly.

A crew fixes this. Each agent does ONE thing with its full context window dedicated to that job. The researcher only researches. The writer only writes. The judge only judges. Every output is sharper because the agent isn't trying to be five things at once.

One agent is a Swiss Army knife. A crew is a toolbox.

Step 1 — Pick the engine (your model)

Every agent runs on a model. This is your biggest cost lever, so choose wisely.

The beauty of a crew is you can mix models per role. Researcher on GLM-5.2 (cheap, reads a lot). Builder on Claude Sonnet (precise, writes code). Judge on GLM-5.2 (fast, scores work). You control the cost-quality tradeoff at every step.
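To make the per-role mix concrete, here is a minimal Python sketch. The `RoleConfig` class, the `model_for` helper and the cheap fallback default are hypothetical illustrations, not Hermes configuration, and the model names are plain labels copied from the paragraph above rather than verified API model IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleConfig:
    model: str  # label only; swap in whatever model ID your provider expects
    why: str    # the reason this role gets this model


# Hypothetical role-to-model table mirroring the mix described above.
CREW_MODELS = {
    "researcher": RoleConfig("GLM-5.2", "cheap, reads a lot"),
    "builder": RoleConfig("Claude Sonnet", "precise, writes code"),
    "judge": RoleConfig("GLM-5.2", "fast, scores work"),
}

CHEAP_DEFAULT = "GLM-5.2"  # assumption: roles not listed fall back to the cheap model


def model_for(role: str) -> str:
    """Return the model assigned to a role, or the cheap default if none is set."""
    cfg = CREW_MODELS.get(role)
    return cfg.model if cfg is not None else CHEAP_DEFAULT
```

With this layout, moving one role (say the judge) to a premium model is a one-line change to the table, which is the cost-quality control this step is about.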

Step 2 — Create one profile per role

Don't make one agent do everything. Give each a job and a name. This is the whole trick.

In Hermes, each agent is a profile — a named persona with its own system prompt, model config and tool access. You create them once, and they persist across sessions.

The core 4 roles

  • Researcher — gathers facts, reads sources, grounds the work
  • Writer/Builder — drafts the content or writes the code
  • Editor — tightens, formats, checks structure
  • Judge — scores quality, sends weak work back

Optional specialists

  • SEO agent — keyword research, meta tags, schema
  • Designer — CSS, layout, visual polish
  • Tester — runs code, catches bugs, writes tests
  • Publisher — deploys, schedules, posts

Start with the core 4. Add specialists as your jobs get bigger. I started with 3 agents. Now my main crew has 7. But the first 4 carry 90% of the value.
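A profile boils down to a small record: name, system prompt, model and tools. The Python sketch below models one and round-trips the core 4 through JSON to mimic "create once, persist across sessions". The `Profile` class, its fields and the tool names are assumptions for illustration; this is not Hermes's actual profile format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Profile:
    """Hypothetical stand-in for a Hermes profile (not its real format):
    a named persona with its own system prompt, model and tool access."""
    name: str
    system_prompt: str
    model: str
    tools: list[str] = field(default_factory=list)


# The core 4 roles, one job each. Tool names are illustrative placeholders.
CORE_CREW = [
    Profile("researcher", "Gather facts, read sources, ground the work.", "GLM-5.2", ["web_search"]),
    Profile("writer", "Draft the content or write the code.", "Claude Sonnet", ["file_write"]),
    Profile("editor", "Tighten, format and check structure.", "GLM-5.2", ["file_write"]),
    Profile("judge", "Score work out of 10 and send weak work back with notes.", "GLM-5.2"),
]


def dump_profiles(profiles: list[Profile]) -> str:
    """Serialize profiles to JSON so they can be saved to disk and reused later."""
    return json.dumps([asdict(p) for p in profiles], indent=2)


def load_profiles(text: str) -> list[Profile]:
    """Rebuild Profile objects from the JSON produced by dump_profiles."""
    return [Profile(**d) for d in json.loads(text)]
```

Writing the `dump_profiles` string to a file and loading it at the start of each session gives you the same crew every time, without re-describing each role.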

Step 3 — Put them on one board

This is what turns separate agents into a crew. Give them a shared kanban board.

In Hermes, each job is a card. The card moves through stages — todo → ready → running → done. Each agent picks up cards assigned to it, does the work, and the card moves to the next agent.
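The board mechanics can be modelled as a tiny state machine. This Python sketch is an assumption about how such a board could work, not Hermes's kanban code: the `Card` class and `claim` helper are hypothetical, and only the four stage names come from the article. A card becomes claimable only once it is `ready`, and claiming it moves it to `running`.

```python
from dataclasses import dataclass
from typing import Optional

# Stage order from the article: todo -> ready -> running -> done.
STAGES = ["todo", "ready", "running", "done"]


@dataclass
class Card:
    title: str
    assignee: str
    stage: str = "todo"

    def advance(self) -> None:
        """Move the card one stage forward; a done card stays done."""
        i = STAGES.index(self.stage)
        if i < len(STAGES) - 1:
            self.stage = STAGES[i + 1]


def claim(cards: list[Card], agent: str) -> Optional[Card]:
    """Find the first ready card assigned to `agent` and move it to running."""
    for card in cards:
        if card.assignee == agent and card.stage == "ready":
            card.advance()  # ready -> running
            return card
    return None
```

In this sketch a plain `advance()` call plays the dispatcher's part of promoting a card from todo to ready; the agent then claims it, works, and advances it to done.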

Here's how a typical job flows:

  1. You create a card: "Write an SEO article about Hermes for beginners"
  2. It goes to todo → the dispatcher promotes it to ready
  3. The researcher claims it, gathers facts, writes findings, marks done
  4. A new card spawns for the writer with the researcher's output attached
  5. The writer drafts the article, marks done
  6. The editor tightens, formats, checks structure
  7. The judge scores it. Below 8/10? Back to the writer with notes. 8 or above? Done.
  8. You get a finished, checked article on the board
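The hand-off from researcher to judge in the flow above can be sketched as a loop where each finished card spawns the next role's card with the earlier output attached. The agents here are plain Python callables standing in for model calls, and `spawn_next` and `run_job` are hypothetical names, not Hermes functions. The judge's send-back loop is left out to keep the chain readable.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hand-off order from the flow above (send-backs omitted).
PIPELINE = ["researcher", "writer", "editor", "judge"]


@dataclass
class Card:
    brief: str
    assignee: str
    attachments: list[str] = field(default_factory=list)  # outputs from earlier roles


def spawn_next(done: Card, output: str) -> Optional[Card]:
    """Create the next role's card with all earlier output attached, or None at the end."""
    i = PIPELINE.index(done.assignee)
    if i == len(PIPELINE) - 1:
        return None
    return Card(done.brief, PIPELINE[i + 1], done.attachments + [output])


def run_job(brief: str, agents: dict[str, Callable[[Card], str]]) -> list[str]:
    """Push one brief through every role and return each role's output in order."""
    card: Optional[Card] = Card(brief, PIPELINE[0])
    outputs: list[str] = []
    while card is not None:
        output = agents[card.assignee](card)
        outputs.append(output)
        card = spawn_next(card, output)
    return outputs
```

Because every card carries the full trail of earlier output, the editor sees both the research and the draft, and the judge sees everything it needs to score.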

You watch this happen in real time. Cards move, agents work, output appears. It feels like running a real team — because you are.

Want the whole crew, already wired?

Hermes runs best inside the Agent Operating System — the dashboard, the shared memory, the kanban, every agent profile, and weekly coaching calls. 2,200+ founders are building with it right now.

Get the Agent OS →
Inside the AI Profit Boardroom · aiprofitboardroom.com

Step 4 — Add the judge loop (this is the secret)

Most people skip this. It's the single biggest quality lever in the whole system.

The judge is an agent that never produces deliverables. Its only job is to score other agents' work. It reads the deliverable, checks it against your criteria, and gives a score out of 10. Below the threshold (I use 8), it sends the work back with specific notes on what to fix.
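A minimal sketch of the judge loop in Python, assuming the writer and judge are plain callables standing in for model calls. The `judge_loop` name and the `max_rounds` cap are added assumptions (the article doesn't say how many send-backs are allowed); the cap stops a stubborn draft from looping forever and leaves the last attempt for a human to review.

```python
from typing import Callable, Tuple

PASS_MARK = 8  # the article's threshold, out of 10


def judge_loop(
    write: Callable[[str], str],
    judge: Callable[[str], Tuple[int, str]],
    max_rounds: int = 3,
) -> Tuple[str, int, int]:
    """Redraft until the judge's score reaches PASS_MARK or max_rounds runs out.

    `write` receives the judge's latest notes ("" on the first round) and returns
    a draft; `judge` returns (score, notes). Returns (draft, score, rounds_used).
    """
    notes = ""
    for round_no in range(1, max_rounds + 1):
        draft = write(notes)
        score, notes = judge(draft)
        if score >= PASS_MARK:
            break
    return draft, score, round_no
```

Passing the judge's notes straight into the next `write` call is what makes each redraft better than the last, rather than just a fresh roll of the dice.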

Why this matters

Without a judge, the writer ships its first draft. With a judge, the writer often ships its second or third draft, because the earlier ones got sent back with notes. The quality difference is night and day.

Your judge criteria should be specific. For articles: "hook in the first 2 lines, 1500+ words, FAQ section with schema, AI Profit Boardroom (AIPB) CTA, no AI-speak." For code: "all tests pass, no console errors, clean diff, follows project conventions." The clearer the criteria, the sharper the feedback.
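The mechanical half of the article criteria can be checked in code before the judge model reads the draft, so its feedback can focus on subjective parts like the hook. This Python sketch is an assumption, not part of Hermes: the banned-phrase list is a placeholder, the FAQ check just looks for a line starting with "FAQ", and the schema check looks for schema.org `FAQPage` markup as it appears in JSON-LD.

```python
import re

# Hypothetical "AI-speak" list; tune it to the phrases you never want to see.
BANNED_PHRASES = ["delve", "in today's fast-paced world", "unlock the power of"]


def failing_checks(article: str) -> list[str]:
    """Return the names of the mechanical article criteria this text fails."""
    lower = article.lower()
    checks = {
        "1500+ words": len(article.split()) >= 1500,
        # An FAQ heading: any line that starts with "FAQ".
        "FAQ section": re.search(r"^\s*faq\b", article, re.IGNORECASE | re.MULTILINE) is not None,
        # schema.org FAQPage markup, as written in JSON-LD.
        "FAQ schema": re.search(r'"@type"\s*:\s*"FAQPage"', article) is not None,
        "AI Profit Boardroom CTA": "ai profit boardroom" in lower,
        "no AI-speak": not any(p in lower for p in BANNED_PHRASES),
    }
    return [name for name, passed in checks.items() if not passed]
```

Handing the returned list to the judge as part of its notes (for example "fails: FAQ schema") keeps every send-back specific.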

A second pair of eyes that never gets tired.

Step 5 — Give the brief, walk away

Now the crew is wired. You just hand it a topic. Here's what the actual workflow looks like:

  1. You write one line: "Build me 3 SEO articles about Hermes, upselling the AI Profit Boardroom, with hooks, FAQ, schema and CTAs."
  2. The crew breaks it into cards: one research card, three write cards, three edit cards, three judge cards
  3. Agents run in sequence: research → write → edit → judge → (loop if needed) → done
  4. You come back to a full board: 3 finished, checked articles ready to ship
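The breakdown above (one research card, then a write, edit and judge card per article) can be sketched as a small dependency graph. `Card`, `plan_cards` and `ready_cards` are hypothetical names for illustration, not Hermes internals; in this sketch a card becomes runnable once every card it depends on is done.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    id: str
    role: str
    depends_on: list[str] = field(default_factory=list)  # card ids that must finish first


def plan_cards(n_articles: int) -> list[Card]:
    """One shared research card, plus a write -> edit -> judge chain per article."""
    cards = [Card("research", "researcher")]
    for i in range(1, n_articles + 1):
        write = Card(f"write-{i}", "writer", ["research"])
        edit = Card(f"edit-{i}", "editor", [write.id])
        judge = Card(f"judge-{i}", "judge", [edit.id])
        cards += [write, edit, judge]
    return cards


def ready_cards(cards: list[Card], done: set[str]) -> list[str]:
    """Ids of unfinished cards whose dependencies are all done."""
    return [c.id for c in cards if c.id not in done and all(d in done for d in c.depends_on)]
```

For the three-article brief this yields ten cards. Once the research card is done, all three write cards are ready at the same moment, since none depends on another.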

Total time: 10-15 minutes. Your involvement: writing one sentence and reviewing the final output. Everything in between happened autonomously.

Advanced: parallel work and delegation

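Once you're comfortable with the basic flow, you can level up: let independent cards run in parallel, and have an orchestrator delegate tasks to several agents at once and collect their results.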
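Parallel delegation is essentially a fan-out/fan-in pattern. The Python sketch below uses the standard library's `concurrent.futures.ThreadPoolExecutor`; `agent` is any callable standing in for a model call, and `delegate` is a hypothetical helper, not Hermes's own delegation API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def delegate(tasks: list[str], agent: Callable[[str], str], max_workers: int = 3) -> dict[str, str]:
    """Orchestrator: fan independent tasks out to concurrent workers, then collect results.

    Threads suit this because a real agent call mostly waits on a network API.
    Results are keyed by task (task strings are assumed to be distinct).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(agent, tasks))  # map keeps results in task order
    return dict(zip(tasks, results))
```

Only tasks that don't depend on each other belong in one `delegate` call; anything that needs another card's output still waits its turn on the board.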

Common mistakes (and how to avoid them)

Don't do this

  • One agent does everything (quality suffers)
  • Skip the judge (first drafts ship, quality drops)
  • Vague briefs ("write something good")
  • All agents on the most expensive model
  • No shared memory (agents start cold every time)
  • Too many agents, too early (start with 4)

Do this instead

  • One role per agent, clear boundaries
  • Judge every output, score threshold at 8
  • Specific briefs with format and criteria
  • Mix models: cheap for volume, premium for hard
  • Persistent memory so agents know your business
  • Start small, add specialists as you grow

What a finished crew looks like

Here's my actual content crew, the one that produced this article.

Five agents. All on a free model. Total cost per article: effectively zero. Total time: under 15 minutes. Quality: checked by a judge before I ever see it.

That's not a demo. That's a system.

Do this once and your output stops depending on your hours. That's the difference between using AI and running it.

FAQ

How many agents should a crew have?

Start with four: researcher, writer/builder, editor and judge. Add specialists (SEO, designer, tester, publisher) as your jobs get bigger. Each agent should own one clear job.

What model should each agent use?

A strong, low-cost model like GLM-5.2 on the coding plan lets you run a whole crew for almost nothing. You can mix models per role too.

How does the judge agent work?

The judge scores each result out of ten against your criteria. Anything under eight goes back with notes, so only good work reaches you.

Do I need to code to build a crew?

No — you create named agent profiles and give plain-English briefs. The fastest path is a ready-made setup inside Agent OS.

How much does it cost to run a crew?

On GLM-5.2 (free on the coding plan), a 5-agent crew costs almost nothing per run. If you use premium models for specific roles, costs scale with usage but you control which agent uses what.

Can agents work in parallel?

Yes. Hermes supports parallel delegation — multiple agents can work on independent tasks simultaneously, then hand results back to the orchestrator.

Julian Goldie
Runs a 7-figure SEO agency and the AI Profit Boardroom — 2,200+ founders, $100k+/mo, 319k YouTube subscribers. Builds AI agent systems daily.