How to Build an AI Agent Crew with Hermes (Step by Step)

A single AI agent is useful. But the real unlock is a crew — a team of specialist agents that each do one job and hand off to the next, like staff that never sleep.

I run a 7-figure agency with agent crews doing the heavy lifting. This guide is exactly how I build them. No theory — just the setup that ships work daily.

What you'll end up with

A researcher, a writer/builder, an editor and a judge, all running on one board and turning a one-line brief into a finished, checked deliverable. While you do something else.

The principle: specialists beat generalists

Here's the mistake most people make with AI: they ask one model to do everything. Research, write, edit, review — all in one conversation. It works, sort of. But the quality is mediocre because the model is context-switching constantly.

A crew fixes this. Each agent does ONE thing with its full context window dedicated to that job. The researcher only researches. The writer only writes. The judge only judges. Every output is sharper because the agent isn't trying to be five things at once.

One agent is a Swiss Army knife. A crew is a toolbox.

Step 1 — Pick the engine (your model)

Every agent runs on a model. This is your biggest cost lever, so choose wisely:

GLM-5.2 — free on the coding plan, frontier-level quality. My default for 80% of crew work
Claude Sonnet — premium, excellent for coding and reasoning. Use for the builder role on hard jobs
GPT-4o — strong all-rounder. Good alternative if you're already on OpenAI

The beauty of a crew is you can mix models per role. Researcher on GLM-5.2 (cheap, reads a lot). Builder on Claude Sonnet (precise, writes code). Judge on GLM-5.2 (fast, scores work). You control the cost-quality tradeoff at every step.
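
To make the per-role mix concrete, here's a minimal Python sketch. It is not Hermes's config format: the `ROLE_MODELS` map, the model ID strings and the `model_for` helper are hypothetical names used only to illustrate the idea of one model per role with a cheap default.

```python
# Hypothetical role-to-model map. The keys and model ID strings are
# illustrative assumptions, not real Hermes configuration values.
ROLE_MODELS = {
    "researcher": "glm-5.2",     # cheap, reads a lot
    "builder": "claude-sonnet",  # precise, writes code
    "judge": "glm-5.2",          # fast, scores work
}
DEFAULT_MODEL = "glm-5.2"


def model_for(role: str) -> str:
    """Return the model assigned to a role, falling back to the default."""
    return ROLE_MODELS.get(role.lower(), DEFAULT_MODEL)


print(model_for("Builder"))  # claude-sonnet
print(model_for("editor"))   # glm-5.2 (no entry, so the default applies)
```

Keeping the default cheap means a new role costs nothing extra until you decide it has earned a premium model.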

Step 2 — Create one profile per role

Don't make one agent do everything. Give each a job and a name. This is the whole trick.

In Hermes, each agent is a profile — a named persona with its own system prompt, model config and tool access. You create them once, and they persist across sessions.

The core 4 roles

Researcher — gathers facts, reads sources, grounds the work
Writer/Builder — drafts the content or writes the code
Editor — tightens, formats, checks structure
Judge — scores quality, sends weak work back
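
If it helps to picture a profile as data, the four roles above could be written down like this. This is a sketch under assumptions, not how Hermes stores profiles: the `Profile` class, its field names and the `web_search` tool name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """A named persona: system prompt, model and tool access (illustrative fields)."""
    name: str
    system_prompt: str
    model: str
    tools: tuple = ()


# One profile per role. "web_search" is a made-up tool name for illustration.
CORE_CREW = [
    Profile("researcher", "Gather facts, read sources, ground the work.",
            "glm-5.2", ("web_search",)),
    Profile("writer", "Draft the content or write the code.", "glm-5.2"),
    Profile("editor", "Tighten, format and check structure.", "glm-5.2"),
    Profile("judge", "Score quality out of 10 and send weak work back.", "glm-5.2"),
]

for profile in CORE_CREW:
    print(f"{profile.name}: {profile.model}, tools={list(profile.tools)}")
```

Each system prompt describes exactly one job, which is the whole point of splitting the roles.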

Optional specialists

SEO agent — keyword research, meta tags, schema
Designer — CSS, layout, visual polish
Tester — runs code, catches bugs, writes tests
Publisher — deploys, schedules, posts

Start with the core 4. Add specialists as your jobs get bigger. I started with 3 agents. Now my main crew has 7. But the first 4 carry 90% of the value.

Step 3 — Put them on one board

This is what turns separate agents into a crew. Give them a shared kanban board.

In Hermes, each job is a card. The card moves through stages: todo → ready → running → done. Each agent picks up the cards assigned to it and does the work, and when a card is done, the job passes to the next agent.
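
The four stages behave like a small state machine. Here's a minimal Python sketch of that idea, assuming a card can only move forward one stage at a time. The `Stage` enum and `advance` function are hypothetical names, not part of Hermes.

```python
from enum import Enum


class Stage(Enum):
    TODO = "todo"
    READY = "ready"
    RUNNING = "running"
    DONE = "done"


# Allowed forward moves, matching todo -> ready -> running -> done.
NEXT_STAGE = {
    Stage.TODO: Stage.READY,
    Stage.READY: Stage.RUNNING,
    Stage.RUNNING: Stage.DONE,
}


def advance(stage: Stage) -> Stage:
    """Move a card one stage forward; a done card cannot move further."""
    if stage not in NEXT_STAGE:
        raise ValueError(f"card is already {stage.value}")
    return NEXT_STAGE[stage]


stage = Stage.TODO
path = [stage.value]
while stage is not Stage.DONE:
    stage = advance(stage)
    path.append(stage.value)
print(" -> ".join(path))  # todo -> ready -> running -> done
```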

Here's how a typical job flows:

You create a card: "Write an SEO article about Hermes for beginners"
It goes to todo → the dispatcher promotes it to ready
The researcher claims it, gathers facts, writes findings, marks done
A new card spawns for the writer with the researcher's output attached
The writer drafts the article, marks done
The editor tightens, formats, checks structure
The judge scores it. Below 8/10? Back to the writer with notes. 8 or above? Done.
You get a finished, checked article on the board
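
The hand-offs in the flow above can be sketched in a few lines of Python. This is an illustration only: the `Card` class, the `PIPELINE` order and the `hand_off` function are hypothetical, and the "output" strings stand in for real agent work.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Role order taken from the flow above.
PIPELINE = ["researcher", "writer", "editor", "judge"]


@dataclass
class Card:
    title: str
    assignee: str
    attachments: List[str] = field(default_factory=list)


def hand_off(card: Card, output: str) -> Optional[Card]:
    """Spawn the next role's card with all prior output attached.

    Returns None when the card belonged to the last role in the pipeline.
    """
    position = PIPELINE.index(card.assignee)
    if position == len(PIPELINE) - 1:
        return None
    return Card(card.title, PIPELINE[position + 1], card.attachments + [output])


card = Card("Write an SEO article about Hermes for beginners", "researcher")
while card is not None:
    last = card
    card = hand_off(card, f"{card.assignee} output")

print(last.assignee, last.attachments)
# judge ['researcher output', 'writer output', 'editor output']
```

By the time the judge sees the card, every earlier agent's output is attached, so nobody starts cold.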

You watch this happen in real time. Cards move, agents work, output appears. It feels like running a real team — because you are.

Want the whole crew, already wired?

Hermes runs best inside the Agent Operating System — the dashboard, the shared memory, the kanban, every agent profile, and weekly coaching calls. 2,200+ founders are building with it right now.

Inside the AI Profit Boardroom · aiprofitboardroom.com

Step 4 — Add the judge loop (this is the secret)

Most people skip this. It's the single biggest quality lever in the whole system.

The judge is an agent that never produces a deliverable of its own. Its only job is to score other agents' work. It reads the deliverable, checks it against your criteria, and gives a score out of 10. Below the threshold (I use 8), it sends the work back with specific notes on what to fix.
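
Here's the loop as a minimal Python sketch. The `judge_loop` function is hypothetical, the writer and judge are stubs standing in for real model calls, and the `max_rounds` cap is my own assumption so the loop cannot run forever. Only the threshold of 8 comes from the setup described here.

```python
from typing import Callable, Tuple

THRESHOLD = 8  # pass mark out of 10


def judge_loop(
    write: Callable[[str, str], str],
    judge: Callable[[str], Tuple[int, str]],
    brief: str,
    max_rounds: int = 5,  # assumed safety cap, not a Hermes setting
) -> Tuple[str, int, int]:
    """Redraft until the judge scores THRESHOLD or more, or rounds run out.

    Returns (final draft, final score, number of drafts written).
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    notes = ""
    for round_number in range(1, max_rounds + 1):
        draft = write(brief, notes)   # writer sees the judge's latest notes
        score, notes = judge(draft)
        if score >= THRESHOLD:
            break
    return draft, score, round_number


def make_stub_writer() -> Callable[[str, str], str]:
    """Stub writer that just numbers its drafts."""
    count = 0

    def write(brief: str, notes: str) -> str:
        nonlocal count
        count += 1
        return f"{brief} (draft {count})"

    return write


def stub_judge(draft: str) -> Tuple[int, str]:
    """Stub judge: draft 1 scores 6, draft 2 scores 7, draft 3 scores 8."""
    number = int(draft.rsplit("draft ", 1)[1].rstrip(")"))
    score = min(10, 5 + number)
    return score, f"score {score}: tighten the hook"


result = judge_loop(make_stub_writer(), stub_judge, "Hermes for beginners")
print(result)  # ('Hermes for beginners (draft 3)', 8, 3)
```

With these stubs the first two drafts score 6 and 7 and get sent back, and the third clears the bar.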

Why this matters

Without a judge, the writer ships its first draft. With a judge, the writer ships its third draft — because the first two got sent back. The quality difference is night and day.

Your judge criteria should be specific. For articles: "hook in the first 2 lines, 1500+ words, FAQ section with schema, AIPB CTA, no AI-speak." For code: "all tests pass, no console errors, clean diff, follows project conventions." The clearer the criteria, the sharper the feedback.
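
Some of those article criteria are mechanical enough to check in plain code before a model ever looks at the draft. A rough Python sketch, assuming simple substring checks are good enough: `check_article` is a hypothetical helper, and it deliberately leaves out the hook and AI-speak criteria, which need a model's judgment.

```python
from typing import List


def check_article(text: str, min_words: int = 1500) -> List[str]:
    """Return one note per failed mechanical criterion; empty means all passed."""
    notes = []
    lowered = text.lower()
    word_count = len(text.split())
    if word_count < min_words:
        notes.append(f"only {word_count} words, need {min_words}+")
    if "faq" not in lowered:
        notes.append("missing FAQ section")
    if "ai profit boardroom" not in lowered:
        notes.append("missing AIPB CTA")
    return notes


print(check_article("short draft"))
# ['only 2 words, need 1500+', 'missing FAQ section', 'missing AIPB CTA']
```

Each note is specific enough for the writer to act on, which is what makes the feedback sharp.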

A second pair of eyes that never gets tired.

Step 5 — Give the brief, walk away

Now the crew is wired. You just hand it a topic. Here's what the actual workflow looks like:

You write one line: "Build me 3 SEO articles about Hermes, upselling the AI Profit Boardroom, with hooks, FAQ, schema and CTAs."
The crew breaks it into cards: one research card, three write cards, three edit cards, three judge cards
Agents run in sequence: research → write → edit → judge → (loop if needed) → done
You come back to a full board: 3 finished, checked articles ready to ship
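
The card breakdown in that workflow is simple arithmetic: one shared research card, then one write, edit and judge card per article. A small Python sketch of the split, where `plan_cards` is a hypothetical helper and not a Hermes function:

```python
from typing import List, Tuple


def plan_cards(topic: str, articles: int) -> List[Tuple[str, str]]:
    """Return (role, title) pairs: one research card, then one card per article per stage."""
    cards = [("researcher", f"Research: {topic}")]
    for role in ("writer", "editor", "judge"):
        for number in range(1, articles + 1):
            cards.append((role, f"{role} card, article {number}: {topic}"))
    return cards


cards = plan_cards("Hermes", articles=3)
print(len(cards))  # 10 cards: 1 research + 3 write + 3 edit + 3 judge
```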

Total time: 10-15 minutes. Your involvement: writing one sentence and reviewing the final output. Everything in between happened autonomously.

Advanced: parallel work and delegation

Once you're comfortable with the basic flow, you can level up:

Parallel delegation — spawn 3 writer agents simultaneously, each on a different article. Total time drops from 15 min to 5 min
Nested crews — an orchestrator agent manages sub-crews for complex jobs (e.g., a research crew, a writing crew, a QA crew)
Cron jobs — schedule recurring tasks: "every Monday at 9am, research this week's trending topics and draft article outlines"
Cross-crew memory — agents share a persistent memory store, so the writer remembers your voice from last session without a refresher
Model routing per task — route easy research to GLM-5.2 (free), hard coding to Claude (premium). Optimize cost automatically
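
Parallel delegation is the easiest of these to picture in code. Here's a minimal Python sketch using the standard library's thread pool. The `write_article` function is a stub standing in for a writer agent run, and the topic strings are placeholders, so treat this as the shape of the idea and not Hermes's own mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def write_article(topic: str) -> str:
    """Stub for one writer agent run; a real run would call a model."""
    return f"draft: {topic}"


topics = ["Hermes article 1", "Hermes article 2", "Hermes article 3"]

# One worker per article, so all three drafts are written at the same time.
with ThreadPoolExecutor(max_workers=len(topics)) as pool:
    drafts = list(pool.map(write_article, topics))  # results keep input order

print(drafts)
```

Because the three articles don't depend on each other, running them side by side takes roughly as long as the slowest one instead of the sum of all three.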

Common mistakes (and how to avoid them)

Don't do this

One agent does everything (quality suffers)
Skip the judge (first drafts ship, quality drops)
Vague briefs ("write something good")
All agents on the most expensive model
No shared memory (agents start cold every time)
Too many agents, too early (start with 4)

Do this instead

One role per agent, clear boundaries
Judge every output, score threshold at 8
Specific briefs with format and criteria
Mix models: cheap for volume, premium for hard
Persistent memory so agents know your business
Start small, add specialists as you grow

What a finished crew looks like

Here's my actual content crew — the one that produced this article:

glm-researcher — GLM-5.2, web access, reads sources and writes research briefs
glm-writer — GLM-5.2, writes in my voice, structures articles
glm-seo — GLM-5.2, adds meta tags, schema, FAQ sections, internal links
glm-designer — GLM-5.2, CSS and HTML layout, visual polish
glm-judge — GLM-5.2, scores against criteria, loops until 8+/10

Five agents. All on a free model. Total cost per article: effectively zero. Total time: under 15 minutes. Quality: checked by a judge before I ever see it.

That's not a demo. That's a system.

Do this once and your output stops depending on your hours. That's the difference between using AI and running it.

FAQ

How many agents should a crew have?

Start with four: researcher, writer/builder, editor and judge. Add more specialists (SEO, designer, tester) as your jobs get bigger. Each agent should own one clear job.

What model should each agent use?

A strong, low-cost model like GLM-5.2 on the coding plan lets you run a whole crew for almost nothing. You can mix models per role too.

How does the judge agent work?

The judge scores each result out of ten against your criteria. Anything under eight goes back with notes, so only good work reaches you.

Do I need to code to build a crew?

No — you create named agent profiles and give plain-English briefs. The fastest path is a ready-made setup inside Agent OS.

How much does it cost to run a crew?

On GLM-5.2 (free on the coding plan), a 5-agent crew costs almost nothing per run. If you use premium models for specific roles, costs scale with usage but you control which agent uses what.

Can agents work in parallel?

Yes. Hermes supports parallel delegation — multiple agents can work on independent tasks simultaneously, then hand results back to the orchestrator.

Julian Goldie

Runs a 7-figure SEO agency and the AI Profit Boardroom — 2,200+ founders, $100k+/mo, 319k YouTube subscribers. Builds AI agent systems daily.