A single AI agent is useful. But the real unlock is a crew — a team of specialist agents that each do one job and hand off to the next, like staff that never sleep.
I run a 7-figure agency with agent crews doing the heavy lifting. This guide is exactly how I build them. No theory — just the setup that ships work daily.
A researcher, a writer/builder, an editor and a judge, all running on one board, turn a one-line brief into a finished, checked deliverable while you do something else.
The principle: specialists beat generalists
Here's the mistake most people make with AI: they ask one model to do everything. Research, write, edit, review — all in one conversation. It works, sort of. But the quality is mediocre because the model is context-switching constantly.
A crew fixes this. Each agent does ONE thing with its full context window dedicated to that job. The researcher only researches. The writer only writes. The judge only judges. Every output is sharper because the agent isn't trying to be five things at once.
One agent is a Swiss Army knife. A crew is a toolbox.
Step 1 — Pick the engine (your model)
Every agent runs on a model. This is your biggest cost lever, so choose wisely:
- GLM-5.2 — free on the coding plan, frontier-level quality. My default for 80% of crew work
- Claude Sonnet — premium, excellent for coding and reasoning. Use for the builder role on hard jobs
- GPT-4o — strong all-rounder. Good alternative if you're already on OpenAI
The beauty of a crew is you can mix models per role. Researcher on GLM-5.2 (cheap, reads a lot). Builder on Claude Sonnet (precise, writes code). Judge on GLM-5.2 (fast, scores work). You control the cost-quality tradeoff at every step.
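Here's a minimal sketch of per-role model selection, assuming you keep the mapping in a plain Python dict. The names `ROLE_MODELS`, `DEFAULT_MODEL` and `model_for` are made up for illustration and are not Hermes settings; the model labels are just the strings from the list above.

```python
# Hypothetical role-to-model map. These are plain labels from the list above,
# not identifiers from Hermes or any provider API.
DEFAULT_MODEL = "GLM-5.2"

ROLE_MODELS = {
    "researcher": "GLM-5.2",     # cheap, reads a lot
    "builder": "Claude Sonnet",  # precise, writes code
    "judge": "GLM-5.2",          # fast, scores work
}


def model_for(role: str) -> str:
    """Return the model assigned to a role, falling back to the default."""
    return ROLE_MODELS.get(role, DEFAULT_MODEL)


print(model_for("builder"))
print(model_for("editor"))  # no explicit entry, so it uses the default
```

Keeping the map in one place means changing the cost-quality tradeoff for a role is a one-line edit.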
Step 2 — Create one profile per role
Don't make one agent do everything. Give each a job and a name. This is the whole trick.
In Hermes, each agent is a profile — a named persona with its own system prompt, model config and tool access. You create them once, and they persist across sessions.
The core 4 roles
- Researcher — gathers facts, reads sources, grounds the work
- Writer/Builder — drafts the content or writes the code
- Editor — tightens, formats, checks structure
- Judge — scores quality, sends weak work back
Optional specialists
- SEO agent — keyword research, meta tags, schema
- Designer — CSS, layout, visual polish
- Tester — runs code, catches bugs, writes tests
- Publisher — deploys, schedules, posts
Start with the core 4. Add specialists as your jobs get bigger. I started with 3 agents. Now my main crew has 7. But the first 4 carry 90% of the value.
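To make "one profile per role" concrete, here's a rough sketch of the core 4 as data. This is not Hermes's actual profile format: the `Profile` class, its field names, the `web_search` tool label and the choice of GLM-5.2 for every role are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """A named persona: one job, one system prompt, one model, its own tools."""
    name: str
    model: str
    system_prompt: str
    tools: tuple = ()  # hypothetical tool labels


# The core 4 roles, one profile each (all values are illustrative).
CORE_CREW = [
    Profile("researcher", "GLM-5.2", "Gather facts, read sources, ground the work.", ("web_search",)),
    Profile("writer", "GLM-5.2", "Draft the content or write the code."),
    Profile("editor", "GLM-5.2", "Tighten, format, check structure."),
    Profile("judge", "GLM-5.2", "Score quality out of 10 and send weak work back."),
]

names = [p.name for p in CORE_CREW]
# Each profile owns exactly one job, so names must not repeat.
assert len(names) == len(set(names)), "each profile needs a unique name"
print(names)
```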
Step 3 — Put them on one board
This is what turns separate agents into a crew. Give them a shared kanban board.
In Hermes, each job is a card. The card moves through stages — todo → ready → running → done. Each agent picks up cards assigned to it, does the work, and the card moves to the next agent.
Here's how a typical job flows:
- You create a card: "Write an SEO article about Hermes for beginners"
- It goes to todo → the dispatcher promotes it to ready
- The researcher claims it, gathers facts, writes findings, marks done
- A new card spawns for the writer with the researcher's output attached
- The writer drafts the article, marks done
- The editor tightens, formats, checks structure
- The judge scores it. Below 8/10? Back to the writer with notes. 8 or above? Done.
- You get a finished, checked article on the board
You watch this happen in real time. Cards move, agents work, output appears. It feels like running a real team — because you are.
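The card flow above can be sketched as a tiny state machine. This is a toy model under stated assumptions, not Hermes's board API: `Card`, `advance`, `hand_off` and the `PIPELINE` order are hypothetical names, and the judge's send-back step is left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Optional

STAGES = ["todo", "ready", "running", "done"]
PIPELINE = ["researcher", "writer", "editor", "judge"]  # assumed hand-off order


@dataclass
class Card:
    title: str
    assignee: str
    stage: str = "todo"
    attachments: list = field(default_factory=list)


def advance(card: Card) -> None:
    """Move a card one stage forward; a done card stays done."""
    i = STAGES.index(card.stage)
    if i < len(STAGES) - 1:
        card.stage = STAGES[i + 1]


def hand_off(card: Card, output: str) -> Optional[Card]:
    """Spawn the next role's card with this card's output attached."""
    if card.stage != "done":
        raise ValueError("card must be done before hand-off")
    i = PIPELINE.index(card.assignee)
    if i == len(PIPELINE) - 1:
        return None  # the judge is the last stop
    return Card(card.title, PIPELINE[i + 1], attachments=card.attachments + [output])


card = Card("Write an SEO article about Hermes for beginners", "researcher")
for _ in range(3):
    advance(card)  # todo -> ready -> running -> done
next_card = hand_off(card, "research findings")
print(next_card.assignee, next_card.stage, next_card.attachments)
```

The new card starts back at todo, which is why the same four stages are enough for every role.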
Want the whole crew, already wired?
Hermes runs best inside the Agent Operating System — the dashboard, the shared memory, the kanban, every agent profile, and weekly coaching calls. 2,200+ founders are building with it right now.
Step 4 — Add the judge loop (this is the secret)
Most people skip this. It's the single biggest quality lever in the whole system.
The judge is an agent that never produces deliverables of its own. Its only job is to score other agents' work. It reads the deliverable, checks it against your criteria, and gives a score out of 10. Below the threshold (I use 8), it sends the work back with specific notes on what to fix.
Without a judge, the writer ships its first draft. With a judge, the writer ships its third draft — because the first two got sent back. The quality difference is night and day.
Your judge criteria should be specific. For articles: "hook in the first 2 lines, 1500+ words, FAQ section with schema, AI Profit Boardroom (AIPB) CTA, no AI-speak." For code: "all tests pass, no console errors, clean diff, follows project conventions." The clearer the criteria, the sharper the feedback.
A second pair of eyes that never gets tired.
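Here's the judge loop as a minimal sketch. The writer and judge are stubs standing in for real agent calls, and the `MAX_ROUNDS` cap is my own assumption so a stubborn draft can't loop forever. None of these names come from Hermes.

```python
from typing import Callable, Tuple

THRESHOLD = 8   # scores of 8 or above pass
MAX_ROUNDS = 5  # assumption: stop after a few rounds instead of looping forever


def judge_loop(
    write: Callable[[str, str], str],
    judge: Callable[[str], Tuple[int, str]],
    brief: str,
) -> Tuple[str, int, int]:
    """Redraft until the judge's score reaches THRESHOLD or rounds run out."""
    notes = ""
    draft, score = "", 0
    for round_no in range(1, MAX_ROUNDS + 1):
        draft = write(brief, notes)   # writer sees the judge's last notes
        score, notes = judge(draft)   # judge returns (score, notes)
        if score >= THRESHOLD:
            return draft, score, round_no
    return draft, score, MAX_ROUNDS


def make_stub_writer() -> Callable[[str, str], str]:
    """Stub writer: each call returns the next numbered draft."""
    calls = {"n": 0}

    def write(brief: str, notes: str) -> str:
        calls["n"] += 1
        return f"{brief} [draft {calls['n']}]"

    return write


def stub_judge(draft: str) -> Tuple[int, str]:
    """Toy scoring: draft 1 scores 6, draft 2 scores 7, draft 3 scores 8."""
    n = int(draft.rsplit("draft ", 1)[1].rstrip("]"))
    score = min(10, 5 + n)
    notes = "" if score >= THRESHOLD else "tighten the hook"
    return score, notes


final, final_score, rounds = judge_loop(make_stub_writer(), stub_judge, "SEO article")
print(final, final_score, rounds)
```

With these stubs the first two drafts get sent back and the third one ships, which is the pattern described above.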
Step 5 — Give the brief, walk away
Now the crew is wired. You just hand it a topic. Here's what the actual workflow looks like:
- You write one line: "Build me 3 SEO articles about Hermes, upselling the AI Profit Boardroom, with hooks, FAQ, schema and CTAs."
- The crew breaks it into cards: one research card, three write cards, three edit cards, three judge cards
- Agents run in sequence: research → write → edit → judge → (loop if needed) → done
- You come back to a full board: 3 finished, checked articles ready to ship
Total time: 10-15 minutes. Your involvement: writing one sentence and reviewing the final output. Everything in between happened autonomously.
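The card breakdown above is simple arithmetic: one research card plus a write, edit and judge card per article, so 1 + 3 × 3 = 10 cards for the example brief. A small sketch, assuming a hypothetical `plan_cards` helper (how the crew actually splits a brief isn't shown here):

```python
from collections import Counter
from typing import List, Tuple


def plan_cards(topic: str, n_articles: int) -> List[Tuple[str, str]]:
    """Return (role, title) pairs: one shared research card, then
    write/edit/judge cards for each article."""
    cards = [("researcher", f"Research: {topic}")]
    for i in range(1, n_articles + 1):
        for role, verb in (("writer", "Write"), ("editor", "Edit"), ("judge", "Judge")):
            cards.append((role, f"{verb} article {i}: {topic}"))
    return cards


cards = plan_cards("Hermes", 3)
per_role = Counter(role for role, _ in cards)
print(len(cards), dict(per_role))
```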
Advanced: parallel work and delegation
Once you're comfortable with the basic flow, you can level up:
- Parallel delegation — spawn 3 writer agents simultaneously, each on a different article. Total time drops from 15 min to 5 min
- Nested crews — an orchestrator agent manages sub-crews for complex jobs (e.g., a research crew, a writing crew, a QA crew)
- Cron jobs — schedule recurring tasks: "every Monday at 9am, research this week's trending topics and draft article outlines"
- Cross-crew memory — agents share a persistent memory store, so the writer remembers your voice from last session without a refresher
- Model routing per task — route easy research to GLM-5.2 (free), hard coding to Claude (premium). Optimize cost automatically
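Parallel delegation, from the list above, looks roughly like this in plain Python. It's a sketch with `asyncio` from the standard library, where `write_article` is a stub that sleeps instead of calling a model. It shows the fan-out pattern only, not how Hermes schedules agents.

```python
import asyncio
from typing import List


async def write_article(topic: str) -> str:
    """Stub writer agent: the sleep stands in for a model call."""
    await asyncio.sleep(0.01)
    return f"draft: {topic}"


async def run_parallel(topics: List[str]) -> List[str]:
    """Start one writer per topic and wait for all of them.
    gather returns results in the same order as the inputs."""
    return await asyncio.gather(*(write_article(t) for t in topics))


topics = ["Hermes for beginners", "Agent crews", "The judge loop"]
drafts = asyncio.run(run_parallel(topics))
print(drafts)
```

Because the three writers wait at the same time, the batch takes about as long as the slowest single article rather than the sum of all three.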
Common mistakes (and how to avoid them)
Don't do this
- One agent does everything (quality suffers)
- Skip the judge (first drafts ship, quality drops)
- Vague briefs ("write something good")
- All agents on the most expensive model
- No shared memory (agents start cold every time)
- Too many agents, too early (start with 4)
Do this instead
- One role per agent, clear boundaries
- Judge every output, score threshold at 8
- Specific briefs with format and criteria
- Mix models: cheap for volume, premium for hard
- Persistent memory so agents know your business
- Start small, add specialists as you grow
What a finished crew looks like
Here's my actual content crew — the one that produced this article:
- glm-researcher — GLM-5.2, web access, reads sources and writes research briefs
- glm-writer — GLM-5.2, writes in my voice, structures articles
- glm-seo — GLM-5.2, adds meta tags, schema, FAQ sections, internal links
- glm-designer — GLM-5.2, CSS and HTML layout, visual polish
- glm-judge — GLM-5.2, scores against criteria, loops until 8+/10
Five agents. All on a free model. Total cost per article: effectively zero. Total time: under 15 minutes. Quality: checked by a judge before I ever see it.
That's not a demo. That's a system.
Do this once and your output stops depending on your hours. That's the difference between using AI and running it.
FAQ
How many agents should a crew have?
Start with four: researcher, writer/builder, editor and judge. Add more specialists (SEO, designer) as your jobs get bigger. Each agent should own one clear job.
What model should each agent use?
A strong, low-cost model like GLM-5.2 on the coding plan lets you run a whole crew for almost nothing. You can mix models per role too.
How does the judge agent work?
The judge scores each result out of ten against your criteria. Anything under eight goes back with notes, so only good work reaches you.
Do I need to code to build a crew?
No — you create named agent profiles and give plain-English briefs. The fastest path is a ready-made setup inside Agent OS.
How much does it cost to run a crew?
On GLM-5.2 (free on the coding plan), a 5-agent crew costs almost nothing per run. If you use premium models for specific roles, costs scale with usage but you control which agent uses what.
Can agents work in parallel?
Yes. Hermes supports parallel delegation — multiple agents can work on independent tasks simultaneously, then hand results back to the orchestrator.