Everyone thinks running AI agents means a scary API bill. It doesn't. Hermes plugs into four free lanes — Nous Portal, OpenRouter's free APIs, local models on your own machine, and OAuth reuse of plans you already pay for. Same agent, same tools, $0 per token. Here's every free model, exactly how to set each lane up, and the real things I've built on them.

Before
Every time I let an agent loop, I watched the token meter tick up.
I'd cancel runs early just to keep the bill down.
Big jobs — research, a batch of builds, an overnight loop — felt too expensive to even try.
My best ideas stayed parked because I was scared of the API cost.
Then I wired the free lanes into Hermes.
After
Now I run agents all day and the per-token cost is zero.
Local models build on my own machine, offline, for nothing.
Free APIs handle the cheap parallel work.
And the plans I already pay for cover the heavy lifts — no extra bill.
You can run the exact same stack for $0. Here's every lane.
This is the stack the founders inside the AI Profit Boardroom run every day — building, shipping and automating without a scary API bill.
An agent isn't one prompt — it's a loop that reads, reasons, calls tools, and tries again, sometimes for hundreds of steps. On a metered API, every one of those steps costs you. So most people ration: short runs, no overnight loops, no big batches. The free lanes flip that — the agent can run as long as the job needs.
Hermes is the hub. It doesn't care where the model lives — a cloud portal, a free API, your own GPU, or a subscription you already hold. You point it down whichever lane fits the job. Here's each one, in plain steps.
Nous Research runs a portal that fronts a big catalogue of models — and the ones tagged :free run without spending a credit. The headline free pair right now: Step 3.7 Flash and Nemotron 3 Ultra. And they add new free models all the time, so it's worth checking the list.
# log in to Nous Portal (opens your browser once) hermes portal # then pick a model — choose a ':free' one hermes model # → select stepfun/step-3.7-flash:free (or Nemotron 3 Ultra) hermes
Honest note: after login the portal lists 300+ models, but only the :free-tagged ones run without credits — in practice that's Step 3.7 Flash and the Nemotron free tier today. They can get busy under load, so they're best for steady single-agent work, not a hammering batch. Good for: a free always-on chat/agent brain in the cloud with nothing to install.
OpenRouter is a single API in front of hundreds of models — and a rotating set of them are completely free. The trick is dead simple: open the model search and type free. You'll get the current free list, and it changes often as new ones land.
Two I lean on: N2 (nex-agi/nex-n2-pro:free, a free agentic MoE) and North Mini Code (cohere/north-mini-code:free, a genuinely free agentic coder with full tool-calling). Wire one into Hermes as a profile:
# get a free key at openrouter.ai/keys, then either pick it interactively… hermes model # → OpenRouter → cohere/north-mini-code:free # …or make a dedicated profile (config.yaml): model: default: cohere/north-mini-code:free # ✅ plain slug provider: openrouter # key goes in the profile's OWN .env as OPENROUTER_API_KEY
Two gotchas that cost me real debugging: (1) use the plain slug (cohere/north-mini-code:free) with provider: openrouter set separately — prefixing it openrouter/… silently fails. (2) Free reasoning models like N2 will think away your whole budget and stream nothing — pass reasoning: {enabled:false} for clean code output, and run sequentially (the free tier throttles on rapid parallel calls). Good for: free cloud APIs for cheap parallel agent work and single-file builds.
The most free of all: run the model on your own computer. No API, no internet needed, nothing leaves your laptop. Two easy ways in — Ollama (a one-line menubar app) or LM Studio (a friendly desktop app). Both download open-weight models you then point Hermes at.
Great free local picks for agentic work: a Gemma 4 coder and North Mini Code run well on a normal laptop; Ornith-9B is a tools-capable agentic coder; and if you've got a beefy setup, Qwen 3.6 is superb — but it depends on your hardware.
# install Ollama (ollama.com) or LM Studio, then pull a model: ollama pull gemma-4-12b-coder # point a Hermes 'local' profile at it (config.yaml): model: default: gemma-4-12b-coder provider: ollama ollama_num_ctx: 65536 # Hermes needs 64k+ context hermes --profile local
Honest hardware notes: Hermes needs 64k+ context, so set ollama_num_ctx: 65536. Keep models under ~15GB on a 32–36GB machine, and never pin a 20–32GB model (it'll swap your whole Mac). Good for: $0, fully offline, fully private agent work — and it runs the Agent OS's whole local stack (the Kanban crew, the loop judge, the Local Engine).
This one's free if you already have one of these plans. If you've got a MiniMax coding plan, a Kimi K2.7 coding plan, or an X Premium+ / SuperGrok subscription, you can log into Hermes with OAuth — no API key, no per-token bill on top. You're just pointing Hermes at a subscription you're already paying for. (z.ai's GLM Coding Plan works the same way.)
# Grok via your X Premium+ / SuperGrok (browser OAuth, no key): hermes auth add xai-oauth # Kimi K2.7 coding plan: log in, then add the coding credential kimi login # GLM-5.2 z.ai Coding Plan: add the coding-plan key to the profile hermes --profile glm-5-2 auth add zai --type api-key --api-key <key>
Honest framing: these aren't free models — but if you already pay for the plan, you pay nothing extra to drive it through Hermes instead of one per-token API. It turns a subscription you've got into an agent brain. Good for: the heavy lifts — frontier-grade coding and reasoning — on money you're already spending.
Every demo below was generated by a model on one of the free lanes — no frontier API, no big bill. Local models on my own machine, and a coding-plan model over OAuth. They're live; give them a second to load.
Plus the playable stuff — an Ornith neon Breakout (local) and a GLM neon racer (OAuth). The honest truth: most ran first try, a couple needed a small fix from me — but every one came from a free or already-paid lane, not a metered frontier API.
You don't choose one forever — you switch per task. A rough map:
You don't set up four tools by hand. Inside the Agent Operating System in the AI Profit Boardroom, Hermes and all four lanes are pre-wired — Nous Portal, OpenRouter free APIs, local models, and OAuth — with the profiles, keys and gotchas already handled. You pick a lane and build.
Stop rationing your agents. Run them on free — all four lanes.
The people who win with AI next year won't be the ones who paid the most — they'll be the ones who let their agents work around the clock without watching a meter. The Agent Operating System inside the AI Profit Boardroom hands you Hermes with every free lane wired in, so you can build, loop and ship for $0 starting today.