Free AI models · June 2026

Free Models in HermesRun real AI agents for $0 — four ways.

Everyone thinks running AI agents means a scary API bill. It doesn't. Hermes plugs into four free lanes — Nous Portal, OpenRouter's free APIs, local models on your own machine, and OAuth reuse of plans you already pay for. Same agent, same tools, $0 per token. Here's every free model, exactly how to set each lane up, and the real things I've built on them.

A winged messenger figure fully clothed in a long flowing golden chiton with four glowing streams of light converging into it from four directions

The Agent OS — a sidebar full of free and OAuth model agents: Hermes, Local, Free Claude Code, GLM, Kimi, Grok and more

every one of these agents can run on a free model — Hermes is the hub they all plug into

four free lanes feed one agent — pick whichever fits the job

My story · why this matters

I used to ration my agents. The bill scared me.

Before

Every time I let an agent loop, I watched the token meter tick up.

I'd cancel runs early just to keep the bill down.

Big jobs — research, a batch of builds, an overnight loop — felt too expensive to even try.

My best ideas stayed parked because I was scared of the API cost.

Then I wired the free lanes into Hermes.

After

Now I run agents all day and the per-token cost is zero.

Local models build on my own machine, offline, for nothing.

Free APIs handle the cheap parallel work.

And the plans I already pay for cover the heavy lifts — no extra bill.

You can run the exact same stack for $0. Here's every lane.

the receipts

Real operators. Real builds. Running on free models.

This is the stack the founders inside the AI Profit Boardroom run every day — building, shipping and automating without a scary API bill.

3,600+Founders inside AIPB
400kYouTube subscribers
163kX / Twitter followers
38Countries · live members

Real member · building with free AI

See the full 158 pages of member wins →

✦

I ────── the problem

Every token an agent burns is money. So people hold back.

An agent isn't one prompt — it's a loop that reads, reasons, calls tools, and tries again, sometimes for hundreds of steps. On a metered API, every one of those steps costs you. So most people ration: short runs, no overnight loops, no big batches. The free lanes flip that — the agent can run as long as the job needs.

Metered API — the old way

you ration

Every agent step costs per token
You cancel loops early to save money
Big batches feel too expensive to try
Overnight runs are a scary bill in the morning
Your private data goes to someone's API
One model, locked to one vendor

Free lanes — the new way

you let it run

$0 per token on free + local models
Loop all day, batch freely, run overnight
Local models run offline, on your machine
Your data never leaves your laptop (local lane)
Swap models per job — no lock-in
Reuse the plans you already pay for, no extra bill

II ────── the framework

Four free lanes into one agent.

Hermes is the hub. It doesn't care where the model lives — a cloud portal, a free API, your own GPU, or a subscription you already hold. You point it down whichever lane fits the job. Here's each one, in plain steps.

Nous Portal

free · cloud

Nous Research runs a portal that fronts a big catalogue of models — and the ones tagged :free run without spending a credit. The headline free pair right now: Step 3.7 Flash and Nemotron 3 Ultra. And they add new free models all the time, so it's worth checking the list.

stepfun/step-3.7-flash:freeNemotron 3 Ultra (free)+ new ones added often

terminal · log in + pick a free model

# log in to Nous Portal (opens your browser once)
hermes portal
# then pick a model — choose a ':free' one
hermes model
# → select  stepfun/step-3.7-flash:free  (or Nemotron 3 Ultra)
hermes

Honest note: after login the portal lists 300+ models, but only the :free-tagged ones run without credits — in practice that's Step 3.7 Flash and the Nemotron free tier today. They can get busy under load, so they're best for steady single-agent work, not a hammering batch. Good for: a free always-on chat/agent brain in the cloud with nothing to install.

OpenRouter — type "free"

free · cloud APIs

OpenRouter is a single API in front of hundreds of models — and a rotating set of them are completely free. The trick is dead simple: open the model search and type free. You'll get the current free list, and it changes often as new ones land.

🔍 free

Models · free

Cohere: North Mini Code free

Nous: Hermes 3 405B Instruct free

NVIDIA: Nemotron 3 Ultra free

NVIDIA: Nemotron 3 Nano Omni free

Venice: Uncensored free

Poolside: Laguna XS 2 free

Nex AGI: N2 Pro free

Two I lean on: N2 (nex-agi/nex-n2-pro:free, a free agentic MoE) and North Mini Code (cohere/north-mini-code:free, a genuinely free agentic coder with full tool-calling). Wire one into Hermes as a profile:

hermes · add a free OpenRouter model

# get a free key at openrouter.ai/keys, then either pick it interactively…
hermes model   # → OpenRouter → cohere/north-mini-code:free

# …or make a dedicated profile (config.yaml):
model:
  default: cohere/north-mini-code:free   # ✅ plain slug
  provider: openrouter
# key goes in the profile's OWN .env as OPENROUTER_API_KEY

Two gotchas that cost me real debugging: (1) use the plain slug (cohere/north-mini-code:free) with provider: openrouter set separately — prefixing it openrouter/… silently fails. (2) Free reasoning models like N2 will think away your whole budget and stream nothing — pass reasoning: {enabled:false} for clean code output, and run sequentially (the free tier throttles on rapid parallel calls). Good for: free cloud APIs for cheap parallel agent work and single-file builds.

Local — your own machine

free · offline · private

The most free of all: run the model on your own computer. No API, no internet needed, nothing leaves your laptop. Two easy ways in — Ollama (a one-line menubar app) or LM Studio (a friendly desktop app). Both download open-weight models you then point Hermes at.

gemma-4-12B-codercohere north-mini-codeOrnith-9BQwen 3.6 (big rig)

Great free local picks for agentic work: a Gemma 4 coder and North Mini Code run well on a normal laptop; Ornith-9B is a tools-capable agentic coder; and if you've got a beefy setup, Qwen 3.6 is superb — but it depends on your hardware.

terminal · run a model locally, then hand it to Hermes

# install Ollama (ollama.com) or LM Studio, then pull a model:
ollama pull gemma-4-12b-coder
# point a Hermes 'local' profile at it (config.yaml):
model:
  default: gemma-4-12b-coder
  provider: ollama
  ollama_num_ctx: 65536    # Hermes needs 64k+ context
hermes --profile local

Honest hardware notes: Hermes needs 64k+ context, so set ollama_num_ctx: 65536. Keep models under ~15GB on a 32–36GB machine, and never pin a 20–32GB model (it'll swap your whole Mac). Good for: $0, fully offline, fully private agent work — and it runs the Agent OS's whole local stack (the Kanban crew, the loop judge, the Local Engine).

OAuth — reuse what you already pay for

no extra bill

This one's free if you already have one of these plans. If you've got a MiniMax coding plan, a Kimi K2.7 coding plan, or an X Premium+ / SuperGrok subscription, you can log into Hermes with OAuth — no API key, no per-token bill on top. You're just pointing Hermes at a subscription you're already paying for. (z.ai's GLM Coding Plan works the same way.)

Kimi K2.7 (coding plan)GLM-5.2 (z.ai plan)Grok (X Premium+)MiniMax

hermes · log in with a plan you already hold

# Grok via your X Premium+ / SuperGrok (browser OAuth, no key):
hermes auth add xai-oauth
# Kimi K2.7 coding plan: log in, then add the coding credential
kimi login
# GLM-5.2 z.ai Coding Plan: add the coding-plan key to the profile
hermes --profile glm-5-2 auth add zai --type api-key --api-key <key>

Honest framing: these aren't free models — but if you already pay for the plan, you pay nothing extra to drive it through Hermes instead of one per-token API. It turns a subscription you've got into an agent brain. Good for: the heavy lifts — frontier-grade coding and reasoning — on money you're already spending.

III ────── built on free models

None of this is theory. Here's what they built.

Every demo below was generated by a model on one of the free lanes — no frontier API, no big bill. Local models on my own machine, and a coding-plan model over OAuth. They're live; give them a second to load.

local · Ornith-9Bopen ↗

3D particle globe

local · offline · $0

local · Ornith-9Bopen ↗

Constellation canvas

local · offline · $0

OAuth · GLM-5.2open ↗

Spiral galaxy

coding plan · no extra bill

OAuth · GLM-5.2open ↗

Particle forge

coding plan · no extra bill

Plus the playable stuff — an Ornith neon Breakout (local) and a GLM neon racer (OAuth). The honest truth: most ran first try, a couple needed a small fix from me — but every one came from a free or already-paid lane, not a metered frontier API.

per token across the local + free lanes — I let agents loop, batch and build all day, and the bill for the model never moves.

IV ────── which lane when

Pick the lane that fits the job.

You don't choose one forever — you switch per task. A rough map:

☁️ Nous Portal

Nothing to install, always-on cloud brain. Great for a steady free chat/agent. Best when you don't want to manage keys or hardware.

🔌 OpenRouter free

Free cloud APIs you swap by typing "free". Great for cheap parallel jobs + single-file builds. Watch the throttle on rapid batches.

💻 Local

$0, offline, private. Great for anything sensitive, overnight loops, or unlimited batches. Needs a decent machine + 64k context.

🔑 OAuth

Frontier-grade quality on a plan you already hold. Great for the heavy coding + reasoning lifts. No extra per-token bill.

The move isn't "find the one free model." It's keep all four lanes open and send each job down the cheapest one that can do it well.

All four, wired in

Every free lane is built into the Agent OS.

You don't set up four tools by hand. Inside the Agent Operating System in the AI Profit Boardroom, Hermes and all four lanes are pre-wired — Nous Portal, OpenRouter free APIs, local models, and OAuth — with the profiles, keys and gotchas already handled. You pick a lane and build.

Hermes + all four free lanes — pre-configured, ready to run
The local stack — Ollama models driving agents 100% offline
The gotchas solved — plain-slug profiles, 64k context, reasoning-off
4 coaching calls a week + daily tutorials as new free models land
3,600+ members across 38 countries building for $0

Get the Agent OS →

Inside the AI Profit Boardroom · skool.com/ai-profit-lab

link in the description ↑

✦

V ────── the recap

What to take away.

Agents don't need a big bill. Hermes plugs into four free lanes — pick whichever fits the job.

ii.

Cloud free: Nous Portal (Step 3.7 Flash, Nemotron) + OpenRouter — just type "free" for the latest free APIs.

iii.

Local free: Ollama or LM Studio runs Gemma 4, North Mini, Ornith on your own machine — offline + private.

iv.

OAuth: reuse a Kimi, GLM, MiniMax or X plan you already pay for — frontier quality, no extra per-token cost.

Stop rationing your agents. Run them on free — all four lanes.

Last thing

Let your agents run. For nothing.

The people who win with AI next year won't be the ones who paid the most — they'll be the ones who let their agents work around the clock without watching a meter. The Agent Operating System inside the AI Profit Boardroom hands you Hermes with every free lane wired in, so you can build, loop and ship for $0 starting today.

The full Agent OS — Hermes + all four free lanes, pre-built
The setup walkthrough, done with you, step by step
4 weekly coaching calls, daily tutorials, a 30-day roadmap
3,600+ members, a member map, a 24/7 community
158 pages of member wins — read them here →

Get the Agent OS →

Inside the AI Profit Boardroom · skool.com/ai-profit-lab

Four lanes. Zero per-token cost. See you in the next one ↗

Free Models in Hermes · June 2026 · Nous Portal (Step 3.7 Flash, Nemotron 3 Ultra) · OpenRouter free APIs (N2, North Mini + more — type "free") · local via Ollama / LM Studio · OAuth reuse of Kimi / GLM / MiniMax / Grok plans · every demo above was built on a free or already-paid lane · used in 38 countries