The /goal Command: A Complete Guide to Autonomous Agents on Codex, Hermes & Claude Code
Summary
In the span of six weeks in early 2026, three major AI coding platforms — OpenAI Codex, Nous Research Hermes, and Anthropic Claude Code — each shipped a /goal command. This wasn't coincidence. It was the industry converging on a shared interface for the next generation of autonomous agents. This guide covers what /goal is, how each platform implements it differently, and how to hand off work across platforms effectively.
1. What Is /goal?
1.1 From Prompt-Response to Goal-Driven Execution
Traditional AI coding assistants work in single turns: you give an instruction, it responds, you send the next instruction. You act as the supervisor — approving every step, manually continuing the loop.
/goal ends that pattern. It introduces the concept of a Persistent Goal: you define a completion condition, and the agent autonomously works across multiple turns until that condition is met — without you intervening at every step.
Core idea: Turn "keep going" into a contract. You give the agent an outcome, a definition of done, and a way to verify progress. The agent keeps working until it reaches that outcome, runs out of budget, pauses, or hits a genuine blocker it can't resolve alone.
1.2 The Architecture: Evaluator Loop
The key technical innovation behind /goal is the separation of executor and judge:
- Worker model: Does the actual coding, testing, refactoring
- Evaluator model (Judge): After each turn, a separate lightweight model asks: "Has the goal been met?"
- ✅ Yes → return control to the user
- ❌ No → automatically start the next turn
This separation matters because, as one industry observer put it: "The model doing the work is the worst judge of whether it's done." Splitting the two roles is what makes a reliable autonomous loop possible.
        User sets /goal
               │
               ▼
        ┌─────────────┐
        │   Worker    │  ← writes code, runs tests, fixes bugs
        │    Model    │
        └──────┬──────┘
               │ turn complete
               ▼
        ┌─────────────┐
        │  Evaluator  │  ← checks completion condition
        │   (Judge)   │
        └──────┬──────┘
               │
           ┌───┴───┐
          Yes      No
           │       │
           ▼       ▼
          Done     Next turn
        (return    (continue
        to user)   automatically)
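The loop in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the three platforms: `worker` and `evaluator` are hypothetical callables standing in for the worker model and the judge, and `max_turns` stands in for whatever budget the platform enforces.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoalResult:
    done: bool   # True if the evaluator accepted the completion condition
    turns: int   # number of worker turns actually taken


def run_goal(worker: Callable[[int], None],
             evaluator: Callable[[], bool],
             max_turns: int) -> GoalResult:
    """Alternate worker turns with an independent completion check."""
    for turn in range(1, max_turns + 1):
        worker(turn)          # executor: edit code, run tests, fix bugs
        if evaluator():       # judge: a separate check, not the worker's own verdict
            return GoalResult(done=True, turns=turn)
    # Budget exhausted without meeting the condition: hand control back to the user.
    return GoalResult(done=False, turns=max_turns)


# Toy usage: three failing tests, and the worker fixes one per turn.
failing = {"count": 3}


def fix_one(turn: int) -> None:
    failing["count"] -= 1


print(run_goal(fix_one, lambda: failing["count"] == 0, max_turns=10))
# GoalResult(done=True, turns=3)
```

The design choice worth noticing is that the worker never decides when to stop; only the evaluator's answer or the exhausted budget ends the loop.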
1.3 Best-Fit Task Types
- ✅ Multi-step engineering tasks: module migrations, test suite repair, full-directory refactors
- ✅ Tasks with measurable end states: "all tests pass", "build exit code is 0", "file count reaches N"
- ✅ Long-running background tasks: database optimization, performance tuning, documentation generation
- ❌ Highly subjective tasks: design decisions, tasks requiring aesthetic judgment or human taste
- ❌ High-risk irreversible operations: these warrant human confirmation checkpoints
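What separates the ✅ rows from the ❌ rows is whether a program can check the end state. As a sketch (the function names are illustrative, not part of any platform), two of the measurable conditions above reduce to a few lines of Python. "All tests pass" is the same exit-code check applied to a test runner, for example `exit_code_is_zero(["pytest", "-q"])` in a pytest project. No comparable check exists for aesthetic judgment.

```python
import subprocess
from pathlib import Path


def exit_code_is_zero(cmd: list[str]) -> bool:
    """'Build exit code is 0': run the command and inspect its return code."""
    return subprocess.run(cmd, capture_output=True).returncode == 0


def file_count_reaches(directory: str, n: int, pattern: str = "*") -> bool:
    """'File count reaches N': count regular files matching `pattern`, recursively."""
    count = sum(1 for p in Path(directory).rglob(pattern) if p.is_file())
    return count >= n
```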
2. Platform Implementations
2.1 OpenAI Codex — The Execution Engine

OpenAI Codex CLI. Source: @hqmank
Released: April 2026 (experimental), later promoted to stable
Core positioning: Codex is an implementation-focused coding agent — give it a clear spec, it builds. /goal is the mechanism for delivering that spec in a durable, persistent way.
How It Works
Under the hood, Codex maintains a thread_goals database table that tracks each goal's status, token budget, and elapsed time. Goals have a formal lifecycle: active → paused / budget_limited → complete.
A deliberate design asymmetry: the model can create a goal and mark it complete, but pausing, resuming, and budget management are controlled by the user or the system runtime. The tool spec explicitly states: "Create a goal only when explicitly requested by the user; do not infer goals from ordinary tasks."
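The lifecycle and the permission asymmetry can be modeled as a small state machine. This is an assumption-laden illustration, not Codex's implementation: only the table name thread_goals and the status names come from the description above. The columns, the allowed transitions (for example, that a budget-limited goal can be reactivated once the budget is raised), and the actor names are hypothetical, and SQLite stands in for whatever store Codex actually uses.

```python
import sqlite3
import time

# Allowed status transitions (assumed; "complete" is terminal).
TRANSITIONS = {
    "active": {"paused", "budget_limited", "complete"},
    "paused": {"active"},
    "budget_limited": {"active"},
    "complete": set(),
}

# The status each action leads to, and who may request it.
ACTION_TARGET = {"create": "active", "complete": "complete", "pause": "paused",
                 "resume": "active", "limit_budget": "budget_limited"}
PERMISSIONS = {
    "create": {"model", "user"},   # the model creates goals only on explicit user request
    "complete": {"model"},         # the model may declare its own goal complete
    "pause": {"user", "runtime"},
    "resume": {"user", "runtime"},
    "limit_budget": {"runtime"},   # budget enforcement belongs to the runtime
}


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS thread_goals (
        thread_id TEXT PRIMARY KEY,
        objective TEXT NOT NULL,
        status TEXT NOT NULL,
        token_budget INTEGER,
        created_at REAL NOT NULL)""")
    return conn


def apply_action(conn, thread_id, action, actor, objective=None, token_budget=None):
    """Apply one lifecycle action, enforcing both permissions and transitions."""
    if actor not in PERMISSIONS[action]:
        raise PermissionError(f"{actor} may not {action} a goal")
    target = ACTION_TARGET[action]
    if action == "create":
        conn.execute("INSERT INTO thread_goals VALUES (?, ?, ?, ?, ?)",
                     (thread_id, objective, target, token_budget, time.time()))
    else:
        row = conn.execute("SELECT status FROM thread_goals WHERE thread_id = ?",
                           (thread_id,)).fetchone()
        if row is None:
            raise KeyError(thread_id)
        if target not in TRANSITIONS[row[0]]:
            raise ValueError(f"cannot move a goal from {row[0]} to {target}")
        conn.execute("UPDATE thread_goals SET status = ? WHERE thread_id = ?",
                     (target, thread_id))
    conn.commit()
    return target
```

Because the state lives in a database rather than in the process, passing a file path to open_store instead of ":memory:" lets a later session pick up where an earlier one stopped, which is the kind of cross-session persistence listed under Codex-Specific Strengths.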
Usage
# Launch interactive Codex session
codex
# Set a goal
/goal Optimize database queries in db.ts
Constraints:
- Keep schema unchanged
- Cover all execution paths with tests
- Target execution time below 50ms
# Subcommands
/goal pause # pause the active goal
/goal resume # resume a paused goal
/goal clear # clear the current goal
Codex-Specific Strengths
- Cross-session persistence: goal state stored in a database; closing the terminal doesn't lose progress
- Multi-environment support: per-turn environment switching (dev / staging / remote)
- AWS Bedrock integration: native SigV4 signing for AWS-native teams
- Remote Computer Use: continues working even when your Mac screen locks; pairs with Codex Mobile for remote monitoring
- External agent import: migrate sessions from other agent harnesses into Codex
Sources: OpenAI Codex Changelog: https://developers.openai.com/codex/changelog | Kingy AI analysis: https://kingy.ai/ai/openai-codex-goal-the-new-long-horizon-mode-for-agentic-coding/ | GitHub implementation gist: https://gist.github.com/patleeman/b1b5768393f9bf2f60865b1defeeb819
2.2 Nous Research Hermes — The Multi-Agent Orchestrator

Hermes Agent orchestrating tasks across a Kanban board. Source: The End of the “Human Heartbeat”: How the /goal Command is Redefining AI Agents
Released: v0.13.0 — May 7, 2026 (Tenacity Release)
Core positioning: Hermes is not a coding worker — it's a multi-agent orchestrator. It doesn't write code itself; it coordinates Codex, Claude Code, and other tools to write code, managing every handoff in between.
How It Works: The Ralph Loop
Hermes describes its /goal implementation as its take on the "Ralph Loop": a stateful autonomous loop with a Judge Model, a configurable turn budget (default: 20 turns), and cross-session persistence:
- User sends a goal via any platform (Telegram, Discord, Slack, CLI…)
- Hermes creates task cards on an internal Kanban board
- It selects the right tool for each card (Codex to build, Claude Code to review…)
- A Judge Model checks after each turn whether the goal is complete
- If not done, auto-continues; if done, sends a summary report to the user
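The five steps can be sketched as a single loop. Everything here is hypothetical: Hermes's internal card format and routing rules are not documented in the sources above, so `Card`, `ROUTES`, `dispatch`, and `judge` are illustrative stand-ins. Only the default of 20 turns comes from the text.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical routing table: which tool handles which kind of card.
ROUTES = {"build": "codex", "review": "claude-code"}


@dataclass
class Card:
    title: str
    kind: str            # a key of ROUTES
    done: bool = False


@dataclass
class GoalRun:
    goal: str
    cards: list[Card]
    turn_budget: int = 20                     # default turn budget per the text
    log: list[str] = field(default_factory=list)


def run_goal_loop(run: GoalRun,
                  dispatch: Callable[[str, Card], bool],
                  judge: Callable[[GoalRun], bool]) -> str:
    """Work one open card per turn; stop when the judge is satisfied or the budget runs out."""
    for turn in range(1, run.turn_budget + 1):
        card = next((c for c in run.cards if not c.done), None)
        if card is not None:
            tool = ROUTES[card.kind]
            card.done = dispatch(tool, card)   # did the tool finish this card?
            run.log.append(f"turn {turn}: {tool} -> {card.title} "
                           f"({'done' if card.done else 'retry'})")
        if judge(run):                         # independent check on the whole goal
            return f"Goal met in {turn} turns: {run.goal}"
    return f"Budget of {run.turn_budget} turns used; goal not met: {run.goal}"
```

The returned string plays the role of the summary report sent back to the user's messaging platform.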
Usage
# Works from Telegram, Discord, Slack, Matrix, Signal, CLI — all the same
/goal Fix all failing tests in this repo
Requirement: run test commands, identify failures,
patch changes one at a time, until all tests pass
# Subcommands
/goal status # view active goal state
/goal pause # pause execution
/goal resume # resume
/goal clear # clear the goal
Hermes-Specific Strengths
- Cross-platform messaging interface: Telegram, Discord, Slack, Matrix, Signal — no terminal needed
- Built-in Judge Model: independent evaluator with a configurable turn budget
- Kanban board integration: goals auto-decomposed into tasks; supports multi-agent parallel execution
- Skills system: installed skills are exposed as dynamic slash commands, including /plan (opens planning mode)
- Permission tiers: admin vs. regular-user command access control per platform group
- Codex CLI runtime integration (v0.13.0 beta): Hermes can hand shell execution and file patches directly to the Codex CLI while keeping its own memory, sessions DB, and /goal intact
Sources: Hermes Slash Commands Reference: https://hermes-agent.nousresearch.com/docs/reference/slash-commands | Geeky Gadgets guide: https://www.geeky-gadgets.com/automate-tasks-hermes-ai/ | AlphaSignal analysis: https://alphasignalai.substack.com/p/hermes-just-made-codex-the-engine
2.3 Anthropic Claude Code — Verification-Driven Agent

Claude Code /goal live status overlay showing elapsed time, turns, and tokens. Source: joe.njenga
Released: May 12, 2026 (Claude Code v2.1.139)
Core positioning: Claude Code excels at finding what's wrong with code that looks right — spec compliance violations, security holes, error states, edge cases. /goal lets you point it at a codebase and ask it to keep working until it's genuinely clean.
How It Works
Claude Code's /goal uses the Hooks system to implement the evaluator loop:
- After each turn, a lightweight, fast evaluator model checks whether the completion condition is satisfied
- A live overlay panel shows elapsed time, turn count, and token consumption in real time
- Available in three modes: interactive, headless (-p flag), and Remote Control
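To make the hook idea concrete, here is a sketch of a completion check written as a Claude Code Stop hook. Two assumptions, stated plainly. First, it assumes a Stop hook can keep the session going by printing JSON of the form {"decision": "block", "reason": ...}; check the Claude Code hooks reference for the current contract before relying on it. Second, the built-in /goal uses a lightweight evaluator model, whereas this sketch swaps in a deterministic command, so it illustrates the mechanism rather than reproducing /goal. The function names are hypothetical.

```python
import json
import subprocess


def condition_met(cmd: list[str]) -> bool:
    """Deterministic stand-in for the evaluator model: the condition holds if `cmd` exits 0."""
    return subprocess.run(cmd, capture_output=True).returncode == 0


def stop_hook_output(condition: str, cmd: list[str]) -> str:
    """JSON for a Stop hook (assumed contract): '{}' lets the session end, 'block' asks for another turn."""
    if condition_met(cmd):
        return json.dumps({})
    return json.dumps({
        "decision": "block",
        "reason": f"Goal not yet met: {condition}. Keep working on it.",
    })
```

A hook script built on this would print `stop_hook_output("All tests pass", ["pytest", "-q"])`. Note that nothing in the sketch caps the number of turns, so a real setup needs its own limit.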
Usage
# Requires Claude Code v2.1.139 or later
# Workspace trust dialog must be accepted
/goal All tests pass and CI pipeline is green
# Claude keeps working autonomously until the condition is met
# Pair with auto mode for fully unattended runs:
claude --auto
# auto mode approves tool calls within a turn
# /goal starts the next turn automatically
# Headless / scheduled execution:
claude -p "<goal description>"
Claude Code-Specific Strengths
- Real-time overlay panel: live elapsed/turns/tokens — highest transparency of the three platforms
- Hooks system integration: evaluator is deeply wired into the existing hooks architecture; highly customizable
- Three run modes: interactive, -p (headless), and Remote Control for CI/CD or remote servers
- Agent View (Research Preview): a single list of every Claude Code session, whether running, blocked, or done
- MCP integration: natively extensible via Model Context Protocol
Sources: Claude Code v2.1.139 release notes: https://releasebot.io/updates/anthropic/claude-code | MindStudio guide: https://www.mindstudio.ai/blog/claude-code-goal-command-autonomous-tasks | Field guide with examples: https://medium.com/@jason.croucher/claude-code-goal-a-field-guide-with-games-f6f3b617ce5b
3. Platform Comparison at a Glance
| Dimension | Codex /goal | Hermes /goal | Claude Code /goal |
|---|---|---|---|
| Role | Coding execution engine | Multi-agent orchestrator | Verification-driven code agent |
| Released | April 2026 | May 7, 2026 (v0.13.0) | May 12, 2026 (v2.1.139) |
| Evaluator | State machine + model self-eval | Independent Judge Model | Independent lightweight model (Hooks) |
| State persistence | ✅ Database, cross-session | ✅ Persistent, cross-session | ⚠️ In-session; restart needed after close |
| Turn budget | Token budget (configurable) | Default 20 turns (configurable) | No hard cap (set in condition or use Ctrl+C) |
| Interface | CLI / IDE / Desktop App | CLI + messaging platforms (Telegram etc.) | Interactive / -p / Remote Control |
| Best at | Long implementation runs, env switching | Multi-agent coordination, cross-tool handoffs | Code review, test repair, CI integration |
| Key integrations | AWS Bedrock, external agent import | Codex CLI, Claude Code, Kanban board | MCP, CI/CD pipelines, Hooks system |
4. Handoff Recommendations Across Platforms
4.1 The Core Principle: Hermes Directs, Codex Builds, Claude Code Verifies
All three platforms share the same command format — intentionally. This makes them composable. The most powerful workflow chains them:
You → Hermes (/goal + high-level objective)
      │
      ├──→ Codex (implement the feature, write code)
      │
      ├──→ Claude Code (audit for bugs, verify tests)
      │
      └──→ Hermes (verify + send summary report) → You
You never open a terminal. One message in. One summary out.
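The chain amounts to an orchestrator that loops a build step and a review step until the reviewer approves. A minimal Python sketch under obvious assumptions: `build` and `review` are hypothetical callables standing in for dispatching work to Codex and Claude Code, the round limit is illustrative, and none of this is a Hermes API.

```python
from typing import Callable, Tuple

Build = Callable[[str], str]                 # work in -> updated work out
Review = Callable[[str], Tuple[str, bool]]   # work in -> (work plus findings, approved?)


def hand_off(objective: str, build: Build, review: Review, max_rounds: int = 3) -> str:
    """Build, review, and feed review findings back to the builder until approved."""
    work = objective
    for round_no in range(1, max_rounds + 1):
        work = build(work)                # e.g. Codex implements or patches
        work, approved = review(work)     # e.g. Claude Code audits; findings ride along
        if approved:
            return f"Summary: '{objective}' approved after {round_no} round(s)"
    return f"Summary: '{objective}' still failing review after {max_rounds} rounds"
```

The returned string is the "one summary out" that reaches you; the review findings travel with the work so the next build round can address them.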
4.2 Scenario-Based Handoff Guide
Scenario A: New Feature Development
Recommended: Hermes /goal → Codex builds → Claude Code reviews
# Send to Hermes (via Telegram / CLI)
/goal Add OAuth2 login to the user module
Constraints:
- Use existing User database schema
- Include unit tests and integration tests
- Done when all tests pass
Hermes auto-decomposes this into Kanban tasks, assigns implementation to Codex, then routes the result to Claude Code for security review.
Scenario B: Fixing a Broken CI Pipeline
Recommended: Claude Code /goal directly
/goal All CI tests pass
# Claude Code will keep diagnosing and patching
# failures one by one until the suite is clean
Claude Code's strength is finding code that looks right but isn't — ideal for tracking down flaky tests and subtle regressions.
Scenario C: Overnight Long-Running Tasks
Recommended: Codex Goal Mode (macOS desktop app)
Codex supports Remote Computer Use: it keeps running even when your Mac screen locks. Pair with Codex Mobile for remote status monitoring.
# Set the goal, then safely lock your screen
/goal Migrate entire /src/legacy directory to the new module architecture
Constraints:
- All public API interfaces remain unchanged
- Each module must have corresponding tests
- Done when all original tests still pass
Scenario D: Recurring Maintenance (Weekly Tasks)
Recommended: Hermes (self-hosted + messaging platform trigger)
Hermes is built for recurring engineering workflows that span multiple coding sessions — e.g., a weekly goal to "sync dependency versions" or "clear stale TODOs". Trigger via Telegram message, get a summary in the same chat.
4.3 Universal Principles for Writing Good /goal Prompts
Regardless of platform, a strong /goal has these traits:
1. Completion condition must be verifiable
- ✅ "All tests pass" (measurable)
- ✅ "Build exit code is 0" (measurable)
- ❌ "The code looks cleaner" (not verifiable)
2. State constraints explicitly, not just goals
Tell the agent what must not change:
- Keep schema unchanged
- Keep Lighthouse score above 90
- Keep existing UI intact
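Some constraints can be enforced mechanically rather than trusted. "Keep schema unchanged", for instance, can be checked by hashing the protected files before the goal starts and comparing afterwards. A sketch follows; the function names and the protected-file list are illustrative, and a constraint like a Lighthouse score needs its own measurement instead.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Optional


def digest(path: str) -> Optional[str]:
    """SHA-256 of a file's bytes, or None if the file does not exist."""
    p = Path(path)
    return hashlib.sha256(p.read_bytes()).hexdigest() if p.exists() else None


def snapshot(paths: List[str]) -> Dict[str, Optional[str]]:
    """Record digests of every file the agent must not change."""
    return {p: digest(p) for p in paths}


def violations(before: Dict[str, Optional[str]]) -> List[str]:
    """Protected files that were modified, created, or deleted since the snapshot."""
    return [p for p, d in before.items() if digest(p) != d]
```

Snapshot the schema file (for example, a hypothetical db/schema.sql) before issuing /goal, and treat a non-empty `violations()` result afterwards as a broken constraint.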
3. Set a budget for long runs
- Hermes: configure turn_budget
- Codex: configure token_budget
- Claude Code: write a turn limit into the condition itself, or be ready to press Ctrl+C
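Whatever the platform calls it, a budget is a counter checked after every turn. The sketch below combines a turn cap and a token cap; the class and field names are hypothetical and do not describe how the turn_budget or token_budget settings are actually implemented. Leaving both limits unset reproduces the "no hard cap" case, where only a limit written into the goal or Ctrl+C stops the run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    max_turns: Optional[int] = None    # turn-style budget (compare Hermes turn_budget)
    max_tokens: Optional[int] = None   # token-style budget (compare Codex token_budget)
    turns: int = 0
    tokens: int = 0

    def charge(self, tokens_used: int) -> None:
        """Record one completed turn and the tokens it consumed."""
        self.turns += 1
        self.tokens += tokens_used

    def exhausted(self) -> bool:
        """True once either configured limit has been reached."""
        over_turns = self.max_turns is not None and self.turns >= self.max_turns
        over_tokens = self.max_tokens is not None and self.tokens >= self.max_tokens
        return over_turns or over_tokens
```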
4. Always run inside a Git repo
If the project is not already a repository, run git init and commit the current state before starting. Agents can change many files quickly, and being able to git diff or git checkout back to that commit is your most important safety net.
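That safety net can be checked before a goal starts. A small sketch using only standard git commands (`git rev-parse --is-inside-work-tree` and `git diff --stat HEAD`); the function names are illustrative, and refusing to start outside a repository is a policy you would add yourself, not a built-in behavior of any of the three platforms.

```python
import subprocess


def inside_git_repo(path: str = ".") -> bool:
    """True if `path` is inside a Git working tree (False if git or the path is missing)."""
    try:
        result = subprocess.run(["git", "rev-parse", "--is-inside-work-tree"],
                                cwd=path, capture_output=True, text=True)
    except FileNotFoundError:          # git not installed, or path does not exist
        return False
    return result.returncode == 0 and result.stdout.strip() == "true"


def diff_stat_against_head(path: str = ".") -> str:
    """Summary of working-tree changes relative to the last commit (needs at least one commit)."""
    return subprocess.run(["git", "diff", "--stat", "HEAD"], cwd=path,
                          capture_output=True, text=True).stdout
```

Run `diff_stat_against_head()` after a goal finishes for a quick overview of what the agent touched before reviewing the full diff.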
5. Never leave an open-ended goal running unattended overnight
Even with a budget, keep checking on progress. "Optimize the entire codebase" is not a goal: it has no stopping condition.
5. Industry Significance
Three teams at three different companies converged on the same primitive within six weeks. That convergence is the signal.
As VentureBeat reported, separating the builder from the judge is "sound design — fundamentally, you can't trust a model to judge its own homework." The appearance of independent evaluators in agent loops marks a meaningful shift toward auditable, observable agentic systems.
The deeper pattern, noted by multiple analysts: all three accept the same command format, which means they compose. For the first time, a single message to Hermes can trigger Codex to build, Claude Code to review, and Hermes to verify — with no terminal interaction from the developer. /goal is the shared protocol that makes multi-agent pipelines practical.
The role of the developer is shifting from step-by-step supervisor to outcome definer — and /goal is the interface that makes that shift concrete.