// How it works

From a CLI to a fleet of agents, in five steps.

Configure the CLI, point it at your stack, prove quality with evals, and deploy. From there the Command Center runs the whole fleet — every agent observable, recoverable, and updatable.

how-it-works ~ overview

5 steps

01 → Configure · CLI + auth and delivery endpoints

02 → Setup · auth, MCPs, skills & AGENTS.md

03 → Create evals · prompts, expectations, fixtures

04 → Deploy · one line, the whole fleet

05 ▣ Command Center · operate at fleet scale

01// Configure

Install the CLI.
Point it at your stack.

One CLI is the control surface for everything that follows. Install it, log in, and register the endpoints your agents authenticate and talk through — the same endpoints they'll use in production.

→Authenticated in one step — no keys to copy-paste.

→Register your auth and delivery endpoints once; every command reuses them.

→Works the same on a laptop and in CI.

chariot ~ configure

authenticated

$ brew install chariot

$ chariot login

✓ authenticated as platform@awesomeapp.io

$ chariot endpoints set \
    --auth "https://awesomeapp.io/auth" \
    --deliver "https://awesomeapp.io/chariot"

✓ endpoints registered · ready

02// Setup

Wire up data, tools,
and the agent itself.

An agent is its instructions, its skills, and the data it can reach. Define all three in version-controlled files — global data shared across the fleet, per-user data scoped to each subject, plus the skills and AGENTS.md that shape behavior.

→Chariot auth — scoped tokens, per subject, never shared.

→An MCP for global data, and a per-user MCP link for private data.

→Skills and an AGENTS.md that define how the agent works.

agent ~ setup

4 parts

global MCP (shared) · user MCP (per-subject)

AGENTS.md: agent instructions

skills/: refund.skill.md · escalate.skill.md

mcp.global: catalog · pricing (shared)

mcp.user: {{subject}} orders · prefs

auth: chariot oauth · scoped tokens

↳Global data is shared; per-user data never crosses subjects.
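To make the isolation rule concrete, here is a minimal Python sketch under stated assumptions; it is not Chariot's API. The names `DataRouter`, `ScopedToken`, and `CrossSubjectError` are hypothetical, and the sketch assumes each scoped token is bound to exactly one subject. Shared data is readable by any agent, while per-user data is returned only when the token's subject matches the subject being read.

```python
from dataclasses import dataclass, field


class CrossSubjectError(PermissionError):
    """Raised when a token scoped to one subject requests another subject's data."""


@dataclass(frozen=True)
class ScopedToken:
    # Hypothetical token shape: every token is bound to exactly one subject.
    subject: str


@dataclass
class DataRouter:
    # Shared across the whole fleet (e.g. catalog, pricing).
    global_data: dict[str, object]
    # Private data, keyed first by subject, then by field name.
    user_data: dict[str, dict[str, object]] = field(default_factory=dict)

    def read_global(self, key: str) -> object:
        # Any agent may read shared data.
        return self.global_data[key]

    def read_user(self, token: ScopedToken, subject: str, key: str) -> object:
        # Per-user data is only reachable with a token for that same subject.
        if token.subject != subject:
            raise CrossSubjectError(
                f"token for {token.subject!r} cannot read data for {subject!r}"
            )
        return self.user_data[subject][key]


router = DataRouter(
    global_data={"pricing": {"pro": 49}},
    user_data={"user_8821": {"orders": [4471]}, "user_1234": {"orders": [99]}},
)
token = ScopedToken(subject="user_8821")
print(router.read_user(token, "user_8821", "orders"))  # [4471]
```

Checking the token on every per-user read means a mis-routed request fails loudly instead of returning another subject's orders.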

03// Create evals

Prove it before
it ships.

An eval is a prompt plus what you expect back. Check the output objectively with a grep, subjectively with an LLM judge, and assert the exact tool calls the agent should make. Run it against faked MCP fixtures or your real MCP — your choice, per eval.

→Expected output, checked by grep (objective) or LLM judge (subjective).

→Assert the tool calls the agent must — and must not — make.

→Test data comes from a fixture (a faked MCP) or from your real MCP.
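The objective checks can be sketched in a few lines of Python. This is an illustration, not Chariot's eval engine: `check_eval`, `EvalResult`, and the `delete_account` tool are hypothetical, and the rule that required tool calls must appear in order is an assumption made here. The subjective LLM judge is left out because it needs a model to call.

```python
import re
from dataclasses import dataclass, field


@dataclass
class EvalResult:
    passed: bool
    failures: list[str] = field(default_factory=list)


def check_eval(
    output: str,
    tool_calls: list[str],
    *,
    grep: str,
    must_call: list[str],
    must_not_call: list[str],
) -> EvalResult:
    failures: list[str] = []

    # Objective check: the expected pattern must appear somewhere in the output.
    if re.search(grep, output) is None:
        failures.append(f"grep {grep!r} did not match")

    # Required calls must appear in this order; other calls may sit in between.
    remaining = iter(tool_calls)
    for name in must_call:
        if not any(call == name for call in remaining):
            failures.append(f"missing or out-of-order tool call {name!r}")
            break

    # Forbidden calls must not appear anywhere.
    for name in must_not_call:
        if name in tool_calls:
            failures.append(f"forbidden tool call {name!r}")

    return EvalResult(passed=not failures, failures=failures)


result = check_eval(
    "I've issued a refund of $42 for order #4471.",
    ["lookup_order", "issue_refund"],
    grep=r"refund of \$\d+",
    must_call=["lookup_order", "issue_refund"],
    must_not_call=["delete_account"],  # hypothetical tool the agent must never call
)
print(result.passed)  # True
```

Consuming a single iterator across `must_call` is what enforces ordering: once a required call is matched, earlier calls can no longer satisfy later requirements.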

evals/refund-request.eval

1 scenario

prompt

"Customer asks for a refund on order #4471"

expect

→ tool_calls: lookup_order, issue_refund

→ grep: /refund of \$\d+/ (objective)

→ judge: "polite, confirms amount" (llm)

fixtures: mock-mcp · orders.fixture.json

✓ 31 / 31 passing · quality 98.7
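A faked MCP can be as simple as a lookup table of canned tool results. The Python sketch below assumes a hypothetical fixture layout (a list of `args`/`result` pairs per tool) and hypothetical `FakeMCP` and `run_refund_flow` names; the real `orders.fixture.json` format may differ. Because the agent stand-in depends only on the `ToolBackend` protocol, a client for your real MCP could be passed in its place, and the recorded `calls` list is what a tool-call assertion would inspect.

```python
import json
from typing import Any, Protocol


class ToolBackend(Protocol):
    """Anything that can answer a tool call: a faked MCP or a real MCP client."""

    def call(self, tool: str, args: dict[str, Any]) -> Any: ...


class FakeMCP:
    """Answers tool calls from canned fixture data instead of a live MCP server."""

    def __init__(self, fixture_json: str) -> None:
        # Hypothetical layout: {"tool_name": [{"args": {...}, "result": ...}, ...]}
        self.fixtures: dict[str, list[dict[str, Any]]] = json.loads(fixture_json)
        # Every call is recorded so tool-call assertions can inspect it afterwards.
        self.calls: list[tuple[str, dict[str, Any]]] = []

    def call(self, tool: str, args: dict[str, Any]) -> Any:
        self.calls.append((tool, args))
        for case in self.fixtures.get(tool, []):
            if case["args"] == args:
                return case["result"]
        raise LookupError(f"no fixture for {tool} with args {args}")


def run_refund_flow(backend: ToolBackend, order_id: int) -> str:
    # Stand-in for the agent: a fixed two-step tool sequence.
    order = backend.call("lookup_order", {"order_id": order_id})
    receipt = backend.call("issue_refund", {"order_id": order_id, "amount": order["total"]})
    return f"Done: refund of ${receipt['amount']} for order #{order_id}."


ORDERS_FIXTURE = """
{
  "lookup_order": [{"args": {"order_id": 4471}, "result": {"id": 4471, "total": 42}}],
  "issue_refund": [{"args": {"order_id": 4471, "amount": 42}, "result": {"amount": 42}}]
}
"""

mcp = FakeMCP(ORDERS_FIXTURE)
print(run_refund_flow(mcp, 4471))  # Done: refund of $42 for order #4471.
```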

04// Deploy

One line.
The whole fleet.

When the evals pass, ship. A single command provisions every agent warm, opens a two-way channel to your backend, and hands you a live dashboard — no queue, no cold start, no per-agent setup.

→Thousands of agents live the moment the command returns.

→Your backend connected from line one — fully bidirectional.

→The Command Center spins up with the same command.

chariot ~ deploy

$ chariot deploy --count 10000 --endpoint "https://awesomeapp.io/chariot" --token-seed "ts_1234567890abcdef"

✓ 10,000 agents live · channel open · dashboard ready

05// Command Center

The operator layer for thousands of persistent agents.

Once agents are live, the Command Center is where you run them — the control plane that makes a persistent fleet observable, recoverable, and updatable, without treating each agent like a bespoke snowflake.

// tracked for every agent

identity · status · owner / subject · runtime version · wake / sleep · recent tasks · tool calls · delivery events · latency · failures · cost

// at fleet scale, the control plane answers

?which agents are healthy?

?which agents are sleeping?

?which agents are stuck?

?which agents are expensive?

?which agents are safe to roll out?
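One way to answer the first four questions is to derive labels from the tracked fields. In this Python sketch the `AgentRecord` fields are a hypothetical subset of the list above, and the thresholds (`STUCK_AFTER_S`, `MAX_FAILURES`, `EXPENSIVE_OVER`) are illustrative assumptions, not Chariot defaults. Sleeping agents are counted as healthy here because sleep is treated as an expected state, not a failure.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class AgentRecord:
    # A hypothetical subset of the per-agent fields tracked above.
    agent_id: str
    sleeping: bool
    seconds_since_progress: float  # time since the agent last finished a step
    failures_last_hour: int
    cost_per_month: float


# Illustrative thresholds, not product defaults.
STUCK_AFTER_S = 300.0
MAX_FAILURES = 3
EXPENSIVE_OVER = 10.0


def classify(agent: AgentRecord) -> set[str]:
    labels: set[str] = set()
    if agent.sleeping:
        labels.add("sleeping")  # expected idle state, not a fault
    elif agent.seconds_since_progress > STUCK_AFTER_S:
        labels.add("stuck")  # awake but making no progress
    if "stuck" not in labels and agent.failures_last_hour <= MAX_FAILURES:
        labels.add("healthy")
    if agent.cost_per_month > EXPENSIVE_OVER:
        labels.add("expensive")
    return labels


def fleet_summary(agents: list[AgentRecord]) -> Counter[str]:
    counts: Counter[str] = Counter()
    for agent in agents:
        counts.update(classify(agent))  # each label counted once per agent
    return counts


fleet = [
    AgentRecord("a1", sleeping=False, seconds_since_progress=10, failures_last_hour=0, cost_per_month=2.84),
    AgentRecord("a2", sleeping=True, seconds_since_progress=9999, failures_last_hour=0, cost_per_month=1.0),
    AgentRecord("a3", sleeping=False, seconds_since_progress=600, failures_last_hour=0, cost_per_month=3.0),
    AgentRecord("a4", sleeping=False, seconds_since_progress=5, failures_last_hour=7, cost_per_month=25.0),
]
print(fleet_summary(fleet))
```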

→Investigate no-reply and failed-tool cases — replay what happened.

→Compare versions, stage changes, and run eval-backed rollouts.

→Make targeted or fleet-wide updates safely.
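An eval-backed rollout can be sketched as a loop over growing waves with a gate in front of each one. This Python sketch is hypothetical: `staged_rollout`, the `run_evals` and `apply` callbacks, the default wave sizes, and the 100% pass-rate bar are assumptions, not Chariot's rollout semantics. `run_evals` receives the candidate version and the agents already on it, so the first gate is an offline eval run and later gates can also include those agents.

```python
from typing import Callable


def staged_rollout(
    agent_ids: list[str],
    new_version: str,
    run_evals: Callable[[str, list[str]], float],
    apply: Callable[[str, str], None],
    stages: tuple[float, ...] = (0.01, 0.1, 1.0),
    min_pass_rate: float = 1.0,
) -> list[str]:
    """Roll new_version out in growing waves; stop at the first failed eval gate."""
    updated: list[str] = []
    for fraction in stages:
        # Gate: evaluate the candidate (and the agents already on it) before expanding.
        if run_evals(new_version, list(updated)) < min_pass_rate:
            break
        target = max(1, round(len(agent_ids) * fraction))
        for agent_id in agent_ids[len(updated):target]:
            apply(agent_id, new_version)
            updated.append(agent_id)
    return updated


agents = [f"agent-{i}" for i in range(200)]
done = staged_rollout(agents, "v2.4", run_evals=lambda v, on: 1.0, apply=lambda a, v: None)
print(len(done))  # 200
```

Passing `stages=(1.0,)` updates the whole fleet behind a single gate, while a one-element `agent_ids` list gives a targeted update.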

command/fleet.live

live

agents

10,000 · 99.1% healthy · 71 sleeping

agent #a4471 · no-reply

subject: user_8821

version: v2.3

last task: refund #4471

latency: 1.2s

last tool: issue_refund ✕

cost / mo: $2.84

> rollout v2.4 → fleet --require evals · STAGE

Observable

Every agent is debuggable without SSH-ing into a pod — identity, state, recent tasks, tool calls, and cost, all on one pane.

Recoverable

Find the no-reply and failed-tool cases, replay what happened, and bring wedged or sleeping agents back without touching the box.
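Finding no-reply and failed-tool cases amounts to scanning each agent's event history. The Python sketch below assumes a hypothetical `Event` shape with three kinds (`message_in`, `tool_call`, `message_out`); Chariot's actual delivery-event schema may differ. `triage` labels one agent's log, and `replay` renders everything since the latest inbound message.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical event shape; field names are illustrative only.
    kind: str  # "message_in", "tool_call", or "message_out"
    detail: str = ""
    ok: bool = True  # only meaningful for tool calls


def triage(log: list[Event]) -> set[str]:
    """Label one agent's log with 'no-reply' and/or 'failed-tool'."""
    labels: set[str] = set()
    inbound = [i for i, e in enumerate(log) if e.kind == "message_in"]
    # No reply: the latest inbound message was never followed by an outbound one.
    if inbound and not any(e.kind == "message_out" for e in log[inbound[-1]:]):
        labels.add("no-reply")
    if any(e.kind == "tool_call" and not e.ok for e in log):
        labels.add("failed-tool")
    return labels


def replay(log: list[Event]) -> list[str]:
    """Render the events since the latest inbound message, oldest first."""
    inbound = [i for i, e in enumerate(log) if e.kind == "message_in"]
    start = inbound[-1] if inbound else 0
    return [f"{e.kind}: {e.detail}" + ("" if e.ok else " (failed)") for e in log[start:]]


log = [
    Event("message_in", "refund #4471"),
    Event("tool_call", "lookup_order"),
    Event("tool_call", "issue_refund", ok=False),
]
print(sorted(triage(log)))  # ['failed-tool', 'no-reply']
for line in replay(log):
    print(line)
```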

Updatable

Compare versions, stage a change, prove it with an eval-backed rollout, then ship it to one agent or the whole fleet — safely.

Five steps from your stack
to a fleet you can run.

Configure, set up, prove with evals, deploy — and operate every agent from one console. See it on your data, with your tools.

