Agents in Production: Patterns and Pitfalls
Real-world architectures, common failure modes, human-in-the-loop design, and keeping costs under control.
Production agents need guardrails around autonomy. A common architecture includes a planner, an executor, a tool sandbox, a memory layer, telemetry, and optional human approval for high-risk actions. The three failure modes below account for most real agent incidents. Run each one below, first with the guardrail off and then with it on.
Pick a real failure mode, then toggle the guardrail to see the difference between an agent left to run free and one wrapped in a production safety net.
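To make the guardrail toggle concrete, here is a minimal Python sketch of an executor that wraps a plan in three checks: a tool allow-list (the sandbox), a step budget, and an approval hook for high-risk actions. Every name in it (`Step`, `GuardedExecutor`, `deny_all`, the two toy tools) is hypothetical and chosen for illustration; it is one way to arrange these components under those assumptions, not a reference implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    tool: str
    arg: str
    high_risk: bool = False


@dataclass
class GuardedExecutor:
    tools: Dict[str, Callable[[str], str]]   # sandbox: only these tools exist
    approve: Callable[[Step], bool]          # human-approval hook (hypothetical)
    max_steps: int = 5                       # budget that stops runaway loops
    guardrails: bool = True
    trace: List[str] = field(default_factory=list)  # telemetry, one line per event

    def run(self, plan: List[Step]) -> List[str]:
        for i, step in enumerate(plan):
            if self.guardrails:
                if i >= self.max_steps:
                    self.trace.append("halt: step budget exhausted")
                    break
                if step.high_risk and not self.approve(step):
                    self.trace.append(f"denied: {step.tool}({step.arg})")
                    continue
            tool = self.tools.get(step.tool)
            if tool is None:
                self.trace.append(f"blocked: unknown tool {step.tool}")
                continue
            self.trace.append(f"ran: {step.tool}({step.arg}) -> {tool(step.arg)}")
        return self.trace


# Toy tools and a toy plan, purely for illustration.
tools = {
    "search": lambda q: f"results for {q}",
    "delete_branch": lambda name: f"deleted {name}",
}
plan = [Step("search", "open PRs"), Step("delete_branch", "main", high_risk=True)]


def deny_all(step: Step) -> bool:
    """Stand-in for a human reviewer who rejects every high-risk step."""
    return False


unsafe = GuardedExecutor(tools, approve=deny_all, guardrails=False).run(plan)
safe = GuardedExecutor(tools, approve=deny_all, guardrails=True).run(plan)
```

With the guardrail off, the second trace entry records that `delete_branch` ran; with it on, the same step is recorded as denied and the rest of the plan is unaffected. Keeping the checks in the executor rather than in the planner means a misbehaving plan cannot skip them.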
Design for recoverability. Every step should be traceable, retryable, and interruptible. Add checkpoints so workflows can resume without rerunning everything from scratch. The human-approval gate (`--post` flag) of the `code-review-agent` in the companion repo is a simple version of this: the agent does all the work, but a human decides whether the result actually goes live.
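A checkpointed workflow can be sketched in a few lines of Python. The example below assumes each step is a named, zero-argument function whose result is JSON-serializable, and it stores completed results in a JSON file. The names `run_workflow`, `publish`, `fetch`, and `review` are hypothetical, and `publish` only imitates the idea of a `--post`-style opt-in; it is not the companion repo's code.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def run_workflow(steps: List[Tuple[str, Callable[[], str]]], checkpoint: Path) -> Dict[str, str]:
    """Run named steps in order, skipping any already recorded in the checkpoint."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for name, fn in steps:
        if name in done:
            continue  # finished in an earlier attempt, do not rerun
        done[name] = fn()
        # Persist after every step so an interruption loses at most one step.
        checkpoint.write_text(json.dumps(done))
    return done


def publish(review_text: str, post: bool) -> str:
    """Approval gate: nothing goes live unless a human opted in."""
    return f"posted: {review_text}" if post else f"dry run: {review_text}"


calls = {"fetch": 0, "review": 0}


def fetch() -> str:
    calls["fetch"] += 1
    return "diff"


def review() -> str:
    calls["review"] += 1
    if calls["review"] == 1:
        raise RuntimeError("simulated transient failure")
    return "2 comments"


steps = [("fetch", fetch), ("review", review)]
ckpt = Path(tempfile.mkdtemp()) / "run.json"

try:
    run_workflow(steps, ckpt)        # first attempt fails during "review"
except RuntimeError:
    pass
result = run_workflow(steps, ckpt)   # resumes: "fetch" is skipped
outcome = publish(result["review"], post=False)
```

After the second call, `fetch` has run once and `review` twice, which is the point of the checkpoint: only the failed step is retried. A sturdier version would write the checkpoint to a temporary file and rename it into place so a crash mid-write cannot corrupt it.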
A production agent should feel like a modern autopilot: highly capable, continuously monitored, and always overridable by a human operator when needed.