Safety & Deploy · Free preview

Human in the Loop

High stakes need approval

An approval gate classifies each proposed action by risk and pauses the high-stakes ones for a human to approve, while letting low-risk actions run automatically.

The guardrails so far validate, sanitize, and redact text. But your agent doesn't just talk; it acts. It calls tools that move money, delete records, and send wires. And a model is wrong often enough that "trust it to never misfire on an irreversible action" is not a plan. Picture the agent confidently running delete-account for the wrong user, or reading "refund the customer" and issuing $500 instead of $5, or calling send-money with $200 for an address it half-hallucinated. The text guardrails wave all of these straight through because the output looks perfectly clean: the danger isn't in the words, it's in the consequence.

The fix isn't to make the agent timid; it's to put a human in the loop for the actions that can't be taken back. You classify each proposed action by risk: the reversible, low-stakes ones run on their own, and the irreversible or expensive ones pause with NEEDS APPROVAL until a person signs off. The intuition is asymmetry of regret — auto-approving a $5 refund that turns out wrong costs five dollars and a moment; auto-approving a delete-account that turns out wrong can't be undone at any price. So you spend a human's attention only where being wrong is expensive.
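The pause itself can be sketched as a small pending queue. This is a minimal illustration under stated assumptions, not the academy's implementation: `createApprovalQueue`, `submit`, `decide`, and the `needsApproval` predicate are hypothetical names, and the `run` callback stands in for whatever actually executes a tool call.

```javascript
// Hypothetical sketch: every name here is an assumption made for illustration.
// needsApproval(action) -> boolean decides which actions must wait for a human.
// run(action) stands in for the real tool call.
function createApprovalQueue(needsApproval, run) {
  const pending = new Map(); // ticket id -> action waiting for a human
  let nextId = 1;

  return {
    // Low-risk actions run immediately; high-risk ones are parked.
    submit(action) {
      if (!needsApproval(action)) {
        return { status: "DONE", result: run(action) };
      }
      const id = nextId++;
      pending.set(id, action);
      return { status: "NEEDS APPROVAL", id };
    },

    // A human's decision either releases the parked action or drops it.
    decide(id, approved) {
      const action = pending.get(id);
      if (action === undefined) throw new Error(`no pending action ${id}`);
      pending.delete(id);
      return approved
        ? { status: "DONE", result: run(action) }
        : { status: "REJECTED" };
    },

    pendingCount() {
      return pending.size;
    },
  };
}

// Usage: only the irreversible action waits for a person.
const executed = [];
const queue = createApprovalQueue(
  (action) => action.name === "delete-account",
  (action) => { executed.push(action.name); return "ok"; }
);

const first = queue.submit({ name: "read-record" });
const second = queue.submit({ name: "delete-account" });
console.log(first.status);  // DONE
console.log(second.status); // NEEDS APPROVAL (nothing has been deleted yet)

const released = queue.decide(second.id, true);
console.log(released.status); // DONE
```

The design choice worth noticing is that the gated action is stored rather than executed and rolled back later: nothing irreversible happens until `decide` is called with an approval.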

Notice the threshold, not just the category. delete-account and send-money are always gated — there's no safe version. But refund is gated by amount: over $100 needs a human, while the $5 refund sails through on ALLOW. That's the difference between a blanket block (which makes the agent useless) and risk-tiering (which keeps it fast where it's safe). Walking the queue, you'd see read-record -> ALLOW, refund $5 -> ALLOW, then refund $500, delete-account, and send-money $200 all routed to a human.
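One way to put always-gated categories next to an amount threshold is a small policy table. The sketch below is an illustration under assumptions rather than the graded solution: the `POLICY` map and its `always` / `over` fields are invented here, and the `amount` and `label` values for the delete-account and send-money entries are assumed from the walkthrough above.

```javascript
const REFUND_LIMIT = 100;

// Hypothetical policy table: `always` gates every call to the tool,
// `over` gates only calls whose amount exceeds the threshold.
const POLICY = new Map([
  ["delete-account", { always: true }],
  ["send-money",     { always: true }],
  ["refund",         { over: REFUND_LIMIT }],
]);

function gate(action) {
  const rule = POLICY.get(action.name);
  if (rule === undefined) return "ALLOW";                 // unlisted tools auto-allow
  if (rule.always) return "NEEDS APPROVAL";               // no safe version exists
  if (action.amount > rule.over) return "NEEDS APPROVAL"; // strictly over the limit
  return "ALLOW";
}

// Queue from the walkthrough; the last two entries use assumed field values.
const proposed = [
  { name: "read-record",    amount: 0,   label: "read-record" },
  { name: "refund",         amount: 5,   label: "refund $5" },
  { name: "refund",         amount: 500, label: "refund $500" },
  { name: "delete-account", amount: 0,   label: "delete-account" },
  { name: "send-money",     amount: 200, label: "send-money $200" },
];

const lines = proposed.map((a) => `${a.label} -> ${gate(a)}`);
console.log(lines.join("\n"));
// read-record -> ALLOW
// refund $5 -> ALLOW
// refund $500 -> NEEDS APPROVAL
// delete-account -> NEEDS APPROVAL
// send-money $200 -> NEEDS APPROVAL
```

Because the comparison is strictly greater than, a refund of exactly $100 still returns ALLOW, which matches the "over $100" wording of the rule.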

Finish gate(action): return "NEEDS APPROVAL" for every delete-account, every send-money, and any refund over $100; return "ALLOW" for everything else, including the $5 refund. Print one "<label> -> <decision>" line per action.

Automate the reversible; gate the irreversible. The point of a human in the loop is to be there exactly when being wrong is expensive — and nowhere else.

In the full academy, you write and run this — live, graded:

// You guard a deployed agent. It proposes actions; you decide which can run on
// their own and which must PAUSE for a human. The rule for high-stakes actions:
//   - "delete-account"            -> always NEEDS APPROVAL (destroys data)
//   - "send-money"                -> always NEEDS APPROVAL (moves money out)
//   - "refund" OVER $100          -> NEEDS APPROVAL (a small refund is fine)
// Everything else auto-ALLOWs. Print "<label> -> <ALLOW|NEEDS APPROVAL>".
const REFUND_LIMIT = 100;

function gate(action) {
  // TODO: return "NEEDS APPROVAL" for the high-risk cases above,
  //       otherwise return "ALLOW". Right now it rubber-stamps everything.
  return "ALLOW";
}

const proposed = [
  { name: "read-record",    amount: 0,   label: "read-record" },
  { name: "refund",         amount: 5,   label: "refund $5" },
  { name: "refund",         amount: 500, label: "refund $500" },
  { na

This is 1 of 50 lessons.
