Human in the Loop

High stakes need approval

The guardrails so far validate, sanitize, and redact text. But your agent doesn't just talk — it acts. It calls tools that move money, delete records, send wires. And a model is wrong often enough that "trust it to never misfire on an irreversible action" is not a plan. Picture the agent confidently deciding to delete-account for the wrong user, or reading "refund the customer" and issuing $500 instead of $5, or send-money $200 to an address it half-hallucinated. The text guardrails wave all of these straight through, because the output looks perfectly clean — the danger isn't in the words, it's in the consequence.

The fix isn't to make the agent timid; it's to put a human in the loop for the actions that can't be taken back. You classify each proposed action by risk: the reversible, low-stakes ones run on their own, and the irreversible or expensive ones pause with NEEDS APPROVAL until a person signs off. The intuition is asymmetry of regret — auto-approving a $5 refund that turns out wrong costs five dollars and a moment; auto-approving a delete-account that turns out wrong can't be undone at any price. So you spend a human's attention only where being wrong is expensive.

Notice the threshold, not just the category. delete-account and send-money are always gated — there's no safe version. But refund is gated by amount: over $100 needs a human, while the $5 refund sails through on ALLOW. That's the difference between a blanket block (which makes the agent useless) and risk-tiering (which keeps it fast where it's safe). Walking the queue, you'd see read-record -> ALLOW, refund $5 -> ALLOW, then refund $500, delete-account, and send-money $200 all routed to a human.

Finish gate(action): return "NEEDS APPROVAL" for every delete-account, every send-money, and any refund over $100; return "ALLOW" for everything else, including the $5 refund. Print one "<label> -> <decision>" line per action.

Automate the reversible; gate the irreversible. The point of a human in the loop is to be there exactly when being wrong is expensive — and nowhere else.

High stakes need approval

This is 1 of 50 lessons.