Never trust the caller
Your agent is live. A request arrives with no text field at all, and the very
first line that touches req.text.length throws TypeError: Cannot read properties of undefined (reading 'length'). The agent is down, and the caller never even reached a tool. Another
request asks for type: "delete", an action you never built a handler for, and the
model improvises something destructive. A third pastes a 4,000-character wall of
text that balloons your token bill. None of these are clever attacks; they're just
the ordinary garbage that hits any open endpoint.
An input guardrail is the bouncer you run before the agent acts. It checks
each request against explicit rules — required field present, allowed type, within
length — and turns away the ones that fail with a reason, so the caller knows
what to fix and the agent never sees the bad request. The intuition: validation is
cheaper and safer than recovery. Catching a malformed request at the door costs one
if statement; catching it after the agent has half-acted costs a rollback.
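One way to picture the bouncer is as a wrapper around the agent. The sketch below is illustrative only: withGuard, the toy agent, and the seen array are hypothetical names, and the validate stub enforces just the "text is present" rule to keep the example short.

```javascript
// Hypothetical stand-in for the real validator: only the "text is present" rule.
function validate(req) {
  if (req == null || typeof req.text !== "string") return "REJECT: missing text";
  return "ACCEPT";
}

// Wrap an agent so it only ever runs on requests the guard accepted.
function withGuard(agent) {
  return function handle(req) {
    const verdict = validate(req);
    if (verdict !== "ACCEPT") return verdict; // turned away at the door, with a reason
    return agent(req);
  };
}

// Toy agent that records every request it was shown (illustration only).
const seen = [];
const agent = (req) => {
  seen.push(req);
  return `handled ${req.type}: ${req.text}`;
};

const handle = withGuard(agent);
console.log(handle({ type: "search" }));               // REJECT: missing text
console.log(handle({ type: "search", text: "cats" })); // handled search: cats
console.log(seen.length);                              // 1
```

The rejected request never reaches the agent: seen holds only the well-formed one.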
Order is the subtle part. Consider { type: "search" } with no text. If you
check length first, you read undefined.length and crash — on the exact input
the guard existed to stop. So you check existence, then type, then length, and
you return the first failure rather than collecting all of them. Walking the five
requests, you'd see ACCEPT, then REJECT: missing text, then REJECT: type not allowed, then REJECT: text too long, then ACCEPT — each rejection naming
exactly which rule tripped.
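The ordering problem can be reproduced in a few lines. This is a simplified sketch: lengthFirst and existenceFirst are hypothetical names, the type rule is left out to isolate the issue, and "present" is assumed to mean "is a string".

```javascript
const MAX_LEN = 20;

// Wrong order: measures text before confirming it exists.
function lengthFirst(req) {
  if (req.text.length > MAX_LEN) return "REJECT: text too long"; // throws when text is undefined
  if (typeof req.text !== "string") return "REJECT: missing text";
  return "ACCEPT";
}

// Right order: existence first, so the length check can never see undefined.
function existenceFirst(req) {
  if (typeof req.text !== "string") return "REJECT: missing text";
  if (req.text.length > MAX_LEN) return "REJECT: text too long";
  return "ACCEPT";
}

const req = { type: "search" }; // no text field

try {
  lengthFirst(req);
} catch (err) {
  console.log(err instanceof TypeError); // true
}
console.log(existenceFirst(req)); // REJECT: missing text
```

Both functions contain the same two checks; only the order differs, and only one of them survives the input it was written to stop.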
This is the first guardrail in the track: everything later (resisting injection,
redacting output, gating risky actions) assumes the request that got this far is at
least well-formed. Build validate(req): enforce the three rules in order — a
required text field, an allowed type (search or lookup), a max length of 20
— returning the first failure as "REJECT: <reason>", or "ACCEPT" when all pass.
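A minimal sketch of one possible validate(req) follows; treat it as a starting point rather than the reference solution. The rejection strings come from the walk-through above. The sample requests are made up for illustration, and two details are assumptions: a non-string text counts as missing, and a null request is rejected the same way.

```javascript
const ALLOWED_TYPES = ["search", "lookup"];
const MAX_LEN = 20;

function validate(req) {
  // Rule 1: text must exist (assumption: anything that is not a string counts as missing).
  if (req == null || typeof req.text !== "string") return "REJECT: missing text";
  // Rule 2: type must be on the allow-list.
  if (!ALLOWED_TYPES.includes(req.type)) return "REJECT: type not allowed";
  // Rule 3: safe to measure now, because rule 1 already proved text is a string.
  if (req.text.length > MAX_LEN) return "REJECT: text too long";
  return "ACCEPT";
}

// Hypothetical sample requests, one per outcome in the walk-through.
const samples = [
  { type: "search", text: "weather in Oslo" },
  { type: "search" },
  { type: "delete", text: "all records" },
  { type: "lookup", text: "x".repeat(21) },
  { type: "lookup", text: "order 1234" },
];

for (const req of samples) console.log(validate(req));
// ACCEPT
// REJECT: missing text
// REJECT: type not allowed
// REJECT: text too long
// ACCEPT
```

Because each rule returns as soon as it fails, a request that breaks several rules reports only the first: { type: "delete" } with no text comes back as REJECT: missing text, not as a type error.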
A guard that trusts its input isn't a guard. Reject early, check existence before you measure it, and name the reason — the agent never sees the bad request.