Think on paper
Ask an agent to do a multi-step task in one shot — "compute the tax, apply the discount, then split three ways" — and it will often hand you a single confident number that's quietly wrong. You can't see where it went off the rails, because there are no rails: the whole computation happened invisibly inside one model call. There's nothing to inspect, nothing to retry, nothing to trust.
In the last lesson you fit a conversation onto the finite desk. Working memory is what you do with that desk during a single task: a scratchpad the agent writes each intermediate result to, then reads back for the next step. The intuition is the same one that makes you do long division on paper instead of in your head — each step becomes small, named, and visible, so a slip shows up at the exact line it happened. The agent stops making one opaque leap and starts leaving a trace.
Take (a + b) * c with a=7, b=3, c=4. Instead of emitting 40 and hoping, the
agent writes scratchpad.sum = 10, logs it, then reads that back to compute
scratchpad.product = 40, logs that, and reports the last value on the pad as the
answer. If the product had come out as 30, you'd instantly see sum was fine and
the multiply was the bug — diagnosis without re-running anything.
This matters because real agent tasks are long chains — search, then parse, then decide — and a chain with no intermediate state can only fail as a black box. A scratchpad turns failures into located failures, and lets a step re-run from the last good value instead of from scratch.
Here the task is (a + b) * c. The starter cheats and jumps straight to the
answer, so there's no trace to trust. Rewrite it to write the sum to
scratchpad.sum and log it, read it back to compute scratchpad.product and log
that, then report the product as the answer. Done means all three lines appear, in
order, each value read from the pad rather than recomputed inline.
The scratchpad isn't extra work — it's what turns one impossible leap into a chain of steps you can check, retry, and build on.