A model can only talk
Ask a bare model "what's the weather in Tokyo right now?" and it will answer with total confidence: "It's 18°C and partly cloudy." It made that up. The model has no window, no thermometer, no live feed; it produced the words that most often follow that kind of question in its training data. Ask "what is 48291 × 7732?" and it will write a number that looks right and is very likely wrong, because it pattern-matches digits instead of multiplying them.
That's the failure tools exist to fix. A language model is a text function: given words, it returns likely next words. For timeless knowledge — who wrote Hamlet, why the sky is blue — that's enough, because the answer was baked in and never changes. But the model has no clock, no calculator, no inbox. The instant a request needs live data, exact arithmetic, or an action in the world, text prediction can only produce a plausible-sounding guess. A tool is what actually goes and gets the real value — or performs the real action.
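To make the contrast concrete, here is a minimal sketch of what "a tool" can mean in practice: an ordinary function that computes or looks up the real value instead of predicting it. The function names multiply and current_utc_time are hypothetical, chosen only for illustration; they are not part of any particular agent framework.

```python
from datetime import datetime, timezone


def multiply(a: int, b: int) -> int:
    """Hypothetical calculator tool: the product is computed, not predicted."""
    return a * b


def current_utc_time() -> str:
    """Hypothetical clock tool: reads the system clock instead of guessing."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


if __name__ == "__main__":
    # The multiplication from the text, done exactly.
    print(multiply(48291, 7732))  # 373386012
    # A live value; the output differs on every run.
    print(current_utc_time())
```

Neither function is clever. The point is only that each returns a value grounded in something outside the model: the arithmetic unit in one case, the system clock in the other.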
A guessed weather reading or a hallucinated total isn't just wrong; it's confidently wrong, and the user has no way to tell it from a real answer. The first skill of agent design is spotting which requests fall past the model's edge. Every tool in this track exists to cover one of those gaps.
Below is a list of user requests. Finish classify so it returns "needs-tool" for the
ones that reach past frozen knowledge — live data ("weather… right now", "current
price of Bitcoin"), exact arithmetic, real-world actions ("send an email") — and
"answerable" for the timeless facts. Print <request> -> <verdict> for each. You are
done when every request is printed with the verdict that matches its signal.
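One possible shape for classify is sketched below, assuming a simple keyword-and-pattern approach. The signal lists, the arithmetic pattern, and the sample REQUESTS list are all assumptions made for illustration; the lesson's actual starter code and request list may differ.

```python
import re

# Hypothetical signal lists; extend them to fit the real request list.
LIVE_DATA_SIGNALS = ("right now", "current", "today", "latest")
ACTION_VERBS = ("send", "book", "schedule", "delete")
# Two numbers joined by an operator, e.g. "48291 × 7732" or "12 + 7".
ARITHMETIC_PATTERN = re.compile(r"\d+\s*[-+*/x×]\s*\d+")


def classify(request: str) -> str:
    """Return "needs-tool" if the request reaches past frozen knowledge."""
    text = request.lower()
    if any(signal in text for signal in LIVE_DATA_SIGNALS):
        return "needs-tool"  # live data
    if ARITHMETIC_PATTERN.search(text):
        return "needs-tool"  # exact arithmetic
    if any(re.search(rf"\b{re.escape(verb)}\b", text) for verb in ACTION_VERBS):
        return "needs-tool"  # real-world action
    return "answerable"  # timeless fact


# Sample requests assembled from the examples in the text (an assumption).
REQUESTS = [
    "What's the weather in Tokyo right now?",
    "What is the current price of Bitcoin?",
    "What is 48291 × 7732?",
    "Send an email to my manager.",
    "Who wrote Hamlet?",
    "Why is the sky blue?",
]

if __name__ == "__main__":
    for request in REQUESTS:
        print(f"{request} -> {classify(request)}")
```

Keyword matching is brittle: "current" also matches "ocean currents", and a date such as 2020-05 would trip the arithmetic pattern. That is acceptable for a short fixed list like this one, but it is a sketch of the idea, not a robust router.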
A tool is not a feature you bolt on for fun — it is the answer to a request the model literally cannot satisfy with words alone.