Memory · Free preview

The Context Window

A finite desk

A context window is a finite token budget. To stay inside it, you keep the most recent messages that fit and drop the oldest.

On turn forty of a long chat, your agent suddenly forgets the user's name — or the API rejects the request outright with "too many tokens." Nothing broke in your code. You simply tried to put more on the desk than the desk can hold. Every model has a hard limit, and once the conversation crosses it, something has to come off.

The context window is that desk: a fixed amount of room measured in tokens. Everything the model reads on a turn — the system prompt, every past message, the tool results — has to lie on the desk at once, because the model has no memory between calls. It only knows what you hand it this turn. So when the history grows past the budget, you don't get to keep it all; you choose. And the cheap, reliable heuristic is recency: the newest messages usually carry the live intent, so you keep the newest that fit and slide the oldest off the edge.
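
To make "everything has to lie on the desk at once" concrete, here is a minimal sketch that adds up the cost of one turn and compares it with the budget. The names `turnPayload`, `windowBudget`, and `totalTokens` are hypothetical, and the token counts are illustrative figures rather than real tokenizer output.

```javascript
// Hypothetical per-turn payload: everything the model will read on this call.
// Token counts are made-up illustrative numbers, not real tokenizer output.
const turnPayload = [
  { role: "system", text: "always be concise", tokens: 30 },
  { role: "user", text: "what's the capital of France", tokens: 20 },
  { role: "assistant", text: "it's Paris", tokens: 25 },
  { role: "user", text: "and the weather there today", tokens: 40 },
];
const windowBudget = 100; // assumed window size for this example

// Sum the cost of every item that has to sit on the desk this turn.
function totalTokens(items) {
  return items.reduce((sum, item) => sum + item.tokens, 0);
}

const needed = totalTokens(turnPayload);
const overBy = Math.max(0, needed - windowBudget);
console.log(`needed ${needed}/${windowBudget}, over by ${overBy}`);
// prints "needed 115/100, over by 15"
```

When `overBy` is greater than zero, something has to come off the desk before the request can be sent.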

Concretely, imagine a 100-token desk and four messages: a 30-token system line, a 20-token question, a 25-token reply, and a fresh 40-token question. Walking newest → oldest, you take the 40, then the 25 (65), then the 20 (85) — and the 30-token system line would push you to 115, so it falls off. You kept 3 of 4 and spent 85/100. The most recent turn survives; the stale instruction is sacrificed.
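
The walk above can be sketched as a small function. This is one possible shape, not the lesson's graded solution: `fitToBudget` is a hypothetical name, and it assumes the walk stops at the first message that does not fit, so the kept messages always form one unbroken run ending at the newest turn.

```javascript
// Hypothetical helper: keep the newest messages that fit inside the budget.
// Assumption: stop at the first message that does not fit, so no gaps appear.
function fitToBudget(history, budget) {
  const kept = [];
  let used = 0;
  // Walk newest -> oldest.
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = history[i].tokens;
    if (used + cost > budget) break; // this message and everything older falls off
    kept.unshift(history[i]); // unshift restores chronological order
    used += cost;
  }
  return { kept, used, dropped: history.length - kept.length };
}

const desk = fitToBudget(
  [
    { text: "system line", tokens: 30 },
    { text: "question", tokens: 20 },
    { text: "reply", tokens: 25 },
    { text: "fresh question", tokens: 40 },
  ],
  100
);
console.log(`kept ${desk.kept.length} of 4, used ${desk.used}/100`);
// prints "kept 3 of 4, used 85/100"
```

An alternative policy would skip a message that is too large and keep walking to older, smaller ones. That can pack the budget more tightly, but it leaves holes in the conversation. On this example both policies give the same result.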

This is the first lever every agent pulls, and getting it wrong is expensive both ways: keep too much and the request errors out or costs more; drop too aggressively and the agent loses the thread. The later lessons in this track all exist because plain keep/drop is lossy — but you have to master the budget before you can be clever about it.

Below is exactly that conversation: four messages whose tokens sum past the 100-token window. Walk from the newest backward, keeping each only while the running total still fits the budget, and print which messages stay, which get dropped, and the used/budget total. Done means used never exceeds 100 and the oldest message reads DROP.

Context isn't free or infinite. Fitting the window — deciding what to keep on the desk and what to let fall away — is the first real memory skill.

In the full academy, you write and run this — live, graded:

// The conversation so far — oldest message first. Each has a token cost.
const messages = [
  { text: "system: always be concise", tokens: 30 },
  { text: "user: what's the capital of France", tokens: 20 },
  { text: "assistant: it's Paris", tokens: 25 },
  { text: "user: and the weather there today", tokens: 40 },
];
const budget = 100; // the context window is this many tokens — no more.

// Decide which messages survive. Walk NEWEST → OLDEST and keep a message only
// while the running total still fits inside the budget; drop the rest.
const keep = new Array(messages.length).fill(false);
let used = 0;
for (let i = messages.length - 1; i >= 0; i--) {
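  // TODO: keep messages[i] only if (used + its tokens) still fits the budget,
  // and add its tokens to 'used' when you keep it.
  // As written, the two lines below keep every message, so 'used' ends at 115
  // and overshoots the budget.
  keep[i] = true;
  used += messages[i].tokens;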
}

// Report the decision in chronological order, then the 
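
One way the starter above might be finished, offered as a sketch rather than the graded reference solution. It follows the TODO's "only if it fits" condition literally. The `DROP` label comes from the task description, while the `KEEP` label and the exact line format are assumptions.

```javascript
// The conversation so far, oldest message first. Each has a token cost.
const messages = [
  { text: "system: always be concise", tokens: 30 },
  { text: "user: what's the capital of France", tokens: 20 },
  { text: "assistant: it's Paris", tokens: 25 },
  { text: "user: and the weather there today", tokens: 40 },
];
const budget = 100;

// Walk newest -> oldest and keep a message only if it still fits.
const keep = new Array(messages.length).fill(false);
let used = 0;
for (let i = messages.length - 1; i >= 0; i--) {
  if (used + messages[i].tokens <= budget) {
    keep[i] = true;
    used += messages[i].tokens;
  }
}

// Report in chronological order, then the used/budget total.
// "KEEP" and this line format are assumed; only "DROP" appears in the task text.
const report = messages.map(
  (m, i) => `${keep[i] ? "KEEP" : "DROP"} (${m.tokens}) ${m.text}`
);
report.push(`used ${used}/${budget}`);
console.log(report.join("\n"));
```

With the four messages above, the first report line reads `DROP (30) system: always be concise` and the last reads `used 85/100`, which meets both "done" conditions: `used` never exceeds 100 and the oldest message is dropped.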

This is 1 of 50 lessons.