
The Context Window Explained

Why LLMs forget, what context limits mean in practice, and how to design around them.

You now know that models process tokens and that the context window is measured in them. But what exactly is the context window? It is the model's working memory - the total space available for everything in a single request: your instructions, the conversation history, any documents you have attached, and the response being generated. Once you exceed it, the model starts losing information. Not gracefully. Not intelligently. It just drops.
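To make the arithmetic concrete, here is a minimal Python sketch of the budget described above. The class and function names are illustrative, and the 8,000-token window is an assumed figure chosen to keep the numbers small, not a real model limit.

```python
from dataclasses import dataclass


@dataclass
class RequestBudget:
    """Token counts for each part of a single request."""
    system_prompt: int
    history: int
    documents: int
    reserved_for_response: int

    def total(self) -> int:
        # Everything shares one window, including the response being generated.
        return (self.system_prompt + self.history
                + self.documents + self.reserved_for_response)


def overflow(budget: RequestBudget, context_window: int) -> int:
    """Return how many tokens the request is over the window (0 if it fits)."""
    return max(0, budget.total() - context_window)


# Illustrative numbers only; the window size is an assumption.
budget = RequestBudget(system_prompt=400, history=6_000,
                       documents=3_000, reserved_for_response=1_000)
print(overflow(budget, context_window=8_000))  # 2400
```

The point of the sketch is that the response is part of the budget: a request can overflow even when the input alone would fit.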

The surgeon who forgets everything before the operation

Imagine a brilliant surgeon who, at the start of every operation, is handed a clipboard with all the patient's information - allergies, medical history, the procedure plan, notes from earlier that day. The clipboard holds exactly 20 pages. If the stack of notes is 25 pages thick, the assistant has to leave 5 pages behind. Which 5? The oldest ones. The surgeon performs the operation with whatever is on the clipboard - missing information they do not know is missing. This is exactly how the context window works. The model reasons over whatever fits. It does not know what was cut. It will not ask for it. It will just silently work with incomplete information and still sound confident.

How context degrades in long conversations

Most chat applications resend the full conversation history with every new message, so the model has context for what was said earlier. But as the conversation grows, the history eventually outgrows the window, and older messages get dropped to make room. The model might no longer see an instruction you gave in message 3 by the time you reach message 40. This is why long-running conversations with complex instructions sometimes go wrong - not because the model got dumber, but because it can no longer see the early context that shaped its behaviour.
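The effect can be sketched in a few lines of Python. This is an assumed, simplified version of what a chat application might do, not any particular product's logic: it keeps the newest messages that fit a token budget and drops the rest. The `rough_token_count` helper is a crude stand-in for a real tokenizer (about 4 characters per token), and the budget of 130 tokens is arbitrary.

```python
from typing import Callable


def fit_history(messages: list[str], budget: int,
                count_tokens: Callable[[str], int]) -> list[str]:
    """Keep the newest messages whose combined token count fits in `budget`."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def rough_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: about 4 characters per token."""
    return max(1, len(text) // 4)


# Forty messages of similar length; only the most recent ones survive.
history = [f"message {i}: " + "x" * 40 for i in range(1, 41)]
visible = fit_history(history, budget=130, count_tokens=rough_token_count)

print(len(visible))                                     # 10
print(visible[0][:10])                                  # message 31
print(any(m.startswith("message 3:") for m in visible))  # False
```

Message 3 is gone, and nothing in the remaining messages tells the model that it ever existed.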

The lost-in-the-middle problem
Research has shown that LLMs are significantly better at using information at the beginning and end of the context window than information buried in the middle. If you have 10 retrieved documents and the most relevant one lands in position 5–7, the model is more likely to miss it than if it were first or last. This is not a bug you can patch with better prompts - it is a consequence of how attention weights distribute over long sequences. The practical fix: put your most important context at the top, not buried in the middle.
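One way to act on this, sketched below in Python, is to reorder retrieved documents so the strongest ones sit at the edges of the prompt and the weakest ones fall in the middle. The function name is hypothetical, and it assumes the input list is already sorted from most to least relevant by your retriever.

```python
def place_at_edges(ranked_docs: list[str]) -> list[str]:
    """Reorder docs (most relevant first) so the best ones sit at the start
    and end of the prompt and the weakest ones land in the middle."""
    front: list[str] = []
    back: list[str] = []
    for index, doc in enumerate(ranked_docs):
        # Alternate: ranks 1, 3, 5... go to the front, ranks 2, 4, 6... to the back.
        if index % 2 == 0:
            front.append(doc)
        else:
            back.append(doc)
    return front + back[::-1]


ranked = ["doc1", "doc2", "doc3", "doc4", "doc5"]  # doc1 is the most relevant
print(place_at_edges(ranked))
# ['doc1', 'doc3', 'doc5', 'doc4', 'doc2']
```

The most relevant document ends up first, the second most relevant ends up last, and the least relevant one takes the middle slot where it does the least harm.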

Modern context window sizes (2026)

GPT-4o: 128K tokens (~96,000 words). Claude 3.5/4: 200K tokens (~150,000 words). Gemini 2.0 Pro: 2 million tokens (~1.5 million words - a shelf of full-length books). These numbers are massive, but larger windows do not eliminate the 'lost-in-the-middle' problem, and filling them costs significantly more per request. A well-designed retrieval system that sends 3,000 relevant tokens almost always beats dumping 100,000 tokens of vaguely related content.
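A quick back-of-the-envelope calculation shows the scale of the difference. The Python sketch below uses the 0.75 words-per-token ratio implied by "200K tokens (~150,000 words)" and a purely hypothetical input price of $3.00 per million tokens; real prices vary by model and provider.

```python
WORDS_PER_TOKEN = 0.75  # ratio implied by "200K tokens (~150,000 words)"
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # hypothetical USD price, not a real quote


def approx_words(tokens: int) -> int:
    """Rough word count for a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def input_cost(tokens: int) -> float:
    """Input cost in USD for one request at the assumed price."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS


print(approx_words(200_000))  # 150000

focused = input_cost(3_000)    # retrieval: 3,000 relevant tokens
dumped = input_cost(100_000)   # dumping: 100,000 loosely related tokens
print(f"focused: ${focused:.3f}, dumped: ${dumped:.3f}")
print(f"dumping costs about {dumped / focused:.0f}x more per request")
```

Whatever the actual price, the ratio stays the same: the dumped request costs roughly 33 times as much, on every single call.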

[Interactive: Lost in the Middle Simulator. Dragging a slider moves a piece of key information (like a passcode or rule) from the start of a long prompt (0%) to the end (100%). The displayed attention is lowest around the middle, where information is at high risk of being ignored or missed.]

⚠️ Warning: Never bury important instructions or single key-value database records in the middle of a 100k+ token prompt.

Practical rules for managing context
1. Keep system prompts under 500 tokens - be ruthless about what instructions actually change behaviour.
2. Retrieve, don't dump - send the 3–5 most relevant chunks, not the whole document.
3. Summarise long conversations - every 10–15 turns, compress earlier history into a short summary and keep that instead of the raw transcript.
4. Monitor token usage - log how many tokens each request consumes. Budget overruns are a common cause of otherwise mysterious model behaviour.
5. Put critical context first - never bury the most important instruction in the middle of a long prompt.
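Rules 3 and 4 can be combined in a small Python sketch: compress older turns into a summary, keep the recent turns verbatim, and log the token count before and after. Everything here is illustrative. `placeholder_summarise` stands in for a real summarisation step (for example, a separate model call), and `rough_token_count` is a crude estimate of about 4 characters per token rather than a real tokenizer.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("context")


def rough_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: about 4 characters per token."""
    return max(1, len(text) // 4)


def compact_history(turns: list[str],
                    summarise: Callable[[list[str]], str],
                    keep_recent: int = 10) -> list[str]:
    """Replace all but the most recent `keep_recent` turns with one summary."""
    if len(turns) <= keep_recent:
        return list(turns)  # nothing old enough to compress
    split = len(turns) - keep_recent
    older, recent = turns[:split], turns[split:]
    return [f"Summary of earlier conversation: {summarise(older)}"] + recent


def placeholder_summarise(turns: list[str]) -> str:
    """Hypothetical summariser; a real one would condense the actual content."""
    return f"{len(turns)} earlier turns condensed"


turns = [f"turn {i}: " + "details " * 20 for i in range(1, 26)]
compacted = compact_history(turns, placeholder_summarise, keep_recent=10)

before = sum(rough_token_count(t) for t in turns)
after = sum(rough_token_count(t) for t in compacted)
log.info("history tokens: %d -> %d", before, after)
```

Logging the two totals on every request makes it easy to spot the moment a conversation starts pressing against the window, before behaviour degrades.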

What's next
The context window holds tokens, and deciding which content deserves a place in it is often done with embeddings: numerical representations of text used to find the right content to include. Understanding how text becomes a meaningful number is what the next lesson, Embeddings: Meaning as Numbers, is all about.