Foundations 4 min

How Language Models Actually Work

Demystify LLMs from tokens to next-word prediction, using simple analogies you will actually remember.

Last updated July 3, 2026

Let us play a quick game. Complete this sentence:

'Once upon a...'

Your brain instantly predicted the word 'time'. You did not search a dictionary or look up the history of storytelling. Instead, you matched a pattern you have heard hundreds of times in your life. That is the fundamental concept behind Large Language Models.

The Autocomplete Analogy

Your phone keyboard's autocomplete guesses the next word from the last few words you typed. It is a very simple pattern matcher. Now imagine that same autocomplete, except it has been trained on a vast amount of public text (books, papers, web pages, and code repositories) and has billions of adjustable settings (parameters) that store the patterns it found. That is an LLM. It is not an 'oracle' that retrieves pre-written answers from a database; it is a supercharged next-word predictor.
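To make the analogy concrete, here is a minimal sketch of a phone-style autocomplete that counts which word follows which. The corpus and the function names (`build_bigram_counts`, `predict_next`) are made up for illustration. A real LLM learns far richer patterns than these counts, but the goal, predicting what comes next, is the same.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real model trains on vastly more text.
CORPUS = (
    "once upon a time there was a cat . "
    "once upon a time there was a dog . "
    "the cat sat on a mat ."
)

def build_bigram_counts(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

counts = build_bigram_counts(CORPUS)
print(predict_next(counts, "upon"))  # a
print(predict_next(counts, "a"))     # time
```

In this toy corpus, 'time' follows 'a' twice while 'cat', 'dog', and 'mat' each follow it once, so 'time' wins.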

Interactive Walkthrough: How an LLM Generates Text

1. You Talk, It Listens: Start with your prompt: 'The cat sat on the...'
2. Chopping Up Words: Break text into word-pieces (Tokens)
3. Finding Word Meaning: Place words on a 'Meaning Map'
4. Connecting the Dots: Use 'Attention' to link related words
5. Guessing the Next Word: Calculate probability scores for every word
6. Choosing the Winner: Use 'Temperature' to set the creativity level
7. The Repeat Loop: Add the chosen word and start over
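The seven steps can be sketched as a short loop. This is a toy under loud assumptions: `TOY_SCORES` is a hand-written lookup table standing in for the model, `tokenize` splits on spaces instead of producing real sub-word tokens, and the choice in step 6 is always the top-scoring word. Only the shape of the loop resembles what an LLM does.

```python
# Hypothetical stand-in for the model: maps the last token to scores for
# candidate next tokens. A real LLM computes scores from the whole context.
TOY_SCORES = {
    "the": {"mat": 0.6, "cat": 0.4},
    "cat": {"sat": 0.9, "on": 0.1},
    "sat": {"on": 1.0},
    "on": {"the": 1.0},
    "mat": {".": 1.0},
}

def tokenize(text):
    """Step 2 (simplified): split on whitespace instead of real sub-word tokens."""
    return text.lower().split()

def next_token_scores(tokens):
    """Steps 3-5 collapsed: look up scores for whatever follows the last token."""
    return TOY_SCORES.get(tokens[-1], {})

def generate(prompt, max_new_tokens=4):
    tokens = tokenize(prompt)                   # steps 1-2
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)      # steps 3-5
        if not scores:
            break                               # nothing known to follow
        best = max(scores, key=scores.get)      # step 6, always the top score
        tokens.append(best)                     # step 7, then loop again
    return " ".join(tokens)

print(generate("The cat sat on the"))  # the cat sat on the mat .
```

The loop stops early here because the toy table has no entry for '.', which plays the role of an end-of-text signal.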

How does it connect words? (The Transformer)
Before 2017, most language models read sentences one word at a time, like a person reading left-to-right through a tiny straw. They often forgot the beginning of a long sentence by the time they reached the end. This changed with the Transformer architecture, introduced in the 2017 paper 'Attention Is All You Need'.

Transformers process the entire sentence at once, using a mechanism called Self-Attention to link related words together, no matter how far apart they are.

Self-Attention in Action

Consider this sentence: 'The bank by the river flooded.'
How do you know 'bank' means a riverbank and not a financial institution? You instantly connected it to the word 'river'. Self-Attention does something similar: it calculates numerical weights between every pair of words, so each word's representation is shaped by the words most relevant to it.
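Here is a rough sketch of that weighting for the word 'bank'. The two-dimensional vectors in `EMBEDDINGS` are invented for illustration, and the sketch uses each word's vector directly as both query and key. Real Transformers use learned projections and hundreds or thousands of dimensions, so treat this as the arithmetic in miniature rather than the actual mechanism.

```python
import math

# Hypothetical 2-D 'meaning map': axis 0 ~ nature/water, axis 1 ~ money.
EMBEDDINGS = {
    "the":     [0.1, 0.1],
    "bank":    [0.6, 0.6],   # ambiguous: a bit of both senses
    "by":      [0.1, 0.0],
    "river":   [1.0, 0.0],
    "flooded": [0.9, 0.1],
}

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query_word, sentence):
    """Scaled dot-product weights of `query_word` over every word in `sentence`."""
    query = EMBEDDINGS[query_word]
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, EMBEDDINGS[word])) / scale
        for word in sentence
    ]
    return softmax(scores)

def attend(query_word, sentence):
    """Blend all word vectors using the attention weights."""
    weights = attention_weights(query_word, sentence)
    dims = len(EMBEDDINGS[query_word])
    return [
        sum(w * EMBEDDINGS[word][d] for w, word in zip(weights, sentence))
        for d in range(dims)
    ]

sentence = ["the", "bank", "by", "the", "river", "flooded"]
weights = attention_weights("bank", sentence)
for word, weight in zip(sentence, weights):
    print(f"{word:8s} {weight:.2f}")
print("blended 'bank' vector:", [round(x, 2) for x in attend("bank", sentence)])
```

With these made-up vectors, 'river' and 'flooded' receive more weight than 'the' or 'by', and the blended vector for 'bank' leans toward the nature axis rather than the money axis.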

Temperature: Setting the Creativity
When predicting the next word, the model generates a list of possibilities with probability scores. The Temperature setting controls how we choose from this list:

- Temperature = 0 (Predictable): The model always picks the single highest-scoring word. Excellent for writing code, solving math, or structured JSON output where repeatability and consistency are preferred.
- Temperature = 0.7 (Balanced): The model samples at random, weighted toward the most likely options, which adds variety without straying far. Great for general assistants and essays.
- Temperature = 1.2+ (Creative): The model gives unlikely words a bigger chance, so it takes wilder guesses. Fun for brainstorming, but can quickly lead to nonsensical sentences.
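A common way to implement this is to divide the model's raw scores by the temperature before converting them to probabilities with a softmax. The sketch below assumes that approach. The scores in `LOGITS` and the function names are invented for illustration, and temperature 0 is handled as a special case that simply takes the top score.

```python
import math
import random

def apply_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; treat 0 as greedy selection")
    scaled = [score / temperature for score in logits.values()]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {token: e / total for token, e in zip(logits, exps)}

def choose_token(logits, temperature, rng=random):
    """Pick the next token: greedy at temperature 0, weighted random otherwise."""
    if temperature == 0:
        return max(logits, key=logits.get)
    probs = apply_temperature(logits, temperature)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Hypothetical raw scores for the word after 'The cat sat on the'.
LOGITS = {"mat": 4.0, "sofa": 2.5, "roof": 1.5, "moon": 0.5}

for t in (0.2, 0.7, 1.5):
    probs = apply_temperature(LOGITS, t)
    print(t, {token: round(p, 3) for token, p in probs.items()})
```

Lower temperatures concentrate almost all probability on 'mat', while higher ones spread it toward 'sofa', 'roof', and 'moon'. Passing a seeded `random.Random` as `rng` makes a sampled run repeatable.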

Why LLMs are Stateless
By default, a raw model is completely stateless: it does not remember your past messages, cannot browse the web, and has no idea what today's date is. Every request starts from scratch. To build modern apps that seem to have memory or access to live web search, we wrap the model in systems (such as conversation-history storage, databases, or retrieval-augmented generation, known as RAG) that feed the necessary context into its prompt with every new request. The base model itself remains a static pattern-matcher.
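The wrapping idea can be sketched in a few lines. Here `stateless_model` is a hypothetical stand-in for a real LLM call (it only sees the text passed to it), and `ChatSession` is an illustrative wrapper that stores the transcript and resends all of it on every turn. Real applications also trim or summarize the history to fit the model's context limit, which this sketch ignores.

```python
def stateless_model(prompt):
    """Hypothetical stand-in for an LLM call: sees only `prompt`, keeps nothing."""
    asked_for_name = "what is my name" in prompt.lower()
    if asked_for_name and "my name is Ada" in prompt:
        return "Your name is Ada."
    if asked_for_name:
        return "I do not know your name."
    return "Nice to meet you."

class ChatSession:
    """Keeps the transcript outside the model and resends it on every turn."""

    def __init__(self, model):
        self.model = model
        self.history = []

    def send(self, user_message):
        self.history.append(f"User: {user_message}")
        prompt = "\n".join(self.history)      # full transcript, every time
        reply = self.model(prompt)
        self.history.append(f"Assistant: {reply}")
        return reply

# Called directly, the model has no idea what was said earlier.
print(stateless_model("User: What is my name?"))  # I do not know your name.

chat = ChatSession(stateless_model)
chat.send("Hi, my name is Ada.")
print(chat.send("What is my name?"))              # Your name is Ada.
```

The 'memory' lives entirely in `ChatSession.history`; the model function never changes between calls.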

What's Next?

Before the model can predict anything, your text is first converted into tokens (word fragments). Tokens are a crucial concept because how a model sees tokens is quite different from how we read words, and tokenization directly affects the cost, speed, and behavior of your AI applications. Let's look at tokens next!