Foundations 3 min

What Are Tokens?

Tokens are the atomic units LLMs read and write. Learn why tokenisation matters for cost, context limits, and model behaviour.

Last updated July 3, 2026

In the last lesson we said LLMs predict the next token, but we skipped over what a token actually is. This lesson fixes that. Tokens are not words and they are not characters, and the distinction causes real, practical problems if you do not understand it.

How a chef reads a recipe vs how a model reads text

A chef reads a recipe word by word and understands each ingredient as a whole concept. A tokeniser works more like a printing press with pre-cut letter blocks: it breaks text into reusable chunks drawn from a fixed vocabulary, roughly 50,000 pieces in GPT-2 and about 200,000 in GPT-4o's tokeniser. Common words like 'the' or 'run' get their own block. Rare or long words get chopped into several smaller blocks. The word 'Transformers' might be ['Trans', 'formers']. The word 'Unbelievable' might be ['Un', 'believ', 'able']. An emoji like 🚀 might consume 3 tokens. A date like '2026-06-06' might become ['2026', '-', '06', '-', '06'], five tokens for what you think of as one value. The exact splits depend on the tokeniser, so treat these as illustrations rather than guarantees.
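The chunking idea above can be sketched with a toy tokeniser. This is only an illustration: the vocabulary `TOY_VOCAB` and the function `toy_tokenise` are hypothetical, and the greedy longest-match strategy is a simplification. Real tokenisers such as byte-pair encoding learn their vocabulary from data and apply learned merge rules.

```python
# Toy illustration only: greedy longest-match against a tiny hand-written
# vocabulary. Real tokenisers learn a much larger vocabulary from data.
TOY_VOCAB = {"the", "run", "Trans", "formers", "Un", "believ", "able"}  # hypothetical


def toy_tokenise(word: str, vocab: set[str]) -> list[str]:
    """Split `word` into the longest vocabulary pieces, left to right."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first; a single character is the fallback,
        # so the loop always makes progress.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


for w in ["the", "Transformers", "Unbelievable", "runs"]:
    print(w, "->", toy_tokenise(w, TOY_VOCAB))
```

Common words come back as a single piece, while 'runs' falls back to 'run' plus a lone 's' because the toy vocabulary has no entry for the whole word.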

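Real tokenisation: what the model sees

The counts below are typical of GPT-style tokenisers. Exact splits vary from one model's tokeniser to another.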

'hello world' → 2 tokens
'Unbelievable' → 3 tokens: ['Un', 'believ', 'able']
'ChatGPT' → 3 tokens: ['Chat', 'G', 'PT']
'192.168.1.1' → 7 tokens (each number and dot is separate)
100 English words → ~130–150 tokens on average

Rule of thumb: 1 token ≈ 4 characters of English text. Other languages are often less efficient: the same sentence in Hindi or Arabic can cost 2–3× as many tokens as in English.
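When no tokeniser is at hand, the rules of thumb above can be turned into a quick estimator. This is a rough sketch for English text only. The function names are hypothetical, and the estimates are not a substitute for counting with the model's actual tokeniser.

```python
import math


def estimate_tokens_from_chars(text: str, chars_per_token: int = 4) -> int:
    """Rough token estimate from character count (English rule of thumb)."""
    return math.ceil(len(text) / chars_per_token)


def estimate_tokens_from_words(word_count: int) -> tuple[int, int]:
    """Rough (low, high) range using 130 to 150 tokens per 100 English words."""
    low = math.ceil(word_count * 130 / 100)
    high = math.ceil(word_count * 150 / 100)
    return low, high


print(estimate_tokens_from_chars("hello world"))  # 11 characters -> 3
print(estimate_tokens_from_words(25_000))         # (32500, 37500)
```

The character estimate gives 3 for 'hello world', while the list above shows 2 real tokens. The rule of thumb is useful for budgeting, not for exact counts.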


Why this hits your wallet. Hosted language model APIs bill per token, with input tokens and output tokens priced separately. GPT-4o, for example, has been priced at roughly $2.50 per million input tokens and $10 per million output tokens. If you send a 50-page PDF (about 25,000 words ≈ 33,000 tokens) to the model in every request, that is about $0.08 of input per request, spent on context that may be 90% irrelevant to the question being asked. This is exactly why RAG exists: instead of dumping the whole document, you retrieve only the 3–5 most relevant chunks.
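The arithmetic behind that comparison can be sketched as follows. The prices are the approximate GPT-4o figures quoted above. The chunk size (400 tokens), the response length (500 tokens), and the request volume (10,000) are assumptions made up for illustration.

```python
# Approximate GPT-4o prices quoted in the lesson, in USD per million tokens.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 10.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call given its input and output token counts."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical workload, for illustration only.
OUTPUT_TOKENS = 500      # assumed response length
REQUESTS = 10_000        # assumed number of requests
CHUNK_TOKENS = 400       # assumed size of one retrieved chunk

full_document = request_cost_usd(33_000, OUTPUT_TOKENS)
rag_chunks = request_cost_usd(5 * CHUNK_TOKENS, OUTPUT_TOKENS)

print(f"Whole PDF per request: ${full_document:.4f}")
print(f"5 chunks per request:  ${rag_chunks:.4f}")
print(f"Over {REQUESTS:,} requests: ${full_document * REQUESTS:,.2f} "
      f"vs ${rag_chunks * REQUESTS:,.2f}")
```

Under these assumptions the whole-document approach costs roughly nine times as much per request, and the gap scales linearly with traffic.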

python

import tiktoken  # pip install tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")

texts = [
    "hello world",
    "Retrieval-Augmented Generation",
    "2026-06-06",
    "192.168.1.1",
]

for text in texts:
    tokens = enc.encode(text)
    decoded = [enc.decode([t]) for t in tokens]
    print(f"{text!r:40} → {len(tokens)} tokens: {decoded}")

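# Illustrative output. Exact splits depend on the encoding and the tiktoken
# version, so run the script to see yours. GPT-4-family encodings cap digit
# runs at three digits per token, so '2026' may appear as '202' + '6'.
# 'hello world'                            → 2 tokens: ['hello', ' world']
# 'Retrieval-Augmented Generation'         → 4 tokens: ['Retrieval', '-Aug', 'mented', ' Generation']
# '2026-06-06'                             → 5 tokens: ['2026', '-', '06', '-', '06']
# '192.168.1.1'                            → 7 tokens: ['192', '.', '168', '.', '1', '.', '1']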

The context window is a token budget, not a word count. When a model is advertised with a 128K context window, that means roughly 128,000 tokens of combined space for your system prompt, conversation history, retrieved documents, and the response. Spend 80K on an irrelevant document and you have 48K left for everything else. Understanding tokens helps you make deliberate decisions about what to include and what to leave out.
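Treating the context window as a budget can be made explicit with a small helper. This is a minimal sketch: the function name and the component sizes are hypothetical, and in practice each count would come from the model's tokeniser.

```python
def remaining_tokens(context_window: int, used: dict[str, int]) -> int:
    """Tokens left after subtracting every component already in the prompt."""
    return context_window - sum(used.values())


CONTEXT_WINDOW = 128_000   # the 128K window from the lesson
RESPONSE_RESERVE = 4_000   # assumed space to keep free for the model's reply

# Hypothetical token counts for each part of the prompt.
used = {
    "system_prompt": 1_000,
    "conversation_history": 6_000,
    "retrieved_documents": 80_000,
}

left = remaining_tokens(CONTEXT_WINDOW, used)
print(left)                      # 41000
print(left >= RESPONSE_RESERVE)  # True
```

Running a check like this before each request makes it obvious when a single large document is crowding out history or the response.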

What's next

Now that you know tokens are the unit of input and output, the natural question is: what does the model do with all those tokens it received during training? And what is the difference between that massive training process and the quick response you get when you ask a question? That is the Training vs Inference lesson.