Embeddings: Meaning as Numbers

Understand how text gets converted into vectors and why the distance between those vectors is what makes semantic search possible.

Last updated July 3, 2026

We have talked about tokens as the model's unit of input. But when we want to find relevant content - to answer a question, to retrieve a document, to match a query - we need a different representation. Not tokens. Embeddings.

An embedding is a list of floating-point numbers (a vector) that encodes the meaning of a piece of text. The critical property: texts with similar meanings produce vectors that are mathematically close to each other, even if they share no words.

The city map where streets are grouped by vibe, not location

Imagine a city map where streets are not arranged geographically but by *meaning*. 'Hospital', 'clinic', 'emergency room', and 'pharmacy' are all clustered together in one neighbourhood. 'JavaScript', 'Python', and 'TypeScript' form their own district. 'King', 'Queen', 'Prince', and 'Emperor' are neighbours - but far from 'database' or 'algorithm'. Embeddings are that map - except instead of two dimensions, you are working with 768, 1,536, or 3,072 dimensions depending on the model. You cannot visualise it, but the geometry is real. The phrase 'What does ML cost?' and the phrase 'machine learning pricing' will land very close together on that map, even though they share zero words.

How similarity is measured: cosine similarity. To compare two embeddings, we measure the cosine of the angle between their vectors. A score of 1.0 means the vectors point in the same direction, which we read as the same meaning. A score of 0.0 means the vectors are orthogonal, which we read as unrelated. A score of -1.0 means they point in opposite directions (rare in practice). Many embedding models score even unrelated texts well above 0.0, so a useful cutoff depends on the model: 0.75 is a reasonable starting point for treating a chunk as relevant to a query, but tune it against your own data.
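To make the three reference scores concrete, here is a minimal sketch in plain Python. The 2D vectors are hand-picked for illustration and are not real embeddings; `cosine_similarity` here is a standalone helper that assumes equal-length, non-zero inputs.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-picked 2D vectors (not real embeddings) for the three reference points
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])       # at a right angle
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # reversed direction

print(round(same_direction, 3), round(orthogonal, 3), round(opposite, 3))
# prints: 1.0 0.0 -1.0
```

Note that `[2.0, 4.0]` is twice as long as `[1.0, 2.0]` yet still scores 1.0: cosine similarity looks only at direction, not at vector length.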

Understanding Embedding Arithmetic (Word Math)

Why can we subtract and add words like numbers?

Because embeddings represent words as lists of numbers (dimensions), and directions in that space can line up with characteristics. Real dimensions are learned and rarely have readable labels, but a simplified picture helps. Imagine we rate words on three scales: [Royalty, Masculinity, Femininity]:

'King' = [1.0, 1.0, 0.0] (has Royalty and Masculinity)
'Man' = [0.0, 1.0, 0.0] (has only Masculinity)
'Woman' = [0.0, 0.0, 1.0] (has only Femininity)

Now, let's do the math:
1. Start with 'King' = `[1.0, 1.0, 0.0]`
2. Subtract 'Man' = `- [0.0, 1.0, 0.0]`
   This leaves us with `[1.0, 0.0, 0.0]` (the Royalty concept, with gender stripped).
3. Add 'Woman' = `+ [0.0, 0.0, 1.0]`
   This results in `[1.0, 0.0, 1.0]` (Royalty + Femininity).

If 'Queen' sits at `[1.0, 0.0, 1.0]` in our coordinate vocabulary, it is the closest word to the result. With real embeddings the result only lands near 'Queen' rather than exactly on it. You can test this exact arithmetic path in the interactive vector space below.
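The same arithmetic can be sketched in a few lines of Python. This uses the toy three-scale vectors from above, not real embeddings; the coordinates for 'Queen' and 'Apple' are assumed values added so the nearest-word lookup has something to choose from.

```python
import math

# Toy vectors on the scales [Royalty, Masculinity, Femininity].
# 'Queen' and 'Apple' are assumed entries added for illustration.
vocab = {
    "King":  [1.0, 1.0, 0.0],
    "Man":   [0.0, 1.0, 0.0],
    "Woman": [0.0, 0.0, 1.0],
    "Queen": [1.0, 0.0, 1.0],
    "Apple": [0.0, 0.0, 0.0],
}

def subtract(a: list[float], b: list[float]) -> list[float]:
    return [x - y for x, y in zip(a, b)]

def add(a: list[float], b: list[float]) -> list[float]:
    return [x + y for x, y in zip(a, b)]

def nearest_word(target: list[float]) -> str:
    """Return the vocabulary word with the smallest Euclidean distance to target."""
    return min(vocab, key=lambda word: math.dist(vocab[word], target))

# King - Man + Woman
result = add(subtract(vocab["King"], vocab["Man"]), vocab["Woman"])
closest = nearest_word(result)

print(result, closest)
# prints: [1.0, 0.0, 1.0] Queen
```

Euclidean distance is used here because 'Apple' is the zero vector, for which cosine similarity is undefined. Real systems typically search by cosine similarity and exclude the input words from the candidates.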

[Interactive widget: Word Vector Math. Start with a word, subtract a concept, then add a concept. The result is plotted on projected embedding coordinates (2D t-SNE) with clusters for Royalty, Technology, Fruit, and Action.]

```python
import numpy as np
import google.generativeai as genai

genai.configure(api_key="YOUR_GOOGLE_API_KEY")  # free at aistudio.google.com/apikey

def embed(text: str) -> list[float]:
    """Return the embedding vector for a piece of text."""
    result = genai.embed_content(
        model="models/text-embedding-004",
        content=text,
        task_type="retrieval_query",
    )
    return result["embedding"]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the two vector norms."""
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# These share no words - but are semantically very close
q1 = embed("What does machine learning cost?")
q2 = embed("pricing for ML APIs")
q3 = embed("What is the capital of France?")

# The scores in the comments are illustrative; exact values vary by model
print(f"ML cost vs ML pricing:     {cosine_similarity(q1, q2):.3f}")  # ~0.91
print(f"ML cost vs capital France: {cosine_similarity(q1, q3):.3f}")  # ~0.68
```

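Why embeddings are the engine behind RAG. When a user asks a question, you embed it. You then compare that embedding against the embeddings of the chunks in your knowledge base and return the top-K closest matches. Those matching chunks get injected into the context window as the evidence for the model to answer from. No keyword matching is involved: the ranking comes purely from semantic proximity. For a small knowledge base you can compare against every chunk directly; larger ones usually store the vectors in a vector index so the nearest matches can be found without scanning everything.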
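The retrieval step described above can be sketched as a brute-force top-K search. Everything in this example is hypothetical: the chunk texts are invented, the 3D vectors are hand-made stand-ins for real embeddings, and `top_k` is an illustrative helper rather than any library's API.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def top_k(query_vec, chunks, k=2):
    """Score every chunk against the query and return the k best (score, text) pairs."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Hypothetical knowledge base: (chunk text, stand-in embedding)
chunks = [
    ("Quarterly report: revenue and sales figures", [0.9, 0.1, 0.0]),
    ("API pricing tiers and usage limits",          [0.2, 0.9, 0.1]),
    ("Office closure dates for public holidays",    [0.0, 0.1, 0.9]),
]

# Pretend this vector came from embedding "what were the sales figures?"
query_vec = [0.8, 0.3, 0.0]

results = top_k(query_vec, chunks, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")

# The retrieved texts would then be placed into the prompt as evidence
context = "\n".join(text for _, text in results)
```

Scanning every chunk costs one comparison per chunk per query, which is fine for a few thousand chunks. Beyond that, an approximate nearest-neighbour index is the usual replacement for the `sorted(...)` call.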

This is also why RAG can handle paraphrasing gracefully - 'show me revenue numbers' and 'what were the sales figures?' will both retrieve the same quarterly report chunk, because their embeddings are close.

Embeddings vs tokens - two different jobs

Tokens are how the model reads input during generation - a sequence of integer IDs fed through the transformer. Embeddings are how we find relevant content before generation - dense vectors compared by cosine similarity. In a typical setup they are produced by different models (an embedding model vs a generation model) and serve different purposes. You use both in a production RAG system, but at different stages.

What's next

You now understand what the model reads (tokens), what it holds at once (the context window), and how we find relevant content (embeddings). The next piece of the puzzle: what actually happened during the months of training that gave the model all that knowledge in the first place - and why you cannot just update it on the fly.