Embeddings: Meaning as Numbers
Understand how text gets converted into vectors and why the distance between those vectors is what makes semantic search possible.
We have talked about tokens as the model's unit of input. But when we want to find relevant content - to answer a question, to retrieve a document, to match a query - we need a different representation. Not tokens. Embeddings.
An embedding is a list of floating-point numbers (a vector) that encodes the meaning of a piece of text. The critical property: texts with similar meanings produce vectors that are mathematically close to each other, even if they share no words.
Imagine a city map where streets are not arranged geographically but by *meaning*. 'Hospital', 'clinic', 'emergency room', and 'pharmacy' are all clustered together in one neighbourhood. 'JavaScript', 'Python', and 'TypeScript' form their own district. 'King', 'Queen', 'Prince', and 'Emperor' are neighbours - but far from 'database' or 'algorithm'. Embeddings are that map - except instead of two dimensions, you are working with 768, 1,536, or 3,072 dimensions depending on the model. You cannot visualise it, but the geometry is real. The phrase 'What does ML cost?' and the phrase 'machine learning pricing' will land very close together on that map, even though they share zero words.
How similarity is measured: cosine similarity. To compare two embeddings, we measure the cosine of the angle between their vectors. A score of 1.0 means the vectors point in the same direction, which we read as the same meaning. A score near 0.0 means the vectors are orthogonal, which we read as unrelated. A score of -1.0 means opposite directions (rare in practice). In real retrieval systems, a common rule of thumb is to treat chunks scoring above roughly 0.75 against your query as relevant, but score ranges vary by embedding model, so tune the threshold on your own data.
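To make those scores concrete, here is a minimal sketch of cosine similarity on hand-made two-dimensional vectors. The vectors and the `cosine_similarity` helper are illustrative assumptions, not the output of a real embedding model, and the sketch uses only the standard library so it runs without an API key.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 2-D vectors chosen by hand for illustration only.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # at a right angle
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # reversed direction

print(round(same_direction, 6), round(orthogonal, 6), round(opposite, 6))
```

Note that `[1.0, 2.0]` and `[2.0, 4.0]` differ in length yet score 1.0: cosine similarity compares direction only, not magnitude.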
Vector arithmetic on meaning works because embeddings represent words as lists of characteristics (dimensions). Imagine we rate words on three scales: [Royalty, Masculinity, Femininity]:
'King' = `[1.0, 1.0, 0.0]` (has Royalty and Masculinity)
'Man' = `[0.0, 1.0, 0.0]` (has only Masculinity)
'Woman' = `[0.0, 0.0, 1.0]` (has only Femininity)
Now, let's do the math:
1. Start with 'King' = `[1.0, 1.0, 0.0]`
2. Subtract 'Man' = `[0.0, 1.0, 0.0]`, which leaves `[1.0, 0.0, 0.0]` (the Royalty concept, gender stripped)
3. Add 'Woman' = `[0.0, 0.0, 1.0]`, which results in `[1.0, 0.0, 1.0]` (Royalty + Femininity)
Mathematically, the closest word to `[1.0, 0.0, 1.0]` in our toy vocabulary is 'Queen', which on these three scales would itself be rated `[1.0, 0.0, 1.0]` (Royalty and Femininity).
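The arithmetic above can be sketched in a few lines of Python. This is a toy under stated assumptions: the three-dimensional ratings are the hand-picked ones from the text, the 'Queen' entry is added here for illustration, and the `cosine` helper is a hypothetical name. Real embeddings have hundreds of learned dimensions that do not map to neat labels like these.

```python
import math

# Toy vocabulary rated on [Royalty, Masculinity, Femininity].
# The 'Queen' rating is an assumption added for this illustration.
vocabulary = {
    "King": [1.0, 1.0, 0.0],
    "Man": [0.0, 1.0, 0.0],
    "Woman": [0.0, 0.0, 1.0],
    "Queen": [1.0, 0.0, 1.0],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# King - Man + Woman, computed element by element.
target = [
    k - m + w
    for k, m, w in zip(vocabulary["King"], vocabulary["Man"], vocabulary["Woman"])
]

# Pick the vocabulary word whose vector points most nearly the same way.
closest = max(vocabulary, key=lambda word: cosine(vocabulary[word], target))

print(target, closest)
```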
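Real embedding models encode semantic relationships across hundreds of learned dimensions rather than three hand-picked ones. The following example embeds three questions with a real model and compares them using cosine similarity: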
```python
import numpy as np
import google.generativeai as genai

genai.configure(api_key="YOUR_GOOGLE_API_KEY")  # free at aistudio.google.com/apikey


def embed(text: str) -> list[float]:
    """Return the embedding vector for a piece of text."""
    result = genai.embed_content(
        model="models/text-embedding-004",
        content=text,
        task_type="retrieval_query",
    )
    return result["embedding"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# These share no words - but are semantically very close
q1 = embed("What does machine learning cost?")
q2 = embed("pricing for ML APIs")
q3 = embed("What is the capital of France?")

print(f"ML cost vs ML pricing: {cosine_similarity(q1, q2):.3f}")  # ~0.91
print(f"ML cost vs capital France: {cosine_similarity(q1, q3):.3f}")  # ~0.68
```

Why embeddings are the engine behind RAG. When a user asks a question, you embed it. You then compare that embedding against the embeddings of every chunk in your knowledge base and return the top-K closest matches. Those matching chunks get injected into the context window as the evidence for the model to answer from. No keyword matching. No keyword index. Pure semantic proximity.
This is also why RAG can handle paraphrasing gracefully - 'show me revenue numbers' and 'what were the sales figures?' will both retrieve the same quarterly report chunk, because their embeddings are close.
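The retrieval step described above can be sketched as a brute-force top-K search. Everything here is a stand-in: the three-dimensional vectors are hypothetical values rather than real model output, the chunk texts are invented for illustration, and `top_k` is a helper name assumed for this sketch. In a real pipeline the vectors would come from an embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=2):
    """Return the k (score, text) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical chunk texts with made-up 3-D "embeddings".
chunks = [
    ("Quarterly report: revenue summary", [0.9, 0.1, 0.0]),
    ("Office relocation announcement", [0.0, 0.2, 0.9]),
    ("Quarterly report: sales figures by region", [0.8, 0.3, 0.1]),
]

# Stand-in for the embedding of a query such as "show me revenue numbers".
query_vec = [1.0, 0.2, 0.0]

results = top_k(query_vec, chunks, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

This sketch scores every chunk against the query, which matches the description above. For large knowledge bases, vector databases usually speed this up with approximate nearest-neighbour search.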