
Why RAG Exists: The Knowledge Problem

The core limitation of static training data and how retrieval-augmented generation solves it.

Language models are trained on massive, static public datasets, which gives them a broad, general grasp of language structure, syntax, and reasoning. However, their internal knowledge is frozen at a specific point in time: the training cutoff date. If you ask a vanilla model about yesterday's events, or for information held in a private company document store, it will either apologise for not knowing or confidently hallucinate a plausible but incorrect response.

Retrieval-Augmented Generation (RAG) is the architectural pattern designed to solve this frozen-knowledge limitation. Instead of trying to store all possible facts within the model's weights, RAG decouples knowledge storage from language generation.

Open-Book Exam vs. Memorisation

Fine-tuning a model is like studying for an exam weeks in advance: the model memorises broad behavioural styles and domains, but specific factual details can blur. RAG is like taking an open-book exam with a search index at your fingertips: the model doesn't need to memorise the facts; instead, it looks up the documents it needs at the moment a question is asked and reads the facts directly from them.

What this looks like in practice. A RAG system keeps two things around: a knowledge base (your documents, split into searchable chunks) and a retriever (something that finds the right chunks for a given question). When a user asks something, the retriever searches the knowledge base, hands the best-matching chunks to the model as evidence, and the model writes its answer grounded in that evidence instead of its own memory. No retraining, no fine-tuning, just a lookup step bolted onto generation.
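The flow above can be sketched in a few lines of Python. This is a minimal illustration, not code from any particular project: the three chunks, the word-overlap scoring, and the names `KNOWLEDGE_BASE`, `retrieve`, and `build_prompt` are all assumptions made for the example, and the final call to a language model is left out.

```python
import re

# Hypothetical knowledge base: in a real system these chunks come from your documents.
KNOWLEDGE_BASE = [
    "The refund window for annual plans is 30 days from purchase.",
    "Support is available on weekdays from 9am to 5pm UTC.",
    "Invoices are emailed on the first day of each month.",
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, chunks, top_k=2):
    """Toy retriever: rank chunks by how many words they share with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop chunks with no overlap at all, then keep the best top_k.
    return [c for score, c in scored[:top_k] if score > 0]

def build_prompt(question, evidence):
    """Place the retrieved chunks in front of the question as evidence."""
    context = "\n".join(f"- {c}" for c in evidence)
    return (
        "Answer using only the evidence below. "
        "If the evidence is insufficient, say so.\n\n"
        f"Evidence:\n{context}\n\nQuestion: {question}"
    )

question = "What is the refund window for annual plans?"
evidence = retrieve(question, KNOWLEDGE_BASE)
prompt = build_prompt(question, evidence)
print(prompt)  # this string is what would be sent to the model
```

Nothing in the model changes here: updating what the system "knows" means editing `KNOWLEDGE_BASE`, not retraining.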

Our Sibling Projects: Three Retrieval Architectures
To study RAG concepts practically, our sibling codebase `ai-real-world-projects` implements three distinct architectures, matching different system scales:

1. BM25 Keyword Bot (`rag/bm25-keyword-bot/main.py`): Pure keyword search over an in-memory text index.
2. Semantic Vector Bot (`rag/semantic-vector-bot/main.py`): Semantic vector search using embeddings computed locally on the CPU.
3. Smart File Cabinet / Enterprise Search (`rag/enterprise-search/app.py`): A multi-tenant, production-style hybrid pipeline that combines BM25 with Qdrant vector search and enforces per-user Access Control Lists (ACLs).
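To give a feel for what the first project's keyword search involves, here is a small in-memory BM25 scorer. It is a sketch written for this lesson, not the contents of `rag/bm25-keyword-bot/main.py`: the class name `BM25Index`, the sample documents, the `k1=1.5` and `b=0.75` defaults, and the Lucene-style IDF variant (with `+ 1` inside the logarithm) are all assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

class BM25Index:
    """Hypothetical in-memory BM25 index over a list of strings."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        self.term_counts = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(tc.values()) for tc in self.term_counts]
        self.avg_len = sum(self.lengths) / len(docs)
        # Document frequency: in how many documents each term appears.
        self.doc_freq = Counter()
        for tc in self.term_counts:
            self.doc_freq.update(tc.keys())

    def idf(self, term):
        """Rarer terms get a higher weight; the +1 keeps the value positive."""
        n = self.doc_freq.get(term, 0)
        return math.log((len(self.docs) - n + 0.5) / (n + 0.5) + 1.0)

    def score(self, query, i):
        """BM25 score of document i for the query."""
        tc, length = self.term_counts[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            tf = tc.get(term, 0)
            if tf == 0:
                continue
            # Length normalisation: long documents are penalised slightly.
            norm = tf + self.k1 * (1 - self.b + self.b * length / self.avg_len)
            total += self.idf(term) * tf * (self.k1 + 1) / norm
        return total

    def search(self, query, top_k=3):
        """Return (document, score) pairs, best match first."""
        scored = [(doc, self.score(query, i)) for i, doc in enumerate(self.docs)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

docs = [
    "error code E42 means the disk is full",
    "the quick brown fox",
    "disk quotas are reset monthly",
]
index = BM25Index(docs)
results = index.search("E42 disk")
print(results[0][0])
```

Keyword scoring like this rewards exact tokens such as `E42`, which is where it tends to complement embedding-based search.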

Choosing the right strategy. For simple applications that query small, static text collections, a local semantic vector search or a basic keyword index is cheap to run and can answer queries in milliseconds. Enterprise systems, however, typically need hybrid search (combining semantic similarity with exact keyword matching) together with document-level access rules, so that a user can never retrieve content they are not authorised to see.
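One way to picture hybrid search with document-level security is to filter each ranked list by the user's permissions and then merge the lists. The sketch below uses reciprocal rank fusion (RRF) as the merge step. RRF is one common choice, not necessarily what the enterprise-search project uses, and the document IDs, the `ACL` mapping, and the function names are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

def hybrid_search(keyword_ranking, vector_ranking, acl, user, top_k=3):
    """Drop documents the user may not read, then fuse the two rankings."""
    def allowed(doc_id):
        # Documents with no ACL entry are treated as unreadable (deny by default).
        return user in acl.get(doc_id, set())

    keyword_ok = [d for d in keyword_ranking if allowed(d)]
    vector_ok = [d for d in vector_ranking if allowed(d)]
    return reciprocal_rank_fusion([keyword_ok, vector_ok])[:top_k]

# Hypothetical ACL: which users may read which documents.
ACL = {
    "handbook": {"alice", "bob"},
    "salaries": {"alice"},
    "roadmap": {"alice", "bob"},
}

# Pretend outputs of a keyword retriever and a vector retriever for one query.
keyword_ranking = ["salaries", "handbook", "roadmap"]
vector_ranking = ["handbook", "roadmap", "salaries"]

alice_results = hybrid_search(keyword_ranking, vector_ranking, ACL, "alice")
bob_results = hybrid_search(keyword_ranking, vector_ranking, ACL, "bob")
print(alice_results)
print(bob_results)
```

Filtering before fusion means a restricted document can never reach the model's context for an unauthorised user. In a production system the same rule would more likely be applied inside the search engine's query than in application code.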

What's next
You now know why RAG exists: a model's training data is static, so a retrieval step at query time supplies the current or private facts it lacks. The next lesson zooms into the actual mechanics: what happens the moment a document is uploaded, and what happens the moment a user asks a question. That's the RAG Pipeline lesson, and it's the one that ties every later lesson in this module together.