RAG

Vector-less RAG: BM25 Without a Vector Database

When keyword search beats embeddings: how BM25 works, its tradeoffs against vector databases, and when to skip the vector store entirely.

Vector databases and embedding models are powerful, but they add dependencies, indexing latency, and financial cost. For smaller, self-contained applications, you can often build an accurate RAG system with pure lexical (keyword) search instead. This approach is called vector-less RAG, and it relies on the BM25 (Best Matching 25) ranking function.

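BM25 builds on TF-IDF to score how well a document matches the query's keywords. It combines term frequency (rewarding documents where a query term appears several times, with saturation so that each extra occurrence adds less than the one before), inverse document frequency (rewarding rare, specific terms over common words), and document-length normalisation (so long documents don't win simply by containing more words).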
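To make term-frequency saturation and IDF concrete, here is a minimal from-scratch sketch of Okapi BM25 scoring in plain Python. It is an illustration, not the `rank_bm25` implementation: `k1 = 1.5` and `b = 0.75` are commonly used defaults, and the IDF uses the smoothed `log(1 + ...)` form, which stays positive even for terms that appear in most documents.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each tokenised document against a tokenised query with Okapi BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)  # term frequency within this document
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # absent terms contribute nothing
            n = df[term]
            # IDF: rare terms weigh more; the "1 +" keeps it positive.
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            # Length normalisation: b scales how much longer-than-average
            # documents are penalised.
            norm = k1 * (1 - b + b * len(doc) / avgdl)
            # Saturating TF: grows with tf but levels off towards k1 + 1.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [["t", "x", "x", "x"], ["t", "t", "t", "t"], ["y", "y", "y", "y"]]
print(bm25_scores(["t"], docs))
```

With these equal-length toy documents, the second one (four occurrences of `t`) scores about 1.8 times the first (one occurrence), not four times: that is saturation at work. The third document never mentions `t` and scores zero.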

The index at the back of a textbook

A vector database is a librarian who understands what your book is *about*. BM25 is the index at the back of a textbook. It doesn't understand meaning at all; it just knows exactly which page every specific word appears on, and how often. Ask it to find 'invoice number INV-2049' or a product SKU, and it will find the exact page instantly. A meaning-based search can actually struggle with that, because such strings carry little semantic content to compare. Ask it to find 'a document about billing problems' when the text says 'payment disputes', though, and it comes up empty, because it only ever matches the words literally printed on the page.
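The textbook-index analogy corresponds to an inverted index, the classic data structure behind keyword search engines. Here is a minimal sketch in plain Python (not `rank_bm25`'s internals) that maps each token to the pages it appears on and how often. The page numbers and text are invented for illustration, and `tokenise` mirrors the lowercase-and-strip-punctuation approach used in this lesson's bot.

```python
import string
from collections import defaultdict


def tokenise(text: str) -> list[str]:
    # Lowercase, drop punctuation, split on whitespace.
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()


def build_inverted_index(pages: dict[int, str]) -> dict[str, dict[int, int]]:
    """Map each token to {page_number: occurrence_count}, like a book index."""
    index: dict[str, dict[int, int]] = defaultdict(dict)
    for page, text in pages.items():
        for token in tokenise(text):
            index[token][page] = index[token].get(page, 0) + 1
    return dict(index)


# Invented example pages.
pages = {
    12: "Invoice INV-2049 was issued on 3 March.",
    47: "Payment disputes are handled by the finance team.",
}
index = build_inverted_index(pages)
print(index.get("inv2049"))  # {12: 1}
print(index.get("billing"))  # None
```

Because punctuation is stripped, 'INV-2049' is stored as `inv2049`, so queries must pass through the same tokeniser to hit it. The lookup for 'billing' fails outright, even though page 47 is about exactly that topic.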

Our Project Implementation: Pure Lexical RAG
Embedding models map text into a general semantic space. That lets them match synonyms ('cost' matches 'price'), but they often perform poorly on exact keywords, numbers, serial IDs, and product codes. BM25 is ideal for that kind of precise lookup. In our BM25 Keyword Bot (`rag/bm25-keyword-bot`), we tokenise the text extracted from PDFs ourselves and index it entirely in memory with Python's `rank_bm25` library, with no database server to install.
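```python
import string
from typing import Optional

from rank_bm25 import BM25Okapi

# In-memory index state. index_chunks() is a hypothetical helper showing how
# these might be filled; each chunk is assumed to be a dict with a "text" key.
_bm25: Optional[BM25Okapi] = None
_corpus: list[dict] = []

def _tokenise(text: str) -> list[str]:
    """Lowercase, strip punctuation, split on whitespace: classic BM25 tokens."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return text.split()

def index_chunks(chunks: list[dict]) -> None:
    """Tokenise every chunk and build the BM25 index in memory (no DB server)."""
    global _bm25, _corpus
    _corpus = chunks
    # Skip building the index for an empty corpus.
    _bm25 = BM25Okapi([_tokenise(c["text"]) for c in chunks]) if chunks else None

def retrieve(question: str, top_k: int = 5) -> list[dict]:
    """Return top-K chunks ranked by BM25 score."""
    if _bm25 is None or not _corpus:
        return []
    tokens = _tokenise(question)
    scores = _bm25.get_scores(tokens)  # one score per chunk, in corpus order
    ranked = sorted(zip(scores, _corpus), key=lambda x: x[0], reverse=True)
    # Keep the best top_k, skipping chunks that did not score above zero.
    return [{**doc, "score": float(score)} for score, doc in ranked[:top_k] if score > 0]
```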

The tradeoff. The weakness of keyword search is synonym blindness. If a user asks 'how do I update my profile?' but your document uses the phrase 'modify account details', BM25 scores it near zero, because there are no literal keyword matches. This is why many production search systems combine BM25 with vector search, a pattern called hybrid search, covered a couple of lessons from now.
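You can check synonym blindness directly: BM25 only adds up contributions from query terms that actually occur in a document, so a document sharing no tokens with the query scores exactly zero, whatever the IDF weights. This minimal sketch uses invented example sentences and a hypothetical `token_set` helper with the same lowercase-and-strip-punctuation normalisation as the bot's tokeniser.

```python
import string


def token_set(text: str) -> set[str]:
    # Same normalisation as the bot: lowercase, strip punctuation, split.
    return set(text.lower().translate(str.maketrans("", "", string.punctuation)).split())


question = "How do I update my profile?"
paraphrase = "To modify account details, open Settings and choose Edit."
literal = "Update your profile from the Account page."

# No shared tokens: every BM25 term contribution is zero for this passage.
print(token_set(question) & token_set(paraphrase))  # set()
# Shared tokens: BM25 has something to score here.
print(sorted(token_set(question) & token_set(literal)))  # ['profile', 'update']
```

In a longer real document, a few common words such as 'my' or 'how' will usually overlap with the question; their low IDF is why the score ends up near zero rather than exactly zero.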

What's next
Whether you retrieve with BM25, vectors, or both, none of it works if the underlying documents were split into bad pieces in the first place. Before we go further, we need to back up one step in the pipeline: how do you actually cut a long document into chunks worth searching? That's Chunking Strategies That Actually Work, next.