Hybrid Search: Dense and Sparse Together

Combine keyword search (BM25) with semantic search to get the best of both approaches.

To maximize retrieval accuracy, production RAG systems often run lexical search (BM25) and semantic vector search side by side. Lexical search matches exact terms such as product names, error codes, and IDs, while semantic search matches meaning even when the wording differs. Running both and combining their results is called Hybrid Search.
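To make "matches exact terms" concrete, here is a minimal Okapi BM25 scorer. It is a sketch, not the project's retriever: the whitespace tokenizer and the defaults `k1 = 1.5`, `b = 0.75` are assumptions, and the IDF uses the common ln(1 + (N - n + 0.5) / (n + 0.5)) variant so scores never go negative.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase, then split on whitespace.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: the number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no literal match, no contribution
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    "request a refund for a duplicate invoice",
    "reset your password from the login page",
]
print(bm25_scores("refund invoice", docs))     # first score > 0, second is 0.0
print(bm25_scores("get my money back", docs))  # [0.0, 0.0]: no shared words
```

The second query is relevant to the first document in meaning but shares no words with it, so BM25 has nothing to score. That gap is exactly what the semantic side of hybrid search covers.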

Two detectives on the same case

Picture two detectives investigating the same case independently. Detective BM25 matches fingerprints and exact physical evidence; she only trusts what's literally there. Detective Vector reads behaviour patterns and motive; she trusts what conceptually fits, even without a literal match. Each hands the captain a ranked list of suspects. The captain (Reciprocal Rank Fusion) ignores each detective's internal confidence scores, which aren't even measured on the same scale. She only cares about *rank*: a suspect both detectives place near the top of their lists gets promoted above a suspect only one detective flagged. That's exactly what RRF does with search results.

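The score merging problem. BM25 outputs frequency-based scores on an open-ended scale, typically from 0 to 20+ depending on the corpus and query. Vector search outputs cosine similarity scores, which lie between -1 and 1 and for typical text embeddings usually fall between 0 and 1. Because the scales are unrelated, summing or averaging the raw scores is meaningless: whichever method produces larger numbers dominates. To merge them, we use Reciprocal Rank Fusion (RRF). RRF ignores the raw scores entirely and blends the results based solely on their rank positions in each list.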
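A toy illustration of the scale problem. The scores below are made-up values, not output from any real index: vector search prefers `doc_c`, yet a naive sum simply reproduces the BM25 order because BM25's numbers are an order of magnitude larger.

```python
# Hypothetical raw scores for three documents (illustrative values only).
bm25 = {"doc_a": 18.2, "doc_b": 9.5, "doc_c": 2.1}
cosine = {"doc_a": 0.41, "doc_b": 0.88, "doc_c": 0.93}

# Naive fusion: add the raw scores together.
naive = {d: bm25[d] + cosine[d] for d in bm25}
naive_order = sorted(naive, key=naive.get, reverse=True)

# Compare with BM25 on its own: the cosine term changes nothing.
bm25_order = sorted(bm25, key=bm25.get, reverse=True)
print(naive_order)                # ['doc_a', 'doc_b', 'doc_c']
print(naive_order == bm25_order)  # True
```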

$RRF\_Score(d) = \sum_{m \in M} \frac{1}{rank_m(d) + K}$

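Where $M$ is the set of retrieval methods (BM25 and Vector), $rank_m(d)$ is the 1-based position of document $d$ in method $m$'s result list, and $K$ is a smoothing constant, conventionally set to `60`. A document missing from a method's list contributes nothing for that method.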
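As a worked check, take "Managing Team Permissions" from the example rankings in this section: BM25 places it at rank 3 and vector search at rank 2. With $K = 60$:

$$RRF\_Score(d) = \frac{1}{3 + 60} + \frac{1}{2 + 60} \approx 0.01587 + 0.01613 \approx 0.0320$$

A document retrieved by only one method gets a single term. "API Rate Limits (429 Errors)", found only by vector search at rank 3, scores $\frac{1}{3 + 60} \approx 0.0159$, roughly half of what agreement between both lists earns.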

Worked Example: Hybrid Search & RRF Fusion

For one query, two independent rankings are merged into one using only rank position, never the raw scores.

Keyword Search (BM25)
1. Exporting Data to CSV
2. Resetting Your Password
3. Managing Team Permissions
4. Subscription Tiers Explained
5. Billing, Invoices & Refunds

Semantic Search (Vector)
1. Billing, Invoices & Refunds
2. Managing Team Permissions
3. API Rate Limits (429 Errors)
4. Resetting Your Password
5. Exporting Data to CSV

The fused result below uses K = 60. A lower K makes the top ranks matter much more; a higher K flattens the difference between ranks.
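A quick way to see what K does is to compare how much a rank-1 hit is worth relative to a rank-10 hit. The K values other than 60 are arbitrary picks for comparison.

```python
def rrf_contribution(rank: int, k: int) -> float:
    """What a single 1-based rank adds to a document's RRF score."""
    return 1.0 / (rank + k)


for k in (1, 10, 60, 200):
    # The ratio simplifies to (10 + k) / (1 + k): large for small k, near 1 for large k.
    ratio = rrf_contribution(1, k) / rrf_contribution(10, k)
    print(f"K={k:>3}: a rank-1 hit is worth {ratio:.2f}x a rank-10 hit")
```

At K = 60 the ratio is only about 1.15, so RRF rewards documents that appear in both lists far more than it rewards topping a single list.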

The query in this example asks about getting "money back". None of its words literally appear in the refunds article, so BM25 buries it at rank 5, while vector search recognises that "money back" means a refund and ranks it first.

Fused Result (Reciprocal Rank Fusion)
1. Managing Team Permissions: found by both (BM25 #3, Vector #2), score 0.0320
2. Exporting Data to CSV: found by both (BM25 #1, Vector #5), score 0.0318
3. Billing, Invoices & Refunds: found by both (BM25 #5, Vector #1), score 0.0318
4. Resetting Your Password: found by both (BM25 #2, Vector #4), score 0.0318
5. API Rate Limits (429 Errors): Vector only (Vector #3), score 0.0159
6. Subscription Tiers Explained: BM25 only (BM25 #4), score 0.0156

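The fused list above can be reproduced in a few lines, assuming 1-based ranks and K = 60 as in the formula. The titles are copied from the two example rankings.

```python
from collections import defaultdict

K = 60  # smoothing constant from the RRF formula

bm25_ranking = [
    "Exporting Data to CSV",
    "Resetting Your Password",
    "Managing Team Permissions",
    "Subscription Tiers Explained",
    "Billing, Invoices & Refunds",
]
vector_ranking = [
    "Billing, Invoices & Refunds",
    "Managing Team Permissions",
    "API Rate Limits (429 Errors)",
    "Resetting Your Password",
    "Exporting Data to CSV",
]

scores: dict[str, float] = defaultdict(float)
for ranking in (bm25_ranking, vector_ranking):
    for rank, title in enumerate(ranking, start=1):  # 1-based, matching the formula
        scores[title] += 1.0 / (rank + K)

for title, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:.4f}  {title}")
```

"Exporting Data to CSV" and "Billing, Invoices & Refunds" tie exactly (1/61 + 1/65 either way). Python's sort is stable, so the one inserted into `scores` first wins the tie; the tie-break comes from iteration order, not from RRF itself.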
Our Project Implementation: Hybrid RRF and ACL Filtering
In the Smart File Cabinet retrieval pipeline (`rag/enterprise-search/retrieval.py`), we retrieve candidate lists from BM25 and Qdrant in parallel. ACL checks run at the application layer: each candidate chunk is verified against the user's credentials, unauthorized chunks are discarded, and only the remaining matches are merged with RRF ($K=60$). Only the top 5 validated chunks are compiled into the final LLM prompt.
```python
from collections import defaultdict

RRF_K = 60  # smoothing constant K from the RRF formula


def rrf_merge(bm25_ids: list[str], vector_ids: list[str], k: int = RRF_K) -> list[str]:
    """Score chunks using Reciprocal Rank Fusion (1-based ranks, K = 60)."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in (bm25_ids, vector_ids):
        for rank, cid in enumerate(ranked_ids, start=1):
            scores[cid] += 1.0 / (rank + k)
    # Highest fused score first; raw BM25/cosine scores are never used.
    return sorted(scores, key=scores.__getitem__, reverse=True)


def hybrid_search(query: str, tenant_id: str, user_id: str, top_n: int = 5):
    # Retrieve candidates using BM25 and Qdrant filtered by ACL rules.
    # bm25_retrieve, vector_retrieve and chunk_store are project helpers (not shown).
    bm25_ids = bm25_retrieve(query, tenant_id, user_id)
    vec_ids = vector_retrieve(query, tenant_id, user_id)
    # Merge the two ACL-filtered rankings with RRF and keep the top N.
    fused_ids = rrf_merge(bm25_ids, vec_ids)[:top_n]
    return [chunk_store[cid] for cid in fused_ids if cid in chunk_store]
```
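The listing above calls the two retrievers one after the other. One way to run them concurrently is a standard-library thread pool. This is a sketch, not the project's code: `fake_bm25_retrieve` and `fake_vector_retrieve` are hypothetical stand-ins that sleep to simulate network latency and return fixed chunk IDs.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_bm25_retrieve(query: str, tenant_id: str, user_id: str) -> list[str]:
    # Hypothetical stand-in for the BM25 retriever: simulates I/O latency.
    time.sleep(0.2)
    return ["c1", "c2", "c3"]


def fake_vector_retrieve(query: str, tenant_id: str, user_id: str) -> list[str]:
    # Hypothetical stand-in for the Qdrant retriever.
    time.sleep(0.2)
    return ["c3", "c1", "c4"]


def retrieve_both(query: str, tenant_id: str, user_id: str) -> tuple[list[str], list[str]]:
    """Run both retrievers concurrently and wait for both result lists."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        bm25_future = pool.submit(fake_bm25_retrieve, query, tenant_id, user_id)
        vec_future = pool.submit(fake_vector_retrieve, query, tenant_id, user_id)
        # .result() blocks until each call finishes and re-raises any exception.
        return bm25_future.result(), vec_future.result()


start = time.perf_counter()
bm25_ids, vec_ids = retrieve_both("refund policy", "tenant-a", "user-42")
elapsed = time.perf_counter() - start
print(bm25_ids, vec_ids, f"{elapsed:.2f}s")  # roughly 0.2s rather than 0.4s
```

Threads are enough here because retrieval is I/O-bound; if the retrievers are async clients, `asyncio.gather` plays the same role.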

Running the per-user ACL checks before the merge means unauthorized chunks never enter the fused ranking, so they cannot leak into the prompt or crowd authorized chunks out of the top 5.
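To see why the order matters, compare filtering before fusion with two alternatives: not filtering at all, and fusing, truncating to 5, and only then filtering. The chunk IDs and the `allowed` set are made up for illustration, and `rrf_merge` here is a compact standalone copy of the RRF scoring so the snippet runs on its own.

```python
from collections import defaultdict


def rrf_merge(*rankings: list[str], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion over any number of rankings (1-based ranks)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, cid in enumerate(ranking, start=1):
            scores[cid] += 1.0 / (rank + k)
    return sorted(scores, key=scores.__getitem__, reverse=True)


# Hypothetical candidates: the two best matches belong to another team (hr*).
bm25_ids = ["hr1", "hr2", "c1", "c2", "c3", "c4", "c5"]
vector_ids = ["hr2", "hr1", "c2", "c1", "c4", "c3", "c5"]
allowed = {"c1", "c2", "c3", "c4", "c5"}  # chunks this user may read

# Filter first, then fuse: five authorized chunks.
filtered_first = rrf_merge(
    [c for c in bm25_ids if c in allowed],
    [c for c in vector_ids if c in allowed],
)[:5]

# No filtering: hr1 and hr2 would reach the prompt (an authorization leak).
unfiltered = rrf_merge(bm25_ids, vector_ids)[:5]

# Fuse, truncate, then filter: no leak, but two of the five slots are wasted.
fused_then_filtered = [c for c in unfiltered if c in allowed]

print(filtered_first)       # five authorized chunk IDs
print(unfiltered)           # starts with the unauthorized hr1, hr2
print(fused_then_filtered)  # only three chunk IDs survive
```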

What's next
RRF gives you a solid fused list fast, but it's still a rough sort based only on rank position, not on how well each chunk actually answers the question. The final quality lever is a slower, smarter second pass. That's Re-ranking: Quality Over Quantity, next.