Hybrid Search: Dense and Sparse Together
Combine keyword search (BM25) with semantic search to get the best of both approaches.
To improve retrieval accuracy, production RAG systems often run lexical search (BM25) and semantic vector search in parallel. Lexical search matches exact terms, while semantic search matches conceptual meaning. Running both and combining their results is called hybrid search.
Picture two detectives investigating the same case, working independently. Detective BM25 matches fingerprints and exact physical evidence; she only trusts what's literally there. Detective Vector reads behaviour patterns and motive; she trusts what conceptually fits even without a literal match. Each one hands the captain a ranked list of suspects. The captain (Reciprocal Rank Fusion) doesn't care about each detective's internal confidence scores, because those aren't even measured on the same scale. She only cares about *rank*: a suspect both detectives place near the top of their lists gets promoted above a suspect only one detective flagged. That's exactly what RRF does with search results.
The score merging problem. BM25 outputs frequency-based scores with no fixed upper bound, commonly ranging from 0 to 20 or more. Vector search outputs cosine similarity scores, typically between 0 and 1. You cannot directly sum or average these scores, because the larger BM25 values would dominate the result. To merge them, we use Reciprocal Rank Fusion (RRF). RRF ignores the raw scores entirely and blends the results based solely on their rank positions in each list.
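To make the scale mismatch concrete, here is a minimal sketch with made-up scores for three documents. The document IDs and score values are hypothetical and chosen only for illustration. Adding the raw scores simply reproduces the BM25 ordering, so the vector ranking has no influence on the outcome.

```python
# Hypothetical raw scores for the same three documents from each retriever.
bm25_scores = {"doc_a": 14.2, "doc_b": 3.1, "doc_c": 0.4}      # no fixed upper bound
cosine_scores = {"doc_a": 0.41, "doc_b": 0.48, "doc_c": 0.93}  # bounded by 1

# Naive fusion: add the raw scores together.
naive = {doc: bm25_scores[doc] + cosine_scores[doc] for doc in bm25_scores}

naive_order = sorted(naive, key=naive.get, reverse=True)
bm25_order = sorted(bm25_scores, key=bm25_scores.get, reverse=True)
vector_order = sorted(cosine_scores, key=cosine_scores.get, reverse=True)

print("naive sum:", naive_order)
print("BM25 only:", bm25_order)
print("vector only:", vector_order)
```

With these numbers the naive sum and the BM25-only ordering are identical, even though vector search ranked the documents in the opposite order.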
$RRF\_Score(d) = \sum_{m \in M} \frac{1}{rank_m(d) + K}$
Where $M$ is the set of retrieval methods (BM25 and Vector), $rank_m(d)$ is the position of document $d$ in the ranked list returned by method $m$, and $K$ is a smoothing constant (conventionally set to `60`).
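One way to see what the smoothing constant does is to compare the contribution of a top-ranked and a tenth-ranked document with and without it. The sketch below assumes ranks are counted from 1; the helper name `rrf_contribution` is introduced here for illustration only.

```python
def rrf_contribution(rank: int, k: int = 60) -> float:
    """One method's contribution to a document's RRF score (rank counted from 1)."""
    return 1.0 / (rank + k)


for k in (0, 60):
    first = rrf_contribution(1, k)
    tenth = rrf_contribution(10, k)
    print(f"K={k}: rank 1 -> {first:.4f}, rank 10 -> {tenth:.4f}, ratio {first / tenth:.2f}")
```

With `K = 0` the top result counts ten times as much as the tenth; with `K = 60` the gap shrinks to a factor of about 1.15. A single method's top hit therefore cannot easily outweigh a document that both methods rank reasonably well.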
As a worked example, take a question that asks about getting 'money back' without ever using the word 'refund'. No word in the question literally appears in the refunds article, so BM25 buries that article at rank 5. Vector search recognises that 'money back' means refund and ranks it first. RRF then merges the two independent rankings into one, using only rank position, never the raw scores.
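The arithmetic for the refunds article can be sketched as follows. The ranks 5 and 1 come from the example above; the competing "keyword-only" document (rank 1 in BM25, absent from the vector list) is a hypothetical added for comparison, and ranks are assumed to count from 1.

```python
K = 60  # smoothing constant from the formula


def rrf_score(ranks: list[int], k: int = K) -> float:
    """Sum 1 / (rank + k) over every list the document appears in."""
    return sum(1.0 / (rank + k) for rank in ranks)


# Refunds article: rank 5 in BM25, rank 1 in vector search.
refunds = rrf_score([5, 1])
# Hypothetical competitor: rank 1 in BM25, missing from the vector list.
keyword_only = rrf_score([1])

print(f"refunds article:  {refunds:.5f}")
print(f"keyword-only hit: {keyword_only:.5f}")
```

The refunds article scores 1/65 + 1/61, roughly 0.0318, which is almost double the 1/61 (roughly 0.0164) earned by a document that only one method found, so it rises to the top of the fused list despite its poor BM25 rank.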
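from collections import defaultdict

RRF_K = 60  # smoothing constant K from the formula


def rrf_merge(bm25_ids: list[str], vector_ids: list[str]) -> list[str]:
    """Fuse two ranked lists of chunk IDs with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    # Ranks start at 1, so the top result of a list contributes 1 / (1 + RRF_K).
    for ranked_ids in (bm25_ids, vector_ids):
        for rank, cid in enumerate(ranked_ids, start=1):
            scores[cid] += 1.0 / (rank + RRF_K)
    # Highest fused score first.
    return sorted(scores, key=lambda c: scores[c], reverse=True)
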
def hybrid_search(query: str, tenant_id: str, user_id: str):
    # Retrieve candidates from BM25 and Qdrant, each already filtered by ACL rules
    bm25_ids = bm25_retrieve(query, tenant_id, user_id)
    vec_ids = vector_retrieve(query, tenant_id, user_id)
    # Merge with Reciprocal Rank Fusion and keep the top 5 chunks
    fused_ids = rrf_merge(bm25_ids, vec_ids)[:5]
    # Skip any ID that is no longer present in the chunk store
    return [chunk_store[cid] for cid in fused_ids if cid in chunk_store]

Because both retrievers apply the tenant and user checks before the merge, an unauthorized chunk never enters either ranked list, which prevents authorization leaks at retrieval time.
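The ordering "check access first, merge second" can be sketched in isolation. Everything here is hypothetical: the in-memory `ACL` map, the `filter_by_acl` helper, and the chunk and user IDs are stand-ins for illustration. In the pipeline above the equivalent filtering is assumed to happen inside `bm25_retrieve` and `vector_retrieve`.

```python
# Hypothetical in-memory ACL: chunk ID -> set of user IDs allowed to read it.
ACL = {"c1": {"alice", "bob"}, "c2": {"alice"}, "c3": {"bob"}}


def filter_by_acl(ranked_ids: list[str], user_id: str) -> list[str]:
    """Drop chunks the user may not read, preserving the ranked order."""
    return [cid for cid in ranked_ids if user_id in ACL.get(cid, set())]


# Unfiltered rankings as each retriever might produce them.
raw_bm25 = ["c2", "c1", "c3"]
raw_vector = ["c2", "c3", "c1"]

# Filter each list before it reaches the merge step.
bm25_for_bob = filter_by_acl(raw_bm25, "bob")
vector_for_bob = filter_by_acl(raw_vector, "bob")

print(bm25_for_bob)
print(vector_for_bob)
```

Chunk `c2` tops both raw lists, but because it is removed before fusion it contributes no rank to the merge and cannot appear in the fused output for `bob`.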