
Chunking Strategies That Actually Work

Fixed-size vs semantic chunking, overlap windows, and when to use each one.

Chunking is the process of breaking a single long document into smaller segments before embedding or indexing. If chunks are too small, they lack the surrounding context the LLM needs to synthesize an answer. If they are too large, they dilute the semantic vector embedding with irrelevant noise and consume too many tokens in the prompt.

Cutting a movie into trailer clips

Imagine cutting a two-hour movie into short clips for a trailer. Cut each clip too short (2 seconds) and none of them makes sense on its own; you lose all context. Cut each clip too long (20 minutes) and you might as well have shown the whole movie, because the clip buries the one moment that mattered inside a pile of irrelevant footage. Good trailer editors also overlap their cuts slightly, so a clip doesn't start mid-sentence and lose the audience. Chunking a document works the same way: the chunk size is how long each clip is, and the overlap is the few extra seconds of footage editors leave on both ends so no cut lands mid-thought.

Chunking Playground: a worked example

The sample below slices one paragraph with a chunk size of 220 characters and an overlap of 40. Every chunk after the first opens with the 40 characters it shares with the chunk before it.

* Chunks generated: 5
* Average chunk length: 201 chars
* Storage overhead from overlap: 16%

Chunk 1 (220 chars)

Retrieval-Augmented Generation gives a language model access to knowledge it was never trained on. Instead of trusting the model's frozen memory, the system retrieves relevant text at query time and hands it to the model

Chunk 2 (220 chars, first 40 shared with chunk 1)

at query time and hands it to the model as evidence. This only works if the source documents are split into good chunks first. Cut a chunk too small and it loses the surrounding context a reader would need to make sense

Chunk 3 (220 chars, first 40 shared with chunk 2)

ontext a reader would need to make sense of it. Cut it too large and the one useful sentence gets buried inside a wall of unrelated text, diluting the embedding and wasting tokens. A small overlap between consecutive chu

Chunk 4 (220 chars, first 40 shared with chunk 3)

A small overlap between consecutive chunks acts like a safety net, making sure an idea that spans a chunk boundary is not sliced clean in half. Getting chunk size and overlap right is one of the highest-leverage tuning

Chunk 5 (126 chars, first 40 shared with chunk 4)

t is one of the highest-leverage tuning decisions in a RAG pipeline, often mattering more than which vector database you pick.
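
The three statistics in the example above follow from simple arithmetic on chunk size and overlap. The sketch below reproduces them under two assumptions that the page does not state: that the sample paragraph is 846 characters long (inferred from the chunk lengths shown) and that overhead means shared characters divided by total stored characters. The function name `chunk_stats` is illustrative.

```python
import math


def chunk_stats(text_len: int, size: int, overlap: int) -> dict:
    """Predict chunk count, average length, and overlap overhead for
    fixed-size character chunking (no whitespace trimming)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if text_len <= 0:
        return {"chunks": 0, "avg_len": 0.0, "overhead": 0.0}
    step = size - overlap  # each new chunk starts this far after the last
    # The first chunk covers `size` characters; every further chunk
    # contributes `step` new characters.
    chunks = 1 if text_len <= size else 1 + math.ceil((text_len - size) / step)
    shared = (chunks - 1) * overlap  # characters stored twice
    stored = text_len + shared  # total characters across all chunks
    return {
        "chunks": chunks,
        "avg_len": stored / chunks,
        "overhead": shared / stored,
    }


# Assumed inputs: 846-character paragraph, size 220, overlap 40.
stats = chunk_stats(text_len=846, size=220, overlap=40)
print(stats["chunks"])             # 5
print(round(stats["avg_len"]))     # 201
print(f"{stats['overhead']:.0%}")  # 16%
```

Because the step between chunk starts is `size - overlap`, raising the overlap increases both the chunk count and the share of text stored twice.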

Advanced chunking techniques include layout-aware splitting (splitting at natural markdown headers or paragraph boundaries), semantic splitting (starting a new chunk where the cosine distance between the embeddings of adjacent sentences jumps), and parent-child retrieval (indexing tiny chunks for precise search, but returning a larger parent window to the LLM).
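
Semantic splitting can be sketched roughly as follows: embed each sentence, compare each one with its neighbour, and open a new chunk when the cosine distance crosses a threshold. The embedding here is a toy bag-of-words counter standing in for a real embedding model, and the names `embed` and `semantic_split` and the `threshold` default are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy embedding: lowercase word counts. A real pipeline would call
    an embedding model here instead."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity of two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def semantic_split(text: str, threshold: float = 0.8) -> list[str]:
    """Group consecutive sentences, starting a new chunk when the
    distance between adjacent sentences exceeds `threshold`."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine_distance(embed(prev), embed(cur)) > threshold:
            chunks.append([cur])  # topic shift: open a new chunk
        else:
            chunks[-1].append(cur)  # same topic: extend the current chunk
    return [" ".join(group) for group in chunks]


sample = (
    "Vector search ranks chunks by embedding similarity. "
    "Embedding similarity rewards chunks about one topic. "
    "Bread dough needs time to rise. "
    "Dough needs warm air to rise well."
)
for chunk in semantic_split(sample):
    print(chunk)
```

On this sample the two search sentences land in one chunk and the two baking sentences in another. With a real embedding model the threshold would need tuning against your own documents.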

Our Project Chunk Configurations
Our sibling projects use different chunking parameters, each chosen to suit its search engine:

* Semantic Vector Bot: Uses a chunk size of `1500` characters with a `150` character overlap. Larger chunks preserve context for semantic embedding models.
* Smart File Cabinet (Enterprise Search): Uses a chunk size of `900` characters with a `100` character overlap. Smaller chunks are highly effective for hybrid search, as they keep BM25 keyword matching dense and targeted.
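```python
def chunk_text(text: str, size: int = 900, overlap: int = 100) -> list[str]:
    """Split text into overlapping character-level chunks."""
    if not 0 <= overlap < size:
        # An overlap >= size would never advance and loop forever.
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # distance between consecutive chunk starts
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start : start + size])
        if start + size >= len(text):
            break  # the tail is covered; skip a chunk made only of overlap
        start += step
    # Trim whitespace and drop chunks that end up empty.
    return [c.strip() for c in chunks if c.strip()]
```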

Choosing the right strategy is an iterative process that depends on how your documents are structured. Standard character-level splitting with overlap is a solid baseline, but production pipelines usually benefit from splitting along the document's own structure, such as markdown headers and paragraph boundaries.
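
As one way to move beyond the character-level baseline, the sketch below splits a markdown document at its headers and falls back to paragraph boundaries when a section exceeds a size limit. It recognises only `#`-style headers and does not special-case fenced code blocks. The function name `split_markdown` and the `max_chars` default are hypothetical choices, not the projects' actual implementation.

```python
import re


def split_markdown(text: str, max_chars: int = 900) -> list[str]:
    """Split at markdown headers, then pack paragraphs up to max_chars."""
    # Zero-width split: each section begins at a line starting with 1-6 '#'.
    sections = re.split(r"^(?=#{1,6} )", text, flags=re.MULTILINE)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Oversized section: pack whole paragraphs into chunks. A single
        # paragraph longer than max_chars is kept whole, not cut.
        current = ""
        for para in re.split(r"\n\s*\n", section):
            candidate = f"{current}\n\n{para}" if current else para
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = para
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks


doc = (
    "# Intro\n"
    "RAG retrieves text at query time.\n\n"
    "## Chunking\n"
    "First paragraph about size.\n\n"
    "Second paragraph about overlap.\n"
)
chunks = split_markdown(doc, max_chars=60)
for chunk in chunks:
    print(repr(chunk))
```

With the small `max_chars` used here, the "Chunking" section is too long to keep whole, so its second paragraph becomes a chunk without its header. A fuller version might prepend the header to each sub-chunk to preserve that context.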

What's next
We now have well-formed chunks indexed two ways: by keyword (BM25) and by meaning (vectors). The obvious next question: why choose one when you can run both and combine the results? That's Hybrid Search: Dense and Sparse Together, next.