
Chunking

Also known as: text chunking, document chunking, chunk strategy, chunk size

The process of splitting a large document into smaller pieces before storing them for AI retrieval. How you chunk determines what the model can find. Cut too large, and retrieved chunks are unfocused. Cut too small, and they lose context. Chunking is a surprisingly high-leverage decision in any RAG system.

When you build a RAG (Retrieval-Augmented Generation) system, you first need to store your documents in a way that lets the system quickly find relevant sections. You can't index an entire 100-page PDF as a single unit: any match would return the whole document, most of it irrelevant to the question. Instead you break documents into chunks, usually a few hundred tokens each (a token is roughly a word or word fragment), and store those chunks. When a user asks a question, the system retrieves the most relevant chunks and passes them to the model as context.
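
The chunk, store, and retrieve flow described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: whitespace-separated words stand in for tokens, and a simple word-overlap count stands in for the similarity search a real system would use. The function names (chunk_words, tokenize, retrieve) and the sample document are hypothetical.

```python
import re

def chunk_words(text, size):
    """Split text into chunks of at most `size` whitespace-separated words.

    Words are an assumed stand-in for model tokens.
    """
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def tokenize(text):
    """Lowercased set of alphanumeric words; a crude stand-in for real tokenization."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, chunks, k=2):
    """Return the k chunks that share the most words with the query.

    Word overlap is an assumed placeholder for a real relevance score.
    """
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]

document = (
    "Refunds are issued within 14 days of a return. "
    "Shipping is free on orders over 50 dollars. "
    "Support is available by email on weekdays."
)

chunks = chunk_words(document, size=9)           # store these chunks
top = retrieve("how long do refunds take", chunks, k=1)
print(top)  # the retrieved chunk(s) would be placed in the model's context
```

Note that with size=9 the second chunk already runs past a sentence boundary and picks up the first word of the third sentence, which hints at why the choice of chunk boundaries matters.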

The naive approach is fixed-size chunking: split every N tokens, regardless of content. This is fast but destructive, because it cuts sentences mid-thought, separates related paragraphs, and discards document structure. Semantic chunking tries to split at natural boundaries like paragraph breaks or topic shifts. Hierarchical chunking stores both small chunks (for precise matching) and the larger context that surrounds them (for coherence), and retrieves both together.
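
To make the three strategies concrete, here is a minimal sketch that applies each to the same two-paragraph text. It rests on stated assumptions: words approximate tokens, blank lines mark paragraph boundaries, and the boundary-aware splitter only respects paragraph breaks (it does not detect topic shifts). All function names are hypothetical.

```python
def fixed_size_chunks(text, size):
    """Fixed-size chunking: split every `size` words, ignoring content."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def paragraph_chunks(text, max_words):
    """Boundary-aware chunking: pack whole paragraphs into a chunk,
    starting a new chunk when adding a paragraph would exceed max_words.
    A single paragraph longer than max_words is kept whole.
    """
    chunks, current, count = [], [], 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        n = len(para.split())
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks

def hierarchical_chunks(text, child_size):
    """Hierarchical chunking: pair each small child chunk with its parent paragraph,
    so a match on the child can return the parent as context.
    """
    pairs = []
    for para in (p.strip() for p in text.split("\n\n")):
        if para:
            for child in fixed_size_chunks(para, child_size):
                pairs.append({"child": child, "parent": para})
    return pairs

text = (
    "Chunking splits documents. It happens before indexing.\n\n"
    "Retrieval finds relevant chunks. The model reads them."
)

fixed = fixed_size_chunks(text, size=5)
by_para = paragraph_chunks(text, max_words=8)
pairs = hierarchical_chunks(text, child_size=4)

print(fixed[1])             # this chunk straddles the paragraph break
print(by_para)              # one chunk per paragraph
print(pairs[1]["child"], "|", pairs[1]["parent"])
```

In this example the second fixed-size chunk mixes the end of one paragraph with the start of the next, while the paragraph-based splitter keeps each paragraph intact and the hierarchical version keeps a pointer from every small piece back to its full paragraph.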

Why chunking matters: if your chunks are wrong, no model upgrade will fix the underlying retrieval quality. A poorly chunked document means the right answer exists in your system but the relevant passage never makes it into the model's context. Builders describe fixing chunking as one of the highest-leverage improvements they can make to a RAG pipeline. It's a product decision masquerading as a configuration setting.

This definition is AI-generated and refreshed weekly. It may contain inaccuracies. Use your own judgment, especially for production decisions.
