Chunking
Also known as: text chunking, document chunking, chunk strategy, chunk size
When you build a RAG (Retrieval-Augmented Generation) system, you first need to store your documents in a way that lets the system quickly find the relevant sections. You can't just load entire 100-page PDFs into a database and retrieve them whole: a full document is far too large to hand to the model as context for one question. Instead, you break documents into chunks, usually a few hundred tokens each (a token is roughly a word or word fragment), and store those chunks. When a user asks a question, the system retrieves the most relevant chunks and feeds them to the model.
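The chunk, store, and retrieve flow can be sketched in a few lines of Python. This is a toy under loud assumptions: whitespace-separated words stand in for tokens, and a bag-of-words cosine similarity stands in for the embedding model and vector database a real RAG system would use. The names `tokenize`, `chunk_words`, `build_index`, and `retrieve` are hypothetical, not from any library.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric words stand in for a real tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_words(text: str, size: int) -> list[str]:
    # Split into consecutive chunks of `size` whitespace-separated words.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs: list[str], size: int) -> list[tuple[str, Counter]]:
    # "Store" step: keep every chunk next to its vector (stand-in for embeddings).
    return [(chunk, Counter(tokenize(chunk))) for doc in docs for chunk in chunk_words(doc, size)]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int) -> list[str]:
    # "Retrieve" step: rank chunks by similarity to the query, keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


docs = [
    "The refund policy allows returns within 30 days of purchase. Items must be unused.",
    "Shipping takes five business days. Express shipping costs extra.",
]
index = build_index(docs, size=8)
print(retrieve(index, "What is the refund policy for returns?", k=1))
# prints ['The refund policy allows returns within 30 days']
```

Only the top-ranked chunk reaches the model here, so whatever the chunker cut away from it (in this case "of purchase. Items must be unused.") is invisible to the answer.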
The naive approach is fixed-size chunking: split every N tokens, regardless of content. This is fast but destructive. It cuts sentences mid-thought, severs related paragraphs, and discards document structure. Semantic chunking instead splits at natural boundaries such as paragraph breaks or topic shifts. Hierarchical chunking stores both small chunks (for precise matching) and the larger context around them (for coherence): the query is matched against the small chunks, and each match is retrieved together with its surrounding context.
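The three strategies can be contrasted in a short Python sketch. Assumptions, stated plainly: whitespace-separated words stand in for tokens; "semantic" chunking is approximated only by paragraph breaks (blank lines), with no topic-shift detection; and the hierarchical retriever scores child chunks by simple word overlap rather than embeddings. All function names are hypothetical.

```python
import re


def fixed_size_chunks(text: str, size: int) -> list[str]:
    # Naive: cut every `size` words, ignoring sentence and paragraph boundaries.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def paragraph_chunks(text: str, max_words: int) -> list[str]:
    # Boundary-aware: split on blank lines, then greedily pack whole paragraphs
    # into chunks of at most `max_words` words. A single paragraph longer than
    # `max_words` is kept whole rather than cut.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current, count = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def hierarchical_chunks(text: str, child_size: int, max_parent_words: int) -> list[dict]:
    # Parents are paragraph-aligned chunks; children are small fixed-size
    # pieces cut inside each parent, so no child crosses a parent boundary.
    records = []
    for parent_id, parent in enumerate(paragraph_chunks(text, max_parent_words)):
        for child in fixed_size_chunks(parent, child_size):
            records.append({"child": child, "parent_id": parent_id, "parent": parent})
    return records


def retrieve_parent(records: list[dict], query: str) -> str:
    # Match the query against the small children (toy word-overlap score),
    # then return the larger parent of the best match as context.
    def words(s: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", s.lower()))

    q = words(query)
    best = max(records, key=lambda r: len(q & words(r["child"])))
    return best["parent"]


text = (
    "Refunds are accepted within 30 days of purchase.\n\n"
    "Shipping takes five business days.\n\n"
    "Express shipping costs extra."
)
print(fixed_size_chunks(text, 6))
print(paragraph_chunks(text, 10))
records = hierarchical_chunks(text, child_size=4, max_parent_words=10)
print(retrieve_parent(records, "How much does express shipping cost?"))
```

On this sample, `fixed_size_chunks(text, 6)` produces a middle chunk, "of purchase. Shipping takes five business", that straddles two unrelated paragraphs, while `paragraph_chunks` keeps every paragraph intact. The hierarchical version matches the query against four-word children but hands back the full parent passage about shipping.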
Why chunking matters: if your chunks are cut badly, no model upgrade will fix the resulting retrieval failures. With a poorly chunked document, the right answer exists in your system, but the passage containing it never makes it into the model's context. Builders describe fixing chunking as one of the highest-leverage improvements they can make to a RAG pipeline. It's a product decision masquerading as a configuration setting.