Concept·Patterns & Practices·Added 2 months ago

Context stuffing

Also known as: context loading, prompt stuffing, long-context prompting

Loading as much relevant information as possible into the model's context window upfront, rather than retrieving it dynamically. Works well when the relevant data fits and is known in advance, but can degrade quality and inflate cost at scale.

As context windows have grown from a few thousand to hundreds of thousands of tokens (a token is roughly three-quarters of a word), some builders have moved from careful retrieval-augmented architectures to just throwing everything relevant directly into the prompt. The appeal is simplicity: no retrieval step, no risk of fetching the wrong chunk, no vector database to maintain.

This is called context stuffing, and it genuinely works for many use cases. If your full codebase, knowledge base, or document set fits in the window, loading it directly is often faster to build and can produce better results than a chunked retrieval approach where the wrong pieces sometimes get fetched.

The downsides appear at scale. Very long contexts cost more per query and can cause 'lost in the middle' effects, where the model pays less attention to information buried in the center of a huge context. It also forces you to reload all that context on every request rather than indexing it once. The practical rule of thumb: context stuffing is great for prototypes and tasks where the data is small and stable; structured retrieval earns its complexity when the data is large, dynamic, or varied.

This definition is AI-generated and refreshed weekly. It may contain inaccuracies. Use your own judgment, especially for production decisions.

Related terms