Context stuffing
Also known as: context loading, prompt stuffing, long-context prompting
As context windows have grown from a few thousand to hundreds of thousands of tokens (a token is roughly three-quarters of a word), some builders have moved from careful retrieval-augmented architectures to just throwing everything relevant directly into the prompt. The appeal is simplicity: no retrieval step, no risk of fetching the wrong chunk, no vector database to maintain.
This is called context stuffing, and it genuinely works for many use cases. If your full codebase, knowledge base, or document set fits in the window, loading it directly is often faster to build and can produce better results than a chunked retrieval approach where the wrong pieces sometimes get fetched.
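The "does it fit?" check can be sketched in a few lines. This is a minimal illustration rather than a definitive implementation: the function names (estimate_tokens, fits_in_window, build_stuffed_prompt), the window size, and the reserved budget are all hypothetical, and the token estimate uses only the rough three-quarters-of-a-word heuristic mentioned above. A real tokenizer will give different counts, so leave headroom.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: one token is about 3/4 of a word,
    so tokens ~= words * 4 / 3, rounded up. Heuristic only."""
    words = len(text.split())
    return (words * 4 + 2) // 3  # integer ceiling of words * 4 / 3


def fits_in_window(documents: list[str], window_tokens: int,
                   reserved_tokens: int = 0) -> bool:
    """True if every document, plus a reserved budget for the question
    and the model's answer, fits in the assumed context window."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserved_tokens <= window_tokens


def build_stuffed_prompt(documents: list[str], question: str) -> str:
    """Concatenate all documents into one prompt, then append the question."""
    body = "\n\n".join(
        f"[Document {i}]\n{doc}" for i, doc in enumerate(documents, start=1)
    )
    return f"{body}\n\nQuestion: {question}"


# Hypothetical usage: the window size and reserve are assumptions, not real limits.
docs = ["Refunds are processed within five business days.",
        "Support is available on weekdays."]
if fits_in_window(docs, window_tokens=100_000, reserved_tokens=2_000):
    prompt = build_stuffed_prompt(docs, "How long do refunds take?")
else:
    prompt = None  # too large to stuff; fall back to a retrieval step
```

If the check fails, that is the signal to consider retrieval rather than trimming documents by hand.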
The downsides appear at scale. Very long contexts cost more per query and can cause 'lost in the middle' effects, where the model pays less attention to information buried in the center of a huge context. Stuffing also forces you to resend all of that context on every request rather than indexing it once. The practical rule of thumb: context stuffing is great for prototypes and for tasks where the data is small and stable; structured retrieval earns its complexity when the data is large, dynamic, or varied.
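The per-request cost point can be made concrete with some back-of-the-envelope arithmetic. The sketch below counts only input tokens sent, since prices vary by provider. Every number in it is an illustrative assumption, not a measurement, and total_input_tokens is a hypothetical helper.

```python
def total_input_tokens(context_tokens: int, question_tokens: int,
                       requests: int) -> int:
    """Input tokens sent across all requests when the same context
    is included in every request."""
    return (context_tokens + question_tokens) * requests


# Assumed workload: 1,000 requests, each with a 50-token question.
REQUESTS = 1_000
QUESTION_TOKENS = 50

# Stuffing: the whole (assumed) 200,000-token corpus goes out every time.
stuffed = total_input_tokens(200_000, QUESTION_TOKENS, REQUESTS)

# Retrieval: only an (assumed) 4,000 tokens of fetched chunks per request.
retrieved = total_input_tokens(4_000, QUESTION_TOKENS, REQUESTS)

ratio = stuffed / retrieved
print(f"stuffed:   {stuffed:,} input tokens")
print(f"retrieved: {retrieved:,} input tokens")
print(f"ratio:     {ratio:.1f}x")
```

Under these assumptions, stuffing sends roughly 49 times as many input tokens. The gap shrinks as the corpus gets smaller, which is why the approach suits small, stable data.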