Reranking

Also known as: reranker, re-ranking, cross-encoder reranking, retrieval reranking

A second-pass filter applied after an initial document retrieval: you fetch a broad set of candidates, then score and reorder them based on how genuinely relevant each one is to the query. Retrieval gets you candidates; reranking decides which ones deserve to be in the model's context window.

In a retrieval-augmented generation (RAG) pipeline, the first retrieval step is usually fast but approximate. You use vector similarity (comparing embeddings, which are numerical vector representations of meaning) to pull the top-K most similar chunks. This catches obviously relevant documents but can miss nuance or let in results that resemble the query without actually answering it. Reranking is a slower, more careful second look.
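As a rough illustration of that first pass, here is a minimal sketch of top-K retrieval by cosine similarity. The chunk IDs, the three-dimensional vectors, and the `retrieve_top_k` helper are all made up for the example; a real system would get embeddings from an embedding model and usually search them with a vector index rather than a full scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec, index, k):
    """Return the IDs of the k chunks whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda cid: cosine(query_vec, index[cid]), reverse=True)
    return ranked[:k]


# Hypothetical toy index: chunk ID -> embedding (hand-written, not from a real model).
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "refund-form": [0.8, 0.2, 0.1],
    "shipping-times": [0.1, 0.9, 0.2],
    "careers": [0.0, 0.1, 0.9],
}

query_vec = [1.0, 0.0, 0.0]  # stand-in for the embedded user query
candidates = retrieve_top_k(query_vec, index, k=2)
print(candidates)
```

Note that the query and each chunk are embedded separately and only compared afterwards, which is what makes this step fast and also what makes it approximate.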

A reranker takes the retrieved candidates and scores each one against the original query, usually with a more expensive model, such as a cross-encoder, that reads the query and the candidate together rather than comparing embeddings computed separately. It then sorts the results by relevance score. You typically pass only the top few reranked chunks (often 3-5) into the model's context window, not the full candidate set (say, 20 chunks).
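The score-sort-truncate flow can be sketched as follows. The `toy_relevance` function is only a placeholder based on word overlap so the example runs without any dependencies; it is not how a real reranker scores text. In practice you would swap in a model that reads each (query, candidate) pair together. The `rerank` helper and the sample passages are likewise hypothetical.

```python
def toy_relevance(query, passage):
    """Placeholder scorer: fraction of query words that appear in the passage.

    Stands in for a real reranking model, which would read the query and
    the passage jointly and return a learned relevance score.
    """
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


def rerank(query, candidates, score_fn, top_n):
    """Score every candidate against the query, sort by score, keep the top_n."""
    scored = [(score_fn(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_n]]


# Candidates in the order the first-pass retrieval returned them (made-up data).
candidates = [
    "our office hours and holiday schedule",
    "how to reset your account password",
    "password rules for a new account",
    "reset a forgotten password from the login page",
]

query = "reset forgotten password"
top_chunks = rerank(query, candidates, toy_relevance, top_n=2)
for chunk in top_chunks:
    print(chunk)
```

Because `rerank` takes the scorer as a parameter, the placeholder can be replaced by a call to a reranking model without changing the surrounding pipeline. The scorer runs once per candidate, which is where the extra latency and cost come from.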

The practical impact: reranking can meaningfully improve answer quality on ambiguous or complex queries where the initial retrieval returns the right documents in the wrong order. It adds latency and cost, so it's not worth it for every use case. But if you're finding that your RAG system retrieves relevant documents but the model keeps producing wrong answers, reranking is often the next lever to pull.

This definition is AI-generated and refreshed weekly. It may contain inaccuracies. Use your own judgment, especially for production decisions.
