Reranking
Also known as: reranker, re-ranking, cross-encoder reranking, retrieval reranking
In a RAG (retrieval-augmented generation) pipeline, the first retrieval step is usually fast but approximate. You use vector similarity (comparing embeddings, which are numeric vectors that represent meaning) to pull the top-K most similar chunks. This catches obviously relevant documents but can miss nuance or let in lookalike-but-wrong results. Reranking is a slower, more careful second look at those candidates.
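As a rough illustration of that first pass, the sketch below ranks chunks by cosine similarity and keeps the top K. The three-dimensional vectors and the `top_k_by_cosine` helper are made up for this example; a real system would get its embeddings from an embedding model and would typically query a vector index rather than scan every chunk.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query_vec, chunk_vecs, k):
    """Return the ids of the k chunks most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Toy embeddings (hypothetical values, not produced by any real model).
chunk_vecs = {
    "chunk_a": [0.9, 0.1, 0.0],
    "chunk_b": [0.7, 0.7, 0.0],
    "chunk_c": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.0, 0.0]

candidates = top_k_by_cosine(query_vec, chunk_vecs, k=2)
print(candidates)
```

Here `chunk_a` and `chunk_b` come back as the two candidates. Note that the score only compares two precomputed vectors; nothing in this step reads the query and the chunk text together, which is the gap a reranker fills.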
A reranker takes the retrieved candidates and scores each one against the original query, usually with a more powerful model (often a cross-encoder) that reads the query and the candidate together instead of comparing precomputed embeddings. It then sorts the candidates by relevance score. You typically pass only the top 3-5 reranked chunks into the generating model's context window, not every candidate from the first pass (which might be 20 or more).
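The score-then-sort step can be sketched in a few lines. This is a minimal illustration, not a production reranker: `rerank` and `toy_overlap_score` are hypothetical names, and the word-overlap scorer is only a stand-in so the example runs without a model. In practice, `score_pair` would wrap a cross-encoder or a hosted reranking API.

```python
from typing import Callable, List, Tuple


def rerank(query: str, candidates: List[str],
           score_pair: Callable[[str, str], float],
           top_n: int = 3) -> List[Tuple[float, str]]:
    """Score every (query, candidate) pair, sort best first, keep top_n."""
    scored = [(score_pair(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


def toy_overlap_score(query: str, text: str) -> float:
    """Stand-in scorer: fraction of query words that appear in the text.

    Assumes a non-empty query. A real reranker would run a model over
    the query and the text together here instead.
    """
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)


query = "reset a forgotten password"
candidates = [
    "How to change your profile picture",
    "Steps to reset a forgotten password by email",
    "Password rules: minimum length and allowed characters",
]

top = rerank(query, candidates, toy_overlap_score, top_n=2)
for score, text in top:
    print(f"{score:.2f}  {text}")
```

Keeping the scorer as a parameter means the sorting and truncation logic stays the same whichever reranking model you plug in, and only the `top_n` survivors go on to the generating model.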
The practical impact: reranking can meaningfully improve answer quality on ambiguous or complex queries where the initial retrieval returns the right documents in the wrong order. It adds latency and cost, because the reranker has to score every candidate against the query, so it's not worth it for every use case. But if you're finding that your RAG system retrieves relevant documents and the model still keeps producing wrong answers, reranking is often the next lever to pull.