Reranking
Reranking is a critical information retrieval task that evaluates a model's ability to reorder an initial candidate set of documents based on their relevance to a query. In this two-stage retrieval architecture, the reranker performs a fine-grained analysis of the candidates initially retrieved by a first-stage retriever (such as BM25 or dense retrieval models).
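To make the two-stage architecture concrete, the sketch below pairs a BM25 first stage with a cross-encoder reranker. It is a minimal illustration, assuming the `rank_bm25` and `sentence-transformers` packages and the public `cross-encoder/ms-marco-MiniLM-L-6-v2` checkpoint; the corpus and query are toy placeholders.

```python
# Minimal two-stage retrieval sketch: BM25 candidates, then cross-encoder reranking.
# Assumes `pip install rank_bm25 sentence-transformers`; corpus/query are toy placeholders.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

corpus = [
    "Paris is the capital city of France.",
    "London is the capital of England.",
    "The Eiffel Tower is located in Paris.",
    "Berlin is the capital city of Germany.",
]
query = "What is the capital of France?"

# Stage 1: lexical retrieval with BM25 over a whitespace-tokenized corpus.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
candidates = bm25.get_top_n(query.lower().split(), corpus, n=3)

# Stage 2: fine-grained scoring of each (query, candidate) pair with a cross-encoder.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc) for doc in candidates])

# Reorder candidates by descending cross-encoder score.
reranked = [doc for doc, _ in sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)]
print(reranked[0])  # expected: the Paris passage
```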
Applications
Reranking is commonly used in:
- Search engines - Documents are reranked using cross-attention between query-document pairs to improve result quality
- Question-answering systems - Passage reranking improves answer selection accuracy
- E-commerce product search - Items are reranked based on user preferences and query intent
Process Overview
```mermaid
graph TD
    A[Query] --> C[Cross-Encoder Model]
    B[Candidate Documents] --> C
    C --> D[Relevance Scores]
    D --> E[Reranked Documents]
    E --> F[Evaluation Metrics]
    A -.->|Log| G[Langfuse:<br/>Ingest new traces]
    E -.->|Log| G
```
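The dotted edges indicate that the query and the reranked output can be logged for later inspection. A minimal sketch, assuming the v2-style Langfuse Python SDK (where `Langfuse().trace(...)` is available) and API keys provided via environment variables:

```python
# Hedged sketch: logging a reranking step as a Langfuse trace.
# Assumes the v2-style Langfuse Python SDK and LANGFUSE_PUBLIC_KEY /
# LANGFUSE_SECRET_KEY / LANGFUSE_HOST set in the environment.
from langfuse import Langfuse

langfuse = Langfuse()

def log_rerank(query, candidates, reranked, scores):
    # One trace per query, with input (query + candidates) and output (reranked list).
    langfuse.trace(
        name="rerank",
        input={"query": query, "candidates": candidates},
        output={"reranked": reranked, "scores": [float(s) for s in scores]},
    )
    langfuse.flush()  # ensure the trace is sent before the process exits
```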
Core Components
1. Input Requirements
- Query: Original search query text
- Candidate Documents: Initial set of documents from first-stage retrieval
  - Typically the top-k documents (e.g., top 100) from the retrieval step
  - Reduces computational overhead by reranking only promising candidates
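In other words, the reranker consumes the query plus a truncated candidate list. A small illustration (the `first_stage_results` structure below is hypothetical):

```python
# Illustrative reranker input: the query plus only the top-k first-stage hits.
# `first_stage_results` is a hypothetical list of (doc_id, text, first_stage_score) tuples.
TOP_K = 100

def build_rerank_input(query, first_stage_results, top_k=TOP_K):
    candidates = [text for _, text, _ in first_stage_results[:top_k]]
    return {"query": query, "candidates": candidates}
```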
2. Model Architecture
Cross-Encoder Models directly compare each query-document pair. Commonly used architectures include:
- MS-MARCO fine-tuned cross-encoders - BERT/RoBERTa models fine-tuned on MS-MARCO dataset for passage ranking
- Domain-specific fine-tuned BERT models - Custom models trained on specific domains
- Custom cross-encoder architectures - Purpose-built models for specific use cases
3. Ranking Process
- Encode Pairs: Process each query-document pair through cross-encoder
- Score Generation: Generate relevance scores for each pair
- Reorder: Sort documents based on cross-encoder scores
- Output: Return reranked document list
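Sketched as a helper function, assuming a model that, like the sentence-transformers `CrossEncoder`, exposes `predict` over a list of (query, document) pairs, the four steps look like this:

```python
# Sketch of the encode -> score -> reorder -> output loop.
# `model` is assumed to expose predict(list_of_pairs) -> list/array of scores,
# as the sentence-transformers CrossEncoder does.
def rerank(model, query, candidates):
    # 1. Encode pairs: every candidate is paired with the query.
    pairs = [(query, doc) for doc in candidates]
    # 2. Score generation: one relevance score per pair.
    scores = model.predict(pairs)
    # 3. Reorder: sort candidates by descending score.
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    # 4. Output: reranked documents (with their scores for logging/evaluation).
    return ranked
```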
Reranking is more computationally intensive than first-stage retrieval but typically more accurate. This higher computational cost occurs because:
- Real-time pair processing: Cross-encoders must process query and document pairs together at inference time, preventing pre-computation of embeddings
- Linear scaling overhead: Each query requires separate processing with every candidate document, resulting in O(n) cross-attention operations
- No caching benefits: Unlike first-stage retrieval where document embeddings can be pre-computed and indexed, reranking scores must be calculated at query time
- Complex attention mechanisms: Cross-attention requires more computational resources than the simple vector similarity calculations used in first-stage retrieval
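The contrast can be made concrete: a bi-encoder embeds documents once (offline) and answers queries with cheap vector similarity, while a cross-encoder must run a full forward pass per (query, document) pair at query time. A minimal sketch using sentence-transformers; the checkpoint names are public examples, not a prescribed choice:

```python
# Why reranking costs more: bi-encoder work is mostly offline, cross-encoder work is per query.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

docs = ["Paris is the capital city of France.", "Berlin is the capital city of Germany."]
query = "What is the capital of France?"

# First stage (bi-encoder): document embeddings can be computed once and indexed offline.
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = bi_encoder.encode(docs)                      # offline, cacheable
query_embedding = bi_encoder.encode(query)                    # one encode per query
similarities = util.cos_sim(query_embedding, doc_embeddings)  # cheap vector math

# Reranking (cross-encoder): a full forward pass for every (query, document) pair,
# so cost grows linearly with the number of candidates and nothing can be pre-computed.
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = cross_encoder.predict([(query, doc) for doc in docs])
```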
Evaluation Metrics
Reranking tasks use several key metrics to evaluate the quality of document reordering. For detailed explanations of all information retrieval metrics, see our comprehensive metrics guide.
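As a single concrete example, reciprocal rank can be computed directly from a reranked list and the set of known-positive documents; the plain-Python helper below is illustrative, not the framework's implementation.

```python
# Illustrative metric: Mean Reciprocal Rank (MRR) over reranked results.
# `reranked` is a list of documents ordered best-first; `positives` is the relevant set.
def reciprocal_rank(reranked, positives):
    for rank, doc in enumerate(reranked, start=1):
        if doc in positives:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    # `runs` is a list of (reranked, positives) pairs, one per query.
    return sum(reciprocal_rank(r, p) for r, p in runs) / len(runs)
```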
Data Schema
| Column | Type | Description | Required |
|---|---|---|---|
| `query` | `string` | The question/query text | ✓ |
| `positive` | `list[string]` | List of relevant/positive documents | ✓ |
| `negative` | `list[string]` | List of non-relevant/negative documents | ✓ |
Example Data Format
```python
reranking_data = {
    "query": "What is the capital of France?",
    "positive": [
        "Paris is the capital city of France.",
        "The capital of France is Paris, located in the north-central part of the country."
    ],
    "negative": [
        "London is the capital of England.",
        "Berlin is the capital city of Germany."
    ]
}
```
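Continuing the example above, an evaluation loop would typically expand such a record into labeled (query, document) pairs; the helper below is illustrative rather than part of the framework.

```python
# Illustrative: expand a reranking record into (query, document, label) pairs,
# with 1 for positives and 0 for negatives, as a cross-encoder evaluator would consume.
def to_labeled_pairs(record):
    pairs = [(record["query"], doc, 1) for doc in record["positive"]]
    pairs += [(record["query"], doc, 0) for doc in record["negative"]]
    return pairs

labeled = to_labeled_pairs(reranking_data)
# e.g. ("What is the capital of France?", "Paris is the capital city of France.", 1)
```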
Supported Models
Two types of reranking models are supported:
- HuggingFace embedding-based models - Only models compatible with SentenceTransformer (including SentenceTransformer, BERT, and RoBERTa checkpoints) are supported for embedding-based reranking.
- API-based reranking services - Reranking services such as Cohere Rerank are supported through API integration. These can be configured by setting `reranking_method: "api"` in the config file.
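For API-based reranking, the only configuration key named here is `reranking_method: "api"`; everything else in the sketch below (provider, model name, `top_n`) is illustrative. A direct call to Cohere Rerank through the official `cohere` Python SDK looks roughly like this:

```python
# Hedged sketch: API-based reranking with Cohere Rerank via the official `cohere` SDK.
# The model name and top_n are illustrative; the API key is read from the environment.
import os
import cohere

co = cohere.Client(api_key=os.environ["COHERE_API_KEY"])

response = co.rerank(
    model="rerank-english-v3.0",
    query="What is the capital of France?",
    documents=[
        "Paris is the capital city of France.",
        "London is the capital of England.",
    ],
    top_n=2,
)
for result in response.results:
    # Each result carries the index into `documents` and a relevance score.
    print(result.index, result.relevance_score)
```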