Reranking
Reranking is a critical information retrieval task that evaluates a model's ability to reorder an initial candidate set of documents based on their relevance to a query. In this two-stage retrieval architecture, the reranker performs a fine-grained analysis of the candidates initially retrieved by a first-stage retriever (such as BM25 or dense retrieval models).
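To make the two-stage architecture concrete, the sketch below pairs a BM25 first stage with a cross-encoder reranker. It is a minimal illustration, assuming the `rank_bm25` and `sentence-transformers` packages and the public `cross-encoder/ms-marco-MiniLM-L-6-v2` checkpoint; the corpus and query are toy placeholders.

```python
# Minimal two-stage retrieval sketch: BM25 candidates, then cross-encoder reranking.
# Assumes `pip install rank_bm25 sentence-transformers`; corpus/query are toy placeholders.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

corpus = [
    "Paris is the capital city of France.",
    "London is the capital of England.",
    "The Eiffel Tower is located in Paris.",
    "Berlin is the capital city of Germany.",
]
query = "What is the capital of France?"

# Stage 1: lexical retrieval with BM25 over a whitespace-tokenized corpus.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
candidates = bm25.get_top_n(query.lower().split(), corpus, n=3)

# Stage 2: fine-grained scoring of each (query, candidate) pair with a cross-encoder.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc) for doc in candidates])

# Reorder candidates by descending cross-encoder score.
reranked = [doc for doc, _ in sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)]
print(reranked[0])  # expected: the Paris passage
```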
Applications
Reranking is commonly used in:
- Search engines - Documents are reranked using cross-attention between query-document pairs to improve result quality
- Question-answering systems - Passage reranking improves answer selection accuracy
- E-commerce product search - Items are reranked based on user preferences and query intent
Process Overview
```mermaid
graph TD
    A[Query] --> C[Cross-Encoder Model]
    B[Candidate Documents] --> C
    C --> D[Relevance Scores]
    D --> E[Reranked Documents]
    E --> F[Evaluation Metrics]
    A -.->|Log| G[Langfuse:<br/>Ingest new traces]
    E -.->|Log| G
```
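The dotted edges indicate that the query and the reranked output can be logged for later inspection. A minimal sketch, assuming the v2-style Langfuse Python SDK (where `Langfuse().trace(...)` is available) and API keys provided via environment variables:

```python
# Hedged sketch: logging a reranking step as a Langfuse trace.
# Assumes the v2-style Langfuse Python SDK and LANGFUSE_PUBLIC_KEY /
# LANGFUSE_SECRET_KEY / LANGFUSE_HOST set in the environment.
from langfuse import Langfuse

langfuse = Langfuse()

def log_rerank(query, candidates, reranked, scores):
    # One trace per query, with input (query + candidates) and output (reranked list).
    langfuse.trace(
        name="rerank",
        input={"query": query, "candidates": candidates},
        output={"reranked": reranked, "scores": [float(s) for s in scores]},
    )
    langfuse.flush()  # ensure the trace is sent before the process exits
```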
Core Components
1. Input Requirements
- Query: Original search query text
- Candidate Documents: Initial set of documents from first-stage retrieval
  - Typically the top-k documents (e.g., top 100) from the retrieval step
  - Reduces computational overhead by reranking only promising candidates
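In other words, the reranker consumes the query plus a truncated candidate list. A small illustration (the `first_stage_results` structure below is hypothetical):

```python
# Illustrative reranker input: the query plus only the top-k first-stage hits.
# `first_stage_results` is a hypothetical list of (doc_id, text, first_stage_score) tuples.
TOP_K = 100

def build_rerank_input(query, first_stage_results, top_k=TOP_K):
    candidates = [text for _, text, _ in first_stage_results[:top_k]]
    return {"query": query, "candidates": candidates}
```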
2. Model Architecture
Cross-Encoder Models directly compare each query-document pair. Commonly used architectures include:
- MS-MARCO fine-tuned cross-encoders - BERT/RoBERTa models fine-tuned on MS-MARCO dataset for passage ranking
- Domain-specific fine-tuned BERT models - Custom models trained on specific domains
- Custom cross-encoder architectures - Purpose-built models for specific use cases
3. Ranking Process
- Encode Pairs: Process each query-document pair through cross-encoder
- Score Generation: Generate relevance scores for each pair
- Reorder: Sort documents based on cross-encoder scores
- Output: Return reranked document list
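Sketched as a helper function, assuming a model that, like the sentence-transformers `CrossEncoder`, exposes `predict` over a list of (query, document) pairs, the four steps look like this:

```python
# Sketch of the encode -> score -> reorder -> output loop.
# `model` is assumed to expose predict(list_of_pairs) -> list/array of scores,
# as the sentence-transformers CrossEncoder does.
def rerank(model, query, candidates):
    # 1. Encode pairs: every candidate is paired with the query.
    pairs = [(query, doc) for doc in candidates]
    # 2. Score generation: one relevance score per pair.
    scores = model.predict(pairs)
    # 3. Reorder: sort candidates by descending score.
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    # 4. Output: reranked documents (with their scores for logging/evaluation).
    return ranked
```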
Reranking is more computationally intensive than first-stage retrieval but typically more accurate. This higher computational cost occurs because:
- Real-time pair processing: Cross-encoders must process query and document pairs together at inference time, preventing pre-computation of embeddings
- Linear scaling overhead: Each query requires separate processing with every candidate document, resulting in O(n) cross-attention operations
- No caching benefits: Unlike first-stage retrieval where document embeddings can be pre-computed and indexed, reranking scores must be calculated at query time
- Complex attention mechanisms: Cross-attention requires more computational resources than the simple vector similarity calculations used in first-stage retrieval
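The contrast can be made concrete: a bi-encoder embeds documents once (offline) and answers queries with cheap vector similarity, while a cross-encoder must run a full forward pass per (query, document) pair at query time. A minimal sketch using sentence-transformers; the checkpoint names are public examples, not a prescribed choice:

```python
# Why reranking costs more: bi-encoder work is mostly offline, cross-encoder work is per query.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

docs = ["Paris is the capital city of France.", "Berlin is the capital city of Germany."]
query = "What is the capital of France?"

# First stage (bi-encoder): document embeddings can be computed once and indexed offline.
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = bi_encoder.encode(docs)                      # offline, cacheable
query_embedding = bi_encoder.encode(query)                    # one encode per query
similarities = util.cos_sim(query_embedding, doc_embeddings)  # cheap vector math

# Reranking (cross-encoder): a full forward pass for every (query, document) pair,
# so cost grows linearly with the number of candidates and nothing can be pre-computed.
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = cross_encoder.predict([(query, doc) for doc in docs])
```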
Evaluation Metrics
Reranking tasks use several key metrics to evaluate the quality of document reordering. For detailed explanations of all information retrieval metrics, see our comprehensive metrics guide.
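As a single concrete example, reciprocal rank can be computed directly from a reranked list and the set of known-positive documents; the plain-Python helper below is illustrative, not the framework's implementation.

```python
# Illustrative metric: Mean Reciprocal Rank (MRR) over reranked results.
# `reranked` is a list of documents ordered best-first; `positives` is the relevant set.
def reciprocal_rank(reranked, positives):
    for rank, doc in enumerate(reranked, start=1):
        if doc in positives:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    # `runs` is a list of (reranked, positives) pairs, one per query.
    return sum(reciprocal_rank(r, p) for r, p in runs) / len(runs)
```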
Data Schema
| Column | Type | Description | Required |
|---|---|---|---|
| `query` | `string` | The question/query text | ✓ |
| `positive` | `list[string]` | List of relevant/positive documents | ✓ |
| `negative` | `list[string]` | List of non-relevant/negative documents | ✓ |
Example Data Format
```python
reranking_data = {
    "query": "What is the capital of France?",
    "positive": [
        "Paris is the capital city of France.",
        "The capital of France is Paris, located in the north-central part of the country."
    ],
    "negative": [
        "London is the capital of England.",
        "Berlin is the capital city of Germany."
    ]
}
```
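Continuing the example above, an evaluation loop would typically expand such a record into labeled (query, document) pairs; the helper below is illustrative rather than part of the framework.

```python
# Illustrative: expand a reranking record into (query, document, label) pairs,
# with 1 for positives and 0 for negatives, as a cross-encoder evaluator would consume.
def to_labeled_pairs(record):
    pairs = [(record["query"], doc, 1) for doc in record["positive"]]
    pairs += [(record["query"], doc, 0) for doc in record["negative"]]
    return pairs

labeled = to_labeled_pairs(reranking_data)
# e.g. ("What is the capital of France?", "Paris is the capital city of France.", 1)
```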
Supported Models
Two types of reranking models are supported:
- HuggingFace embedding-based models - Only models compatible with SentenceTransformer (including SentenceTransformer, BERT, and RoBERTa checkpoints) are supported for embedding-based reranking.
- API-based reranking services - Reranking services such as Cohere Rerank are supported through API integration. These can be configured by setting `reranking_method: "api"` in the config file.
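For API-based reranking, the only configuration key named here is `reranking_method: "api"`; everything else in the sketch below (provider, model name, `top_n`) is illustrative. A direct call to Cohere Rerank through the official `cohere` Python SDK looks roughly like this:

```python
# Hedged sketch: API-based reranking with Cohere Rerank via the official `cohere` SDK.
# The model name and top_n are illustrative; the API key is read from the environment.
import os
import cohere

co = cohere.Client(api_key=os.environ["COHERE_API_KEY"])

response = co.rerank(
    model="rerank-english-v3.0",
    query="What is the capital of France?",
    documents=[
        "Paris is the capital city of France.",
        "London is the capital of England.",
    ],
    top_n=2,
)
for result in response.results:
    # Each result carries the index into `documents` and a relevance score.
    print(result.index, result.relevance_score)
```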