nDCG - Normalized Discounted Cumulative Gain
What is nDCG
nDCG is a rank-aware evaluation metric that measures the quality of search results by comparing the actual ranking of documents against an ideal ranking. It accounts for both relevance and position in the ranked list.
nDCG is calculated by dividing the Discounted Cumulative Gain (DCG) by the Ideal Discounted Cumulative Gain (IDCG).
When to use nDCG
- Position-sensitive: Heavily weights top-ranked results.
- Normalized: Enables comparison across different queries.
- Handles graded relevance: Works with multi-level relevance judgments (not just binary), e.g., relevance grades of 3, 2, 1, and 0.
- Industry standard: Widely used in information retrieval, recommendation systems, and major benchmarks such as MTEB (Massive Text Embedding Benchmark).
Key Components
1. Cumulative Gain (CG)
Sum of relevance scores of all retrieved documents
\(CG = \sum_{i=1}^{k} rel_i\)
(where \(rel_i\) is the relevance of document at position \(i\))
2. Discounted Cumulative Gain (DCG)
Applies a logarithmic discount to lower-ranked positions:
\(DCG = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i + 1)}\)
Higher-ranked documents contribute more to the score.
3. Ideal DCG (IDCG)
DCG of the perfect ranking (documents sorted by relevance in descending order). This represents the maximum possible DCG for a given query.
4. Normalized DCG (nDCG)
\(nDCG = \frac{DCG}{IDCG}\)
Range: 0 to 1 (1 = perfect ranking)
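A minimal sketch of these formulas in Python (the helper names `dcg` and `ndcg` are illustrative, not from any particular library):

```python
import math
from typing import Sequence

def dcg(relevances: Sequence[float]) -> float:
    """Discounted Cumulative Gain of relevance scores given in rank order."""
    # i is 0-based, so the formula's 1/log2(rank + 1) becomes 1/log2(i + 2)
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances: Sequence[float]) -> float:
    """nDCG = DCG of the system ranking / DCG of the ideal (sorted) ranking."""
    idcg = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / idcg if idcg > 0 else 0.0
```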
Example Calculation
Consider a query with 5 documents ranked by a system, with relevance judgments:
| Rank | Document | Relevance Score | 1/log₂(rank+1) | Contribution to DCG |
|---|---|---|---|---|
| 1 | Doc A | 3 | 1.000 | 3.000 |
| 2 | Doc B | 1 | 0.631 | 0.631 |
| 3 | Doc C | 2 | 0.500 | 1.000 |
| 4 | Doc D | 0 | 0.431 | 0.000 |
| 5 | Doc E | 1 | 0.387 | 0.387 |
DCG = 3.000 + 0.631 + 1.000 + 0.000 + 0.387 = 5.018
Ideal ranking (sorted by relevance): Doc A (3), Doc C (2), Doc B (1), Doc E (1), Doc D (0).
IDCG = 3.000 + 1.262 + 0.500 + 0.431 + 0.000 = 5.193
nDCG = 5.018 / 5.193 = 0.966
This high nDCG score indicates the ranking is very close to optimal, with only Doc B and Doc C swapped from the ideal order.
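The worked example can be verified in a few lines of Python (standard library only; the relevance scores are hard-coded from the table above):

```python
import math

relevances = [3, 1, 2, 0, 1]              # system ranking: Doc A..Doc E
ideal = sorted(relevances, reverse=True)  # [3, 2, 1, 1, 0]

dcg  = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))
idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))

print(round(dcg, 3), round(idcg, 3), round(dcg / idcg, 3))
# 5.018 5.193 0.966
```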
nDCG@k Variant
nDCG@k (e.g., nDCG@10) evaluates only the top k documents in the ranking:
- Focuses on precision at the top of the results
- More practical than evaluating entire result sets
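A self-contained sketch of nDCG@k, assuming (as is common) that IDCG@k is the DCG of the k most relevant judged documents; the function name `ndcg_at_k` is illustrative:

```python
import math

def ndcg_at_k(relevances, k):
    """nDCG@k: evaluate only the top-k ranked positions.

    relevances: graded relevance of the system's results, in rank order.
    IDCG@k is taken as the DCG of the k most relevant judged documents.
    """
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

    idcg_k = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / idcg_k if idcg_k > 0 else 0.0

# Top-3 of the worked example: [3, 1, 2] scored against ideal top-3 [3, 2, 1]
print(ndcg_at_k([3, 1, 2, 0, 1], k=3))  # ≈ 0.9725
```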
Dataset-Level Evaluation
When evaluating on a dataset with multiple queries, compute the overall nDCG as follows:
- Compute nDCG for each query individually using that query's relevance judgments.
- Average across all queries to get the dataset-level score (sketched in code after the considerations below):
\(\text{Dataset nDCG} = \frac{1}{|Q|} \sum_{q \in Q} \text{nDCG}_q\), where \(Q\) is the set of all queries.
Important considerations:
- Each query contributes equally to the final score regardless of the number of relevant documents
- Queries with no relevant documents (IDCG = 0) are typically excluded from the average
- Report the median nDCG alongside the mean when the score distribution is skewed by a few unusually easy or hard queries.
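A sketch of the dataset-level average, reusing the `ndcg_at_k` helper sketched above; the data layout (a dict of per-query relevance lists) is an illustrative assumption, and queries whose IDCG would be zero are skipped as noted:

```python
def dataset_ndcg(per_query_relevances, k=10):
    """Mean nDCG@k over a set of queries.

    per_query_relevances: dict mapping query id -> relevance grades (non-negative)
    of the documents the system returned for that query, in rank order.
    """
    scores = []
    for rels in per_query_relevances.values():
        if sum(rels) == 0:   # no relevant documents -> IDCG = 0, skip the query
            continue
        scores.append(ndcg_at_k(rels, k))
    return sum(scores) / len(scores) if scores else 0.0

# Illustrative usage with made-up judgments for three queries (q3 is skipped)
print(dataset_ndcg({"q1": [3, 1, 2, 0, 1], "q2": [0, 2, 1], "q3": [0, 0, 0]}, k=3))
```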
Interpretation
- nDCG = 1.0 represents a perfect ranking, in which all relevant documents appear in descending order of their relevance scores
- nDCG scores cannot be directly compared across different datasets. Dataset characteristics affect achievable scores: number of relevant documents, relevance distribution, and query difficulty.
- Use basic retrieval methods (e.g., BM25, TF-IDF) as baselines to gauge the difficulty of a dataset.
Reference
https://docs.cohere.com/docs/rerank-understanding-the-results#ndcg10