MRR - Mean Reciprocal Rank

What is MRR

Mean Reciprocal Rank (MRR) is a rank-aware relevance evaluation metric that measures how well a system ranks the first relevant document. The reciprocal rank of a query response is the multiplicative inverse of the rank of the first correct answer: 1 for first place, 1/2 for second place, and 1/n for the n-th place. MRR therefore focuses on the position of the highest-ranked relevant item: only the rank of the first relevant answer counts, and any further relevant answers are ignored.

When to use MRR

  • First-hit and ranking focused: Evaluate how well a system ranks results, with particular emphasis on placing the first relevant result as high as possible.
  • Question answering: For scenarios where users typically need only one correct answer. The metric assumes that once a user finds the first relevant item, their search task is complete.
  • Binary relevance: When the labels are simply relevant or not relevant, with no intermediate levels of relevance.

Key Components

1. Reciprocal Rank (RR)

For a single query, the reciprocal rank is the inverse of the position of the first relevant document:

\(RR = \frac{1}{\text{rank of first relevant document}}\)

  • If the first relevant document is at rank 1: RR = 1.0
  • If the first relevant document is at rank 3: RR = 1/3 = 0.333
  • If no relevant documents are found: RR = 0.0
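
As a quick illustration, here is a minimal Python sketch of the single-query computation. The function name and the binary-label input format are assumptions made for this example, not part of any particular library:

```python
def reciprocal_rank(relevance_labels):
    """Reciprocal rank of the first relevant item in a ranked list.

    relevance_labels: binary labels (1 = relevant, 0 = not relevant),
    ordered by the system's ranking, best result first.
    """
    for position, label in enumerate(relevance_labels, start=1):
        if label == 1:
            return 1.0 / position
    return 0.0  # no relevant document found


print(reciprocal_rank([0, 1, 0, 1]))  # 0.5 (first relevant at rank 2)
print(reciprocal_rank([0, 0, 0]))     # 0.0 (no relevant document)
```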

2. Mean Reciprocal Rank (MRR)

When evaluating a system on a dataset with multiple queries, MRR is the average reciprocal rank across all of them:

\(MRR = \frac{1}{|Q|} \sum_{q=1}^{|Q|} RR_q\)

where \(|Q|\) is the total number of queries and \(RR_q\) is the reciprocal rank for query \(q\).

Range: 0 to 1 (1 = perfect performance where all first relevant documents are ranked at position 1)

Important considerations:

  • Each query contributes equally regardless of how many relevant documents it has
  • Queries with no relevant documents contribute 0 to the average
  • MRR treats all relevant documents beyond the first as having no additional value
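
Building on the reciprocal_rank sketch above, the dataset-level average can be written as follows. Again a hedged sketch: each query is assumed to be represented by its ordered binary relevance labels:

```python
def mean_reciprocal_rank(labels_per_query):
    """Average reciprocal rank over a dataset of queries.

    labels_per_query: one binary relevance-label list per query,
    each ordered by the system's ranking. Queries with no relevant
    document contribute 0, matching the definition above.
    """
    if not labels_per_query:
        return 0.0
    total = sum(reciprocal_rank(labels) for labels in labels_per_query)
    return total / len(labels_per_query)
```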

Example Calculation

Consider a dataset with 4 queries and their search results:

Query ID   Query Text   Ranked Results       Relevance Labels   First Relevant Position   Reciprocal Rank
Q1         python       [R1, R2, R3, R4]     [0, 1, 0, 1]       2                         0.500
Q2         ML           [R5, R6, R7, R8]     [1, 0, 1, 0]       1                         1.000
Q3         data         [R9, R10, R11]       [0, 0, 1]          3                         0.333
Q4         AI           [R1, R2, R8, R12]    [0, 0, 0, 0]       none                      0.000

MRR = (0.500 + 1.000 + 0.333 + 0.000) / 4 = 0.458

This MRR score indicates that the first relevant document typically appears around position 2.18 (1/0.458) in the rankings.
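
Feeding the table's relevance labels into the mean_reciprocal_rank sketch above reproduces the same figure:

```python
labels_per_query = [
    [0, 1, 0, 1],  # Q1: first relevant at rank 2 -> 0.500
    [1, 0, 1, 0],  # Q2: first relevant at rank 1 -> 1.000
    [0, 0, 1],     # Q3: first relevant at rank 3 -> 0.333
    [0, 0, 0, 0],  # Q4: no relevant document     -> 0.000
]
print(round(mean_reciprocal_rank(labels_per_query), 3))  # 0.458
```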

MRR@k Variant

MRR@k (e.g., MRR@10) only considers the top k documents in the ranking:

  • If the first relevant document appears beyond rank k, the reciprocal rank is 0
  • More practical for evaluating systems with large result sets
  • Focuses evaluation on the most visible results to users

Example with MRR@3:

  • Query with first relevant document at rank 5: RR = 0 (beyond top 3)
  • Query with first relevant document at rank 2: RR = 0.5 (within top 3)
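
The cutoff only requires truncating the label list to the top k before computing the reciprocal rank; a minimal sketch reusing the assumed helper from above:

```python
def reciprocal_rank_at_k(relevance_labels, k):
    """Building block for MRR@k: the reciprocal rank is 0 when the
    first relevant document appears beyond rank k."""
    return reciprocal_rank(relevance_labels[:k])


print(reciprocal_rank_at_k([0, 0, 0, 0, 1], k=3))  # 0.0 (first relevant at rank 5)
print(reciprocal_rank_at_k([0, 1, 0, 0, 1], k=3))  # 0.5 (first relevant at rank 2)
```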

Interpretation

  • MRR = 1.0 represents perfect performance where every query's first relevant document is ranked at position 1
  • MRR = 0.5 indicates that the first relevant document typically appears around rank 2
  • MRR = 0.0 means no relevant documents were found for any query

MRR scores are easy to interpret: 1/MRR gives the typical rank of the first relevant document (strictly its harmonic mean, since MRR averages reciprocals rather than ranks). However, MRR scores cannot be compared directly across datasets, because query difficulty and relevance patterns vary.

Best Practices

  • Use established baselines (e.g., BM25, random ranking) to understand the relative difficulty of datasets when reporting MRR scores
  • Consider MRR@k variants for practical evaluation scenarios

References

https://en.wikipedia.org/wiki/Mean_reciprocal_rank

https://docs.cohere.com/docs/rerank-understanding-the-results#mrr10