Recall

The recall metric answers the question "Out of all the actual positive cases, how many did the model correctly identify?".

Calculation

Recall is calculated as:

\(\frac{\text{true positives}}{\text{true positives} + \text{false negatives}}\)

Note:

For a model returning class probabilities or confidences, recall is calculated at a specific confidence threshold.
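As a minimal sketch, recall at a chosen threshold could be computed as follows (binary labels and predicted positive-class probabilities are assumed; the function and variable names here are illustrative):

```python
def recall_at_threshold(y_true, y_score, threshold=0.5):
    """Recall = true positives / (true positives + false negatives).

    y_true: ground-truth labels (1 = positive, 0 = negative).
    y_score: model confidence for the positive class.
    A sample counts as predicted positive when its score >= threshold.
    """
    tp = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s < threshold)
    return tp / (tp + fn)

y_true = [1, 1, 1, 0, 0, 1]
y_score = [0.9, 0.6, 0.3, 0.8, 0.2, 0.7]
print(recall_at_threshold(y_true, y_score))  # 3 of 4 actual positives found -> 0.75
```

Raising the threshold typically lowers recall (more positives are missed) while lowering it raises recall at the expense of precision.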

When To Use Recall

Recall is critical when false negatives are costly or problematic.

Example:

If a model used for cancer screening has 97% recall, this means it will correctly label 97% of actual cancer examples as worth investigating.

Overall Recall for Multi-Class Models

For multi-class models, overall recall across all classes can be averaged in three ways:

Macro-Average

Macro-average recall is useful in cases where the user would like to treat each class equally, regardless of frequency. Recall is calculated for each class separately, and then averaged over all classes. It is calculated as:

\(\frac{1}{n}\sum_{i=1}^n(\frac{\text{true positives}_i}{\text{true positives}_i + \text{false negatives}_i})\)

where \(i\) denotes each class.
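This could be sketched as below, where per-class recall is computed and then averaged with equal weight per class (the labels and helper names are illustrative):

```python
def per_class_recall(y_true, y_pred, cls):
    # Recall for one class: TP_i / (TP_i + FN_i), where the denominator
    # is simply the number of actual instances of that class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    actual = sum(1 for t in y_true if t == cls)
    return tp / actual

def macro_recall(y_true, y_pred):
    # Average per-class recalls, each class weighted equally.
    classes = sorted(set(y_true))
    return sum(per_class_recall(y_true, y_pred, c) for c in classes) / len(classes)

y_true = ["cat", "cat", "cat", "dog", "dog", "bird"]
y_pred = ["cat", "cat", "dog", "dog", "dog", "cat"]
print(macro_recall(y_true, y_pred))  # (2/3 + 1 + 0) / 3 ~= 0.556
```

Note that the rare "bird" class (one sample, missed entirely) pulls the macro average down as much as any frequent class would.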

Weighted-Average

Weighted-average recall is similar to macro-average, but weights each class's recall by the number of true instances of that class. This is useful for cases where the user would like to give higher importance to classes with a greater number of samples. It is calculated as:

\(\frac{\sum_{i=1}^n w_i\left(\frac{\text{true positives}_i}{\text{true positives}_i + \text{false negatives}_i}\right)}{\sum_{i=1}^n w_i}\)

where \(w_i\) is the number of samples of class \(i\).
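A sketch of the weighted average, using each class's sample count as its weight (the data and names are illustrative):

```python
def weighted_recall(y_true, y_pred):
    # Per-class recall weighted by class support (number of true instances),
    # normalized by the total number of samples.
    classes = sorted(set(y_true))
    total = len(y_true)
    result = 0.0
    for c in classes:
        w = sum(1 for t in y_true if t == c)  # support of class c
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        result += (w / total) * (tp / w)
    return result

y_true = ["cat", "cat", "cat", "dog", "dog", "bird"]
y_pred = ["cat", "cat", "dog", "dog", "dog", "cat"]
print(weighted_recall(y_true, y_pred))  # (3/6)(2/3) + (2/6)(1) + (1/6)(0) ~= 0.667
```

Compared with the macro average on the same data, the one-sample "bird" class now contributes far less to the result.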

Micro-Average

Micro-averaging is useful for cases where the user would like to assign equal weight to each sample, regardless of class or class frequency. Here, total true positives and false negatives are aggregated across all classes, and recall is calculated on the total counts. It is calculated as:

\(\frac{\sum_{i=1}^n\text{true positives}_i}{\sum_{i=1}^n(\text{true positives}_i + \text{false negatives}_i)}\)
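A sketch mirroring the formula, aggregating counts across classes before dividing (the names are illustrative):

```python
def micro_recall(y_true, y_pred):
    # Sum true positives and (true positives + false negatives)
    # over all classes, then divide the totals.
    classes = sorted(set(y_true))
    tp_total = 0
    tp_fn_total = 0
    for c in classes:
        tp_total += sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        tp_fn_total += sum(1 for t in y_true if t == c)  # actual instances of c
    return tp_total / tp_fn_total

y_true = ["cat", "cat", "cat", "dog", "dog", "bird"]
y_pred = ["cat", "cat", "dog", "dog", "dog", "cat"]
print(micro_recall(y_true, y_pred))  # 4 correct out of 6 samples ~= 0.667
```

For single-label multi-class problems, every sample is an actual instance of exactly one class, so the denominator equals the total sample count and micro-averaged recall coincides with overall accuracy.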