Precision

Precision measures how often a model's positive predictions are correct. It answers the question: "Of all the instances the model predicted as positive, how many actually are?"

Calculation

Precision is calculated as:

\(\frac{\text{true positives}}{\text{true positives} + \text{false positives}}\)

Note:

For a model that returns class probabilities or confidence scores, precision is calculated at a specific confidence threshold: predictions at or above the threshold are counted as positive.
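The snippet below is a minimal sketch of this calculation, assuming a binary problem with made-up labels and confidence scores and an arbitrary threshold of 0.5; it is not tied to any particular model.

```python
import numpy as np

# Illustrative ground-truth labels and model confidence scores (assumed values).
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_score = np.array([0.9, 0.6, 0.8, 0.3, 0.2, 0.7, 0.55, 0.1])

# Binarize the confidences at an (arbitrary) threshold of 0.5.
threshold = 0.5
y_pred = (y_score >= threshold).astype(int)

# Count true and false positives among the positive predictions.
true_positives = np.sum((y_pred == 1) & (y_true == 1))
false_positives = np.sum((y_pred == 1) & (y_true == 0))

precision = true_positives / (true_positives + false_positives)
print(precision)  # 0.6 for this toy data: 3 true positives out of 5 positive predictions
```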

When To Use Precision

Precision is critical when false positives are costly or problematic.

Example:

If a model predicting whether an article abstract is related to diabetes has 90% precision, then 90% of the abstracts it marks as diabetes-related actually are, while 10% are not.

Overall Precision for Multi-Class Models

For multi-class models, overall precision across all classes can be computed by averaging in three ways:

Macro-Average

Macro-average precision is useful when each class should be treated equally, regardless of its frequency. Precision is calculated for each class separately and then averaged over the classes. It is calculated as:

\(\frac{1}{n}\sum_{i=1}^n(\frac{\text{true positives}_i}{\text{true positives}_i + \text{false positives}_i})\)

where \(i\) denotes each class.
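Below is a minimal sketch of macro-average precision on a toy 3-class problem; the label arrays are illustrative assumptions. The commented scikit-learn call gives the same result.

```python
import numpy as np

# Illustrative ground-truth and predicted labels for a 3-class problem (assumed values).
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 0, 1, 2, 2, 2, 0, 2])

# Per-class precision: TP / (TP + FP), assuming every class is predicted at least once.
classes = np.unique(y_true)
per_class_precision = []
for c in classes:
    tp = np.sum((y_pred == c) & (y_true == c))
    fp = np.sum((y_pred == c) & (y_true != c))
    per_class_precision.append(tp / (tp + fp))

# Macro-average: unweighted mean, so every class counts equally.
macro_precision = np.mean(per_class_precision)
print(macro_precision)  # (0.75 + 0.5 + 0.75) / 3 ≈ 0.667

# Equivalent with scikit-learn:
# from sklearn.metrics import precision_score
# precision_score(y_true, y_pred, average="macro")
```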

Weighted-Average

Weighted-average precision is similar to macro-average, but weights each class's precision by the number of true instances (the support) of that class. This is useful when classes with a greater number of samples should carry more weight. It is calculated as:

\(\frac{\sum_{i=1}^n w_i(\frac{\text{true positives}_i}{\text{true positives}_i + \text{false positives}_i})}{\sum_{i=1}^n w_i}\)

where \(w_i\) is the number of true samples of class \(i\).
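A minimal sketch of weighted-average precision, reusing the same toy arrays as the macro example above (illustrative values only):

```python
import numpy as np

y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 0, 1, 2, 2, 2, 0, 2])

classes = np.unique(y_true)
weighted_sum = 0.0
total_support = 0
for c in classes:
    tp = np.sum((y_pred == c) & (y_true == c))
    fp = np.sum((y_pred == c) & (y_true != c))
    support = np.sum(y_true == c)             # number of true samples of class c
    weighted_sum += support * tp / (tp + fp)  # class precision weighted by its support
    total_support += support

weighted_precision = weighted_sum / total_support
print(weighted_precision)  # (4*0.75 + 2*0.5 + 4*0.75) / 10 = 0.7

# Equivalent with scikit-learn:
# from sklearn.metrics import precision_score
# precision_score(y_true, y_pred, average="weighted")
```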

Micro-Average

Micro-averaging is useful when each sample should carry equal weight, regardless of its class or that class's frequency. Total true positives and false positives are aggregated across all classes, and precision is calculated from these pooled counts. It is calculated as:

\(\frac{\sum_{i=1}^n\text{true positives}_i}{\sum_{i=1}^n(\text{true positives}_i + \text{false positives}_i)}\)
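A minimal sketch of micro-average precision, again reusing the toy arrays above. Note that in single-label classification, where each sample receives exactly one predicted class, micro-average precision coincides with accuracy.

```python
import numpy as np

y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 0, 1, 2, 2, 2, 0, 2])

# Pool true and false positives across all classes before dividing.
classes = np.unique(y_true)
total_tp = sum(np.sum((y_pred == c) & (y_true == c)) for c in classes)
total_fp = sum(np.sum((y_pred == c) & (y_true != c)) for c in classes)

micro_precision = total_tp / (total_tp + total_fp)
print(micro_precision)  # 7 / (7 + 3) = 0.7

# Equivalent with scikit-learn:
# from sklearn.metrics import precision_score
# precision_score(y_true, y_pred, average="micro")
```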