Skip to contents

Measure to compare true observed labels with predicted labels in multiclass classification tasks.

Usage

bacc(truth, response, sample_weights = NULL, ...)

Arguments

truth

(factor())
True (observed) labels. Must have the same levels and length as response.

response

(factor())
Predicted response labels. Must have the same levels and length as truth.

sample_weights

(numeric())
Vector of non-negative and finite sample weights. Must have the same length as truth. The vector gets automatically normalized to sum to one. Defaults to equal sample weights.

...

(any)
Additional arguments. Currently ignored.

Value

Performance value as numeric(1).

Details

The Balanced Accuracy computes the weighted balanced accuracy, suitable for imbalanced data sets. It is defined analogously to the definition in sklearn.

First, all sample weights \(w_i\) are normalized per class so that each class has the same influence: $$ \hat{w}_i = \frac{w_i}{\sum_{j=1}^n w_j \cdot \mathbf{1}(t_j = t_i)}. $$ The Balanced Accuracy is then calculated as $$ \frac{1}{\sum_{i=1}^n \hat{w}_i} \sum_{i=1}^n \hat{w}_i \cdot \mathbf{1}(r_i = t_i). $$ This definition is equivalent to acc() with class-balanced sample weights.

Meta Information

  • Type: "classif"

  • Range: \([0, 1]\)

  • Minimize: FALSE

  • Required prediction: response

References

Brodersen KH, Ong CS, Stephan KE, Buhmann JM (2010). “The Balanced Accuracy and Its Posterior Distribution.” In 2010 20th International Conference on Pattern Recognition. doi:10.1109/icpr.2010.764 .

Guyon I, Bennett K, Cawley G, Escalante HJ, Escalera S, Ho TK, Macia N, Ray B, Saeed M, Statnikov A, Viegas E (2015). “Design of the 2015 ChaLearn AutoML challenge.” In 2015 International Joint Conference on Neural Networks (IJCNN). doi:10.1109/ijcnn.2015.7280767 .

See also

Other Classification Measures: acc(), ce(), logloss(), mauc_aunu(), mbrier(), mcc(), zero_one()

Examples

set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
bacc(truth, response)
#> [1] 0.2222222