Native implementation of the F-beta score. With the default beta = 1 this is F1, the harmonic mean of precision and recall. The macro and macro-weighted forms compute the (possibly weighted) arithmetic mean of per-class F-beta scores, the convention used by yardstick and scikit-learn (Manning et al. 2008, ch. 13). This differs from Sokolova & Lapalme (2009, Table 3), where the macro F-score is computed directly from the macro-averaged precision and recall; the two generally give different values and coincide in special cases, such as when every class has the same precision and the same recall. Micro pools TP, FP, and FN globally before computing F-beta.
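The difference between the two macro conventions can be checked by hand. The following is a minimal base R sketch, not the package's own code: the toy labels and the helper f_beta() are made up for illustration.

```r
# Toy three-class data (hypothetical, for illustration only).
truth    <- factor(c("a", "a", "a", "a", "b", "b", "b", "c", "c", "c"),
                   levels = c("a", "b", "c"))
estimate <- factor(c("a", "a", "a", "b", "b", "b", "c", "c", "a", "a"),
                   levels = c("a", "b", "c"))

cm <- table(estimate, truth)     # rows = predicted, columns = true
tp <- diag(cm)
precision <- tp / rowSums(cm)    # per-class precision
recall    <- tp / colSums(cm)    # per-class recall

# Hypothetical helper: F-beta from a precision/recall pair.
f_beta <- function(p, r, beta = 1) {
  (1 + beta^2) * p * r / (beta^2 * p + r)
}

# Convention described here: average the per-class F scores.
macro_mean_of_f <- mean(f_beta(precision, recall))

# Sokolova & Lapalme convention: F of the macro-averaged precision and recall.
macro_f_of_means <- f_beta(mean(precision), mean(recall))

macro_mean_of_f    # about 0.578
macro_f_of_means   # about 0.586
```

On this toy data the two conventions disagree in the second decimal place, so it matters which one a reported macro F-score uses.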

Usage

metric_f_meas(
  truth,
  estimate,
  estimator = c("binary", "macro", "macro_weighted", "micro"),
  event_level = c("first", "second"),
  beta = 1
)

Arguments

truth

Factor (or coercible) of true class labels.

estimate

Factor (or coercible) of predicted class labels. Must take values from the same level set as truth.

estimator

One of "binary" (exactly two classes; uses event_level), "macro" (unweighted mean of per-class F-beta scores), "macro_weighted" (mean of per-class F-beta scores weighted by truth-class prevalence), or "micro" (TP, FP, and FN pooled across all classes before computing F-beta; for single-label multi-class data this equals accuracy).

event_level

For estimator = "binary": which level is the positive event, "first" (default) or "second".

beta

Positive numeric. beta = 1 (default) gives the familiar F1; beta < 1 weights precision more, beta > 1 weights recall more.
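To make the interaction of estimator, event_level, and beta concrete, here is a minimal base R sketch. f_meas_sketch() is a hypothetical helper written for this illustration; it is not the implementation of metric_f_meas(). It assumes truth is a factor and does not handle classes with no true or predicted cases (those yield NaN).

```r
# Hypothetical helper mirroring the documented arguments; illustration only.
f_meas_sketch <- function(truth, estimate,
                          estimator = c("binary", "macro", "macro_weighted", "micro"),
                          event_level = c("first", "second"),
                          beta = 1) {
  estimator   <- match.arg(estimator)
  event_level <- match.arg(event_level)
  lv <- levels(truth)
  estimate <- factor(estimate, levels = lv)  # force a shared level set

  cm <- table(estimate, truth)               # rows = predicted, columns = true
  tp <- diag(cm)
  fp <- rowSums(cm) - tp
  fn <- colSums(cm) - tp

  # F-beta expressed in terms of TP, FP, and FN counts.
  f_counts <- function(tp, fp, fn) {
    (1 + beta^2) * tp / ((1 + beta^2) * tp + beta^2 * fn + fp)
  }

  switch(estimator,
    binary = {
      stopifnot(length(lv) == 2)
      i <- if (event_level == "first") 1 else 2
      f_counts(tp[[i]], fp[[i]], fn[[i]])
    },
    macro          = mean(f_counts(tp, fp, fn)),
    macro_weighted = weighted.mean(f_counts(tp, fp, fn), w = colSums(cm)),
    micro          = f_counts(sum(tp), sum(fp), sum(fn))
  )
}

# Binary toy data: for the "yes" level, precision is 0.5 and recall is 2/3.
truth2 <- factor(c("yes", "yes", "yes", "no", "no", "no", "no", "no"),
                 levels = c("yes", "no"))
est2   <- factor(c("yes", "yes", "no", "yes", "yes", "no", "no", "no"),
                 levels = c("yes", "no"))

f_meas_sketch(truth2, est2, "binary")                          # 4/7, about 0.571
f_meas_sketch(truth2, est2, "binary", event_level = "second")  # 2/3
f_meas_sketch(truth2, est2, "binary", beta = 2)                # 0.625, leans toward recall
f_meas_sketch(truth2, est2, "binary", beta = 0.5)              # about 0.526, leans toward precision

# Multi-class toy data: micro F1 equals accuracy for single-label data.
truth3 <- factor(c("a", "a", "a", "a", "b", "b", "b", "c", "c", "c"),
                 levels = c("a", "b", "c"))
est3   <- factor(c("a", "a", "a", "b", "b", "b", "c", "c", "a", "a"),
                 levels = c("a", "b", "c"))

f_meas_sketch(truth3, est3, "macro")           # about 0.578
f_meas_sketch(truth3, est3, "macro_weighted")  # about 0.587
f_meas_sketch(truth3, est3, "micro")           # 0.6
mean(truth3 == est3)                           # 0.6 (accuracy)
```

Because recall exceeds precision for the "yes" level in the binary data, raising beta above 1 increases the score and lowering it below 1 decreases it.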

Value

A single numeric value.

References

Sokolova, M., & Lapalme, G. (2009). A systematic analysis of performance measures for classification tasks. Information Processing & Management, 45(4), 427-437. doi:10.1016/j.ipm.2009.03.002

Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval, Chapter 13. Cambridge University Press. (Free online: https://nlp.stanford.edu/IR-book/)