Fleiss' kappa for many raters: reliability_kappa_fleiss

Native implementation of Fleiss' (1971) generalisation of kappa to nominal ratings from a constant number of raters per subject. The raters who rate one subject need not be the same as those who rate another. For two raters, use reliability_kappa() (Cohen's kappa) instead. The two coefficients differ even on the same data: Cohen's kappa estimates chance agreement from each rater's own marginal distribution, whereas Fleiss' kappa estimates it from the category proportions pooled across all raters.
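A minimal base-R sketch of this contrast, assuming nominal labels with no NA. The helpers fleiss_kappa_sketch() and cohen_kappa_sketch() are hypothetical illustrations, not this package's code. On the same two-rater data, observed agreement is identical; only the chance-agreement term differs.

```r
# Hypothetical helper (not the package's implementation): Fleiss' kappa for a
# subjects x raters matrix of nominal labels with no NA.
fleiss_kappa_sketch <- function(observations) {
  x <- as.matrix(observations)
  N <- nrow(x)                          # subjects
  n <- ncol(x)                          # raters per subject
  cats <- sort(unique(as.vector(x)))
  # n_ij: number of raters assigning subject i to category j
  counts <- matrix(vapply(cats, function(cat) rowSums(x == cat), numeric(N)),
                   nrow = N)
  p <- colSums(counts) / (N * n)        # pooled category proportions p_j
  P_bar <- mean((rowSums(counts^2) - n) / (n * (n - 1)))  # observed agreement
  P_e <- sum(p^2)                       # chance agreement from pooled marginals
  (P_bar - P_e) / (1 - P_e)
}

# Hypothetical helper: Cohen's kappa for two raters with fixed identities.
cohen_kappa_sketch <- function(r1, r2) {
  cats <- sort(unique(c(r1, r2)))
  tab <- table(factor(r1, levels = cats), factor(r2, levels = cats))
  p_o <- sum(diag(tab)) / sum(tab)      # observed agreement
  # chance agreement from each rater's own marginal distribution
  p_e <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2
  (p_o - p_e) / (1 - p_e)
}

r1 <- c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b")
r2 <- c("a", "a", "a", "a", "b", "b", "b", "b", "b", "b")
fleiss <- fleiss_kappa_sketch(cbind(r1, r2))  # 0.6
cohen <- cohen_kappa_sketch(r1, r2)           # 8/13, about 0.615
print(c(fleiss = fleiss, cohen = cohen))
```

Both raters agree on 8 of 10 subjects. Fleiss' chance term uses the pooled proportions (0.5, 0.5), giving 0.6; Cohen's uses the raters' separate marginals (0.6/0.4 and 0.4/0.6), giving 8/13.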

Usage

reliability_kappa_fleiss(observations)

Arguments

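observations: A subjects-by-raters matrix or data.frame of nominal category labels. Rows are subjects (units); columns are rating slots, so column j need not hold ratings from the same rater in every row. Must not contain NA. The number of raters per subject is taken to be ncol(observations).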
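To illustrate how this argument is read, here is a hedged base-R sketch. category_counts() is a hypothetical helper, not part of the package. It collapses the subjects-by-raters input into the subjects-by-categories counts n_ij on which Fleiss' kappa depends. Because only those counts enter the statistic, reordering the ratings within a row changes nothing, which is why columns need not correspond to fixed raters.

```r
# Hypothetical helper: collapse a subjects x raters matrix of nominal labels
# into the subjects x categories count table n_ij used by Fleiss' kappa.
category_counts <- function(observations) {
  x <- as.matrix(observations)
  stopifnot(!anyNA(x))                       # NA is not allowed
  cats <- sort(unique(as.vector(x)))
  matrix(vapply(cats, function(cat) rowSums(x == cat), numeric(nrow(x))),
         nrow = nrow(x), dimnames = list(NULL, cats))
}

# 4 subjects, 3 ratings each; column j is "the j-th rating", not a fixed person
obs <- data.frame(
  r1 = c("yes", "no", "yes", "no"),
  r2 = c("yes", "no", "no",  "no"),
  r3 = c("no",  "no", "yes", "yes"),
  stringsAsFactors = FALSE
)
counts <- category_counts(obs)
# every row sums to ncol(obs) = 3 raters per subject
stopifnot(all(rowSums(counts) == ncol(obs)))

# Reordering the ratings within each subject leaves n_ij unchanged
set.seed(1)
shuffled <- t(apply(as.matrix(obs), 1, sample))
stopifnot(identical(category_counts(shuffled), counts))
print(counts)
```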

Value

A list with elements method, value, ci_lower, ci_upper, per_value, n_observers, n_units, and n_pairable. The confidence-interval bounds are based on the asymptotic standard error in Fleiss (1971, Eq. 16). per_value holds the per-category kappa_j of Fleiss (1971, Eqs. 20-21).
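The per-category statistic can be sketched in base R from the count table n_ij (N subjects, n raters per subject, pooled proportions p_j, q_j = 1 - p_j) as kappa_j = 1 - sum_i n_ij (n - n_ij) / (N n (n - 1) p_j q_j). This is a hypothetical illustration, not the package's code; the names and ordering of the package's per_value may differ. The final check shows that the overall kappa equals the p_j q_j-weighted average of the kappa_j.

```r
# Hypothetical sketch: per-category kappa_j from the subjects x categories
# count table n_ij, with n raters per subject. Undefined if any p_j is 0 or 1.
per_category_kappa <- function(counts, n) {
  N <- nrow(counts)
  p <- colSums(counts) / (N * n)     # pooled proportion p_j
  q <- 1 - p
  1 - colSums(counts * (n - counts)) / (N * n * (n - 1) * p * q)
}

# Hypothetical sketch: overall Fleiss' kappa from the same count table.
overall_kappa <- function(counts, n) {
  N <- nrow(counts)
  p <- colSums(counts) / (N * n)
  P_bar <- (sum(counts^2) - N * n) / (N * n * (n - 1))  # mean observed agreement
  P_e <- sum(p^2)                                       # chance agreement
  (P_bar - P_e) / (1 - P_e)
}

# n_ij for 5 subjects, 4 raters each, 3 categories (each row sums to 4)
counts <- matrix(c(4, 0, 0,
                   2, 2, 0,
                   0, 3, 1,
                   1, 1, 2,
                   0, 0, 4),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(NULL, c("A", "B", "C")))
kj <- per_category_kappa(counts, n = 4)
p <- colSums(counts) / (nrow(counts) * 4)
weighted <- sum(p * (1 - p) * kj) / sum(p * (1 - p))
overall <- overall_kappa(counts, n = 4)  # 0.265 / 0.665, about 0.398
stopifnot(isTRUE(all.equal(weighted, overall)))
print(round(kj, 4))
```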

References

Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378-382. doi:10.1037/h0031619