Intraclass correlation coefficient — reliability

Native implementation of the intraclass correlation coefficient (ICC) family for a subjects x raters matrix of interval- or ratio-scale ratings. Six forms are available through the model, type, and unit arguments. They follow the Shrout-Fleiss naming and the McGraw-Wong calculation tables (see Details).

Usage

reliability_icc(
  ratings,
  model = c("oneway", "twoway"),
  type = c("consistency", "agreement"),
  unit = c("single", "average"),
  r0 = 0,
  conf.level = 0.95
)

Arguments

ratings: A subjects x raters matrix or data.frame of numeric ratings. Rows are objects of measurement (subjects); columns are raters. Must not contain NA.
model: "oneway" (each subject rated by a different random set of raters) or "twoway" (the same k raters rate every subject).
type: "consistency" (column variance excluded – relative agreement) or "agreement" (column variance included – absolute agreement). Ignored for model = "oneway".
unit: "single" (reliability of one rater's score) or "average" (reliability of the mean across k raters; the Spearman-Brown stepped-up form).
r0: Null-hypothesis value for the F-test. Default 0 tests H0: ICC = 0.
conf.level: Confidence level for the CI on the population ICC (default 0.95).
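
Because ratings must not contain NA, incomplete subjects have to be dealt with before the call. The following is a minimal base R sketch of one way to prepare the input. The object names (raw, ratings) and the data are hypothetical, and dropping incomplete subjects (listwise deletion) is an assumption made for illustration, not something this function is documented to do.

```r
# Hypothetical raw data: 5 subjects rated by 3 raters, one rating missing.
raw <- data.frame(
  rater_a = c(4, 3, 5, 2, 4),
  rater_b = c(5, 3, NA, 2, 4),
  rater_c = c(4, 2, 5, 3, 5)
)

# Keep only subjects (rows) rated by every rater, then coerce to a matrix.
ratings <- as.matrix(raw[complete.cases(raw), ])

# Checks mirroring the documented requirements: numeric, no NA.
stopifnot(is.numeric(ratings), !anyNA(ratings))

dim(ratings)  # subjects x raters; here 4 x 3 after dropping subject 3
```

Listwise deletion discards information and can bias the estimate if ratings are not missing at random, so it is worth reporting how many subjects were removed.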

Value

A list with elements:

method: Short label, e.g. "icc_2_1" or "icc_3_k".
value: The ICC estimate.
ci_lower, ci_upper: Confidence interval bounds at conf.level.
per_value: NULL (ICC has no per-category breakdown).
n_observers, n_units, n_pairable: Number of raters (k), number of subjects (n), and total number of ratings (k*n).
model, type, unit: The configuration that produced the ICC.
icc_name: Canonical Shrout-Fleiss name, e.g. "ICC(2,1)".
F_value, df1, df2, p_value, r0: F-test of H0: ICC = r0.
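
To make the inferential elements concrete, the sketch below reproduces them by hand for the simplest case, the one-way single-rater form ICC(1,1). It assumes the standard one-way F-based test and interval as given by McGraw & Wong (1996), and it uses the worked-example data from Shrout & Fleiss (1979). It is an illustration in base R, not necessarily how this implementation computes the values internally; the two-way forms use different formulas.

```r
# Shrout & Fleiss (1979) example: 6 subjects (rows) rated by 4 judges (columns).
ratings <- matrix(
  c( 9, 2, 5, 8,
     6, 1, 3, 2,
     8, 4, 6, 8,
     7, 1, 2, 6,
    10, 5, 6, 9,
     6, 2, 4, 7),
  ncol = 4, byrow = TRUE
)
n <- nrow(ratings)  # subjects
k <- ncol(ratings)  # raters
r0 <- 0             # null-hypothesis value
conf.level <- 0.95

# One-way ANOVA mean squares: between subjects (MSR) and within subjects (MSW).
MSR <- k * sum((rowMeans(ratings) - mean(ratings))^2) / (n - 1)
MSW <- sum((ratings - rowMeans(ratings))^2) / (n * (k - 1))

# Point estimate of ICC(1,1).
value <- (MSR - MSW) / (MSR + (k - 1) * MSW)

# F-test of H0: ICC = r0 (reduces to MSR / MSW when r0 = 0).
F_value <- (MSR / MSW) * (1 - r0) / (1 + (k - 1) * r0)
df1 <- n - 1
df2 <- n * (k - 1)
p_value <- pf(F_value, df1, df2, lower.tail = FALSE)

# Two-sided confidence interval from F quantiles.
alpha <- 1 - conf.level
FL <- (MSR / MSW) / qf(1 - alpha / 2, df1, df2)
FU <- (MSR / MSW) * qf(1 - alpha / 2, df2, df1)
ci_lower <- (FL - 1) / (FL + (k - 1))
ci_upper <- (FU - 1) / (FU + (k - 1))

round(c(value = value, F_value = F_value, p_value = p_value,
        ci_lower = ci_lower, ci_upper = ci_upper), 3)
```

Note that the lower bound can be negative even though the population ICC under this model cannot; with only six subjects the interval is wide.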

Details

The six configurations and their names in each scheme:

model	type	unit	Shrout & Fleiss	McGraw & Wong
oneway	(n/a)	single	ICC(1,1)	ICC(1)
oneway	(n/a)	average	ICC(1,k)	ICC(k)
twoway	consistency	single	ICC(3,1)	ICC(C,1)
twoway	consistency	average	ICC(3,k)	ICC(C,k)
twoway	agreement	single	ICC(2,1)	ICC(A,1)
twoway	agreement	average	ICC(2,k)	ICC(A,k)
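
The table above can be expressed as a small lookup. The helper below, icc_names(), is hypothetical and not part of the documented interface; it only illustrates how a model/type/unit configuration maps to the two naming schemes.

```r
# Hypothetical helper (not part of the documented API): map a configuration
# to its Shrout-Fleiss and McGraw-Wong names, following the table above.
icc_names <- function(model = c("oneway", "twoway"),
                      type = c("consistency", "agreement"),
                      unit = c("single", "average")) {
  model <- match.arg(model)
  type <- match.arg(type)
  unit <- match.arg(unit)
  single <- unit == "single"

  if (model == "oneway") {
    # type is irrelevant for the one-way model
    sf <- if (single) "ICC(1,1)" else "ICC(1,k)"
    mw <- if (single) "ICC(1)" else "ICC(k)"
  } else if (type == "consistency") {
    sf <- if (single) "ICC(3,1)" else "ICC(3,k)"
    mw <- if (single) "ICC(C,1)" else "ICC(C,k)"
  } else {
    sf <- if (single) "ICC(2,1)" else "ICC(2,k)"
    mw <- if (single) "ICC(A,1)" else "ICC(A,k)"
  }
  c(shrout_fleiss = sf, mcgraw_wong = mw)
}

icc_names("twoway", "agreement", "single")
# shrout_fleiss: "ICC(2,1)", mcgraw_wong: "ICC(A,1)"
```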

For model = "oneway" the type argument is ignored: the one-way model has no separate rater (column) effect, so the consistency/agreement distinction does not arise. The two-way random and two-way mixed models share the same calculations; they differ only in interpretation (whether the raters are treated as a random sample from a larger population or as fixed). See Koo & Li (2016) for guidance on selecting a form.
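
As an illustration of how the six forms relate, the sketch below computes each one from ANOVA mean squares in base R, using the point-estimate formulas tabulated by McGraw & Wong (1996) and the worked-example data from Shrout & Fleiss (1979). Variable names are illustrative, and this is a hand calculation under those assumptions rather than a copy of the function's internals.

```r
# Shrout & Fleiss (1979) example: 6 subjects (rows) rated by 4 judges (columns).
ratings <- matrix(
  c( 9, 2, 5, 8,
     6, 1, 3, 2,
     8, 4, 6, 8,
     7, 1, 2, 6,
    10, 5, 6, 9,
     6, 2, 4, 7),
  ncol = 4, byrow = TRUE
)
n <- nrow(ratings)  # subjects
k <- ncol(ratings)  # raters
grand <- mean(ratings)

# Sums of squares for subjects (rows), raters (columns), and total.
SSR <- k * sum((rowMeans(ratings) - grand)^2)
SSC <- n * sum((colMeans(ratings) - grand)^2)
SST <- sum((ratings - grand)^2)

# Mean squares.
MSR <- SSR / (n - 1)                           # between subjects
MSC <- SSC / (k - 1)                           # between raters
MSW <- (SST - SSR) / (n * (k - 1))             # within subjects (one-way error)
MSE <- (SST - SSR - SSC) / ((n - 1) * (k - 1)) # residual (two-way error)

icc <- c(
  "ICC(1,1)" = (MSR - MSW) / (MSR + (k - 1) * MSW),
  "ICC(1,k)" = (MSR - MSW) / MSR,
  "ICC(3,1)" = (MSR - MSE) / (MSR + (k - 1) * MSE),
  "ICC(3,k)" = (MSR - MSE) / MSR,
  "ICC(2,1)" = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n),
  "ICC(2,k)" = (MSR - MSE) / (MSR + (MSC - MSE) / n)
)
round(icc, 2)
# ICC(1,1) ICC(1,k) ICC(3,1) ICC(3,k) ICC(2,1) ICC(2,k)
#     0.17     0.44     0.71     0.91     0.29     0.62
```

The agreement forms are lower than the consistency forms here because the judges differ substantially in their mean rating, and that rater variance counts against absolute agreement but not against consistency.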

References

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420-428. doi:10.1037/0033-2909.86.2.420

McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30-46. doi:10.1037/1082-989X.1.1.30

Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155-163. doi:10.1016/j.jcm.2016.02.012