
Changelog
quallmer 0.4.0.9000 (development version)
Bug fixes
- qlm_validate(..., average = "none") was reporting per-class precision and recall swapped: the helper that derived FP and FN from the confusion matrix had its row and column sums transposed relative to the orientation produced by yardstick::conf_mat(). Macro-averaged precision/recall (computed via yardstick directly) were correct; only the per-class breakdown was affected.
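To make the orientation issue concrete, here is a minimal base R sketch, not the package's internal helper. It assumes a toy confusion matrix with predictions in rows and truth in columns (the layout yardstick::conf_mat() uses); the matrix values and variable names are illustrative only.

```r
# Toy 3-class confusion matrix: rows = predicted class, columns = true class.
cm <- matrix(
  c(5, 1, 0,
    2, 6, 1,
    0, 1, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(Prediction = c("a", "b", "c"), Truth = c("a", "b", "c"))
)

tp <- diag(cm)
fp <- rowSums(cm) - tp  # predicted as k, but truly another class
fn <- colSums(cm) - tp  # truly k, but predicted as another class

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)

# Using column sums where row sums belong (the kind of transposition
# described above) silently yields recall in place of precision.
precision_swapped <- tp / (tp + (colSums(cm) - tp))
```

With this layout, swapping the row and column sums exchanges per-class precision and recall but leaves any statistic that is symmetric in FP and FN (such as per-class F1) unchanged, which is one reason such a bug can go unnoticed.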
New features
- qlm_compare() now reports per-category Krippendorff’s alpha (alpha_per_value[k]) for nominal data, alongside the existing overall alpha. Each category is dichotomised against all others; the marginal count n is reported in the docid column. Parallels the per-value reporting already provided for unitizing alpha (#112).
- qlm_compare() now reports per-category kappa (kappa_per_value[k]) for nominal data: Cohen’s κ via dichotomise-and-recompute for two raters, Fleiss’ formula (1971, Eqs. 20-21) for three or more (#112).
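The dichotomise-and-recompute idea for two raters can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: cohen_kappa() and kappa_per_category() are hypothetical helper names, and the rater vectors are made-up data.

```r
# Unweighted Cohen's kappa for two raters' labels.
cohen_kappa <- function(r1, r2) {
  lev <- union(r1, r2)
  tab <- table(factor(r1, levels = lev), factor(r2, levels = lev))
  p   <- tab / sum(tab)
  p_o <- sum(diag(p))                  # observed agreement
  p_e <- sum(rowSums(p) * colSums(p))  # agreement expected by chance
  (p_o - p_e) / (1 - p_e)
}

# Per-category kappa: recode each category k as "k vs. everything else"
# and recompute kappa on the resulting binary codes.
kappa_per_category <- function(r1, r2) {
  cats <- sort(union(r1, r2))
  vapply(cats, function(k) cohen_kappa(r1 == k, r2 == k), numeric(1))
}

rater1 <- c("pos", "pos", "neg", "neu", "neg", "pos", "neu", "neg")
rater2 <- c("pos", "neg", "neg", "neu", "neg", "pos", "pos", "neg")

per_cat <- kappa_per_category(rater1, rater2)
per_cat
```

The sketch does not guard against the degenerate case where chance agreement equals 1 (for example, a category neither rater uses), which a real implementation would need to handle.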
Internal changes
- All reliability statistics are now native R implementations, derived directly from their source papers; the package no longer depends on irr. Each function returns a uniform list shape (method, value, ci_lower/ci_upper, per_value, n_observers, n_units, n_pairable) plus measure-specific fields (#112):
  - reliability_alpha(): Krippendorff (2019, Ch. 12) for predefined units; nominal/ordinal/interval/ratio metrics; per-category α for nominal data; verified against book worked examples §12.3.1, §12.3.4.1, §12.3.4.4.
  - reliability_alpha_u(): Krippendorff’s α for unitizing continua; one call returns all variants (value for _u_α, binary for |_u_α, cu_nominal for _cu_α, plus per_value).
  - reliability_kappa(): Cohen (1960) with unweighted, linear, and quadratic weighted variants; analytic SE/CI for unweighted; per-category κ via dichotomisation.
  - reliability_kappa_fleiss(): Fleiss (1971) for many raters with analytic SE and per-category κⱼ.
  - reliability_kendall_w(): Kendall & Smith (1939) with automatic tie correction; verified against Kendall & Gibbons (1990) Ch. 6.
  - reliability_icc(): all six ICC forms (Shrout & Fleiss 1979; McGraw & Wong 1996); verified against Shrout & Fleiss Table 4.
- qlm_compare() standardises on subjects × raters matrix input internally, removing the transpose step previously needed for irr::kripp.alpha.
- qlm_validate() no longer relies on yardstick. Accuracy, MAE, RMSE, and the confusion matrix are computed inline from base R; multi-class precision, recall, and F-measure are now provided by internal metric_precision(), metric_recall(), and metric_f_meas() supporting all four standard estimators (binary, macro, macro_weighted, micro). Confusion matrix, micro and macro precision/recall follow Sokolova & Lapalme (2009), Tables 1-3; macro F-measure is the arithmetic mean of per-class F-scores (Manning, Raghavan & Schütze 2008, ch. 13), matching the yardstick / scikit-learn convention. Output verified identical to yardstick’s on both the binary case and a 4-class noisy multi-class example. yardstick removed from Imports.
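As a rough illustration of the macro and micro conventions named above, the following base R sketch computes both F-measures from per-class counts. It does not call the internal metric_*() functions; the counts and the f1() helper are hypothetical.

```r
# Per-class true positives, false positives, and false negatives
# for a toy 3-class problem (hypothetical numbers).
tp <- c(a = 5, b = 6, c = 4)
fp <- c(a = 1, b = 3, c = 1)
fn <- c(a = 2, b = 2, c = 1)

# F1 written directly in terms of counts.
f1 <- function(tp, fp, fn) 2 * tp / (2 * tp + fp + fn)

per_class_f1 <- f1(tp, fp, fn)

# Macro: arithmetic mean of the per-class F-scores.
macro_f1 <- mean(per_class_f1)

# Micro: pool the counts across classes, then compute F1 once.
micro_f1 <- f1(sum(tp), sum(fp), sum(fn))
```

An alternative macro convention computes F from macro-averaged precision and recall instead; the two definitions generally give different values, so it matters which one a reported number uses.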
quallmer 0.4.0
CRAN release: 2026-05-06
New features
- New qlm_segment() segments a corpus into thematic or conceptual units using an LLM, returning a quanteda corpus analogous to quanteda::corpus_segment() output. Schema fields become docvars; docid_ and segid_ track provenance. Enables aspect-based sentiment analysis, thematic coding, and other applications requiring variable-length segmentation (#96).
- qlm_compare() now supports inter-coder reliability for segmentation tasks. When all inputs are segmented corpora produced by qlm_segment(), it automatically computes Krippendorff’s alpha for unitizing (Krippendorff, 2019, section 12.6), an extension of alpha designed for variable-length text segmentation. Three measures are reported (marked experimental):
  - u_alpha_nominal and u_alpha_binary measure joint boundary and coding reliability across the full segmented continuum.
  - cu_alpha_nominal measures coding reliability conditional on unitization, isolating coding disagreement from boundary disagreement.
  - Per-value (k)u_alpha_nominal reports reliability and coverage for each individual code, enabling diagnosis of which codes are applied consistently.

  Results include both per-document and overall (concatenated continuum) alpha.
- as_qlm_coded() gains qlm_segment and source_text arguments for converting gold-standard data frames to segmented corpora with character positions, enabling ICR comparison of LLM segmentation against human-coded reference data.
- qlm_segment() now accepts a name argument stored in corpus metadata for rater identification when comparing multiple segmenters via qlm_compare().
quallmer 0.3.0
CRAN release: 2026-02-16
CRAN submission
- Expanded DESCRIPTION with supported LLM providers, method details, and DOI references.
- Added \value documentation to all exported methods.
- Fixed HTML validation issue in qlm_validate() documentation.
Internal changes
- Refactored corpus methods to use qlm_corpus wrapper class pattern instead of conditional registerS3method(), eliminating load-order dependencies and runtime checks (#86).
Accessor functions
- New qlm_meta() accessor function provides stratified access to metadata for qlm_coded, qlm_codebook, qlm_comparison, and qlm_validation objects. Metadata is organized into three types following the quanteda convention:
  - type = "user" (default): User-specified fields (name, notes) that can be modified via qlm_meta<-().
  - type = "object": Read-only parameters set at creation time (batch, call, chat_args, execution_args, parent, n_units, input_type).
  - type = "system": Read-only environment information (timestamp, ellmer_version, quallmer_version, R_version).
- New qlm_meta<-() replacement function allows modifying user metadata fields only. Attempting to modify object or system metadata produces an informative error (#72).
- New codebook() extractor retrieves the codebook component from qlm_coded, qlm_comparison, and qlm_validation objects. This is a core component accessor analogous to formula() for lm objects (#72).
- New inputs() extractor retrieves the original input data (texts or image paths) from qlm_coded objects. The function name mirrors the inputs argument in qlm_code() (#72).
- These accessor functions replace direct attr(x, "run")$... access, providing a stable API for extracting and modifying object metadata and components.
Build system
- Build system: pkgdown articles now built locally via Makefile to enable caching and avoid API key requirements in CI (#68).
Gold standard handling and validation improvements
- New as_qlm_coded() function replaces qlm_humancoded() as the primary function for converting human-coded or external data to qlm_coded objects. The new function includes an is_gold parameter to mark gold standard objects for automatic detection.
- as_qlm_coded() now supports quanteda corpus objects directly via S3 method dispatch. Document variables (docvars) are automatically converted to coded variables, with document names used as identifiers by default. This simplifies the workflow for corpus-based gold standards (#81).
- qlm_validate() now auto-detects gold standards marked with as_qlm_coded(data, is_gold = TRUE), making the gold = parameter optional when using marked objects. Explicit gold = still works for backward compatibility.
- qlm_validate() signature changed to qlm_validate(..., gold, by, ...) to support validating multiple coded objects against a single gold standard in one call. Results include a rater column identifying each object.
- qlm_humancoded() is now marked @keywords internal but remains exported for backward compatibility. New code should use as_qlm_coded().
- Gold standard objects display # Gold: Yes in their print output for easy identification.
- Improved error messages in qlm_validate() detect common mistakes like forgetting gold = or misspelling parameter names, with helpful suggestions for correction.
Confidence intervals and reliability metrics
- ci parameter added to qlm_compare() and qlm_validate() with options "none" (default), "analytic", or "bootstrap".
- Bootstrap confidence intervals now work for all metrics in both functions via percentile method with configurable bootstrap_n parameter (default 1000).
- Analytic confidence intervals available for ICC (via psych package) and Pearson’s r (via cor.test).
- Results include ci_lower and ci_upper columns when ci != "none".
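The percentile bootstrap mentioned above can be sketched in a few lines of base R. This is a generic illustration, not the package's implementation: the data, the agreement() helper, and the 95% level are assumptions, and only the bootstrap_n default of 1000 comes from the entry above.

```r
set.seed(42)  # make the resampling reproducible

# Hypothetical paired codes from two raters.
rater1 <- c("a", "b", "a", "c", "b", "a", "c", "b", "a", "a")
rater2 <- c("a", "b", "b", "c", "b", "a", "a", "b", "a", "c")

# Statistic of interest: simple percent agreement.
agreement <- function(x, y) mean(x == y)

bootstrap_n <- 1000
n <- length(rater1)

# Resample units (pairs of codes) with replacement and recompute the statistic.
boot_stats <- replicate(bootstrap_n, {
  idx <- sample.int(n, n, replace = TRUE)
  agreement(rater1[idx], rater2[idx])
})

estimate <- agreement(rater1, rater2)

# Percentile method: the CI bounds are empirical quantiles of the
# bootstrap distribution (95% level assumed here).
ci <- quantile(boot_stats, probs = c(0.025, 0.975), names = FALSE)
```

Resampling whole units keeps each rater pair intact, so the dependence between the two raters' codes for the same unit is preserved in every replicate.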
Rater identification and combinability
- qlm_compare() results now include rater1, rater2, rater3, etc. columns containing the names of compared objects (from name attribute), enabling easy identification when combining multiple comparisons with dplyr::bind_rows().
- qlm_validate() results now include a rater column identifying which object is being validated, enabling easy combining of multiple validations.
- Both functions return data frames (class qlm_comparison and qlm_validation) instead of lists, making them easier to filter, combine, and analyze.
- Results from multiple qlm_compare() or qlm_validate() calls can be combined with bind_rows() for analysis across multiple coders or conditions.
API refinements
- qlm_code() default name parameter changed from "original" to NULL for cleaner output when names aren’t specified.
- Auto-conversion messages now recommend as_qlm_coded() instead of qlm_humancoded().
The quallmer audit trail
- New notes parameter in qlm_code(), qlm_replicate(), and as_qlm_coded() for documenting the rationale behind each coding run. Notes are displayed in print output and captured in qlm_trail().
- The trail API has been simplified to a single function following Lincoln and Guba’s (1985) audit trail concept for establishing trustworthiness in qualitative research.
- qlm_trail() now accepts an optional path argument. When provided, saves RDS archive and generates Quarto report with full audit trail documentation.
- The Quarto report includes all Lincoln and Guba audit trail components: instrument development (codebooks), process notes (run parameters and timeline), data reconstruction (comparisons and validations), and raw data summary.
- New replication section in generated reports provides environment setup instructions, API credential configuration, and executable R code to replicate each coding run.
- Removed helper functions: qlm_trail_save(), qlm_trail_export(), qlm_trail_report(), and qlm_archive(). Use qlm_trail(..., path = "filename") instead.
- qlm_trail() now generates fallback names for objects with missing name attribute.
quallmer 0.2.0
The quallmer audit trail
- New qlm_trail() function creates complete audit trails following Lincoln and Guba’s (1985) concept for establishing trustworthiness in qualitative research.
- Use qlm_trail(..., path = "filename") to save RDS archive and generate Quarto report.
- Trail print output shows summaries of comparisons and validations (level, subjects, raters, etc.) for better visibility into workflow assessment steps.
- All qlm_comparison and qlm_validation objects include run attributes capturing parent relationships, enabling full workflow traceability.
- Audit trail automatically captures branching workflows when multiple coded objects are compared or validated.
New API
The package introduces a new qlm_*() API with richer return objects and clearer terminology for qualitative researchers:
- qlm_codebook() defines coding instructions, replacing task() (#27).
- qlm_code() executes coding tasks and returns a tibble with coded results and metadata as attributes, replacing annotate() (#27). The returned qlm_coded object prints as a tibble and can be used directly in data manipulation workflows. Now includes name parameter for tracking runs and hierarchical attribute structure with provenance support.
- qlm_compare() compares multiple qlm_coded objects to assess inter-rater reliability. Automatically computes all statistically appropriate measures from the irr package based on the specified measurement level (nominal, ordinal, or interval).
- qlm_validate() validates a qlm_coded object against a gold standard (human-coded reference data). Automatically computes all statistically appropriate metrics based on the specified measurement level, using measures from the yardstick, irr, and stats packages. For nominal data, supports multiple averaging methods (macro, micro, weighted, or per-class breakdown).
- qlm_replicate() re-executes coding with optional overrides (model, codebook, parameters) while tracking provenance chain. Enables systematic assessment of coding reliability and sensitivity to model choices.
The new API uses the qlm_ prefix to avoid namespace conflicts (e.g., with ggplot2::annotate()) and follows the convention of verbs for workflow actions, nouns for accessor functions.
Restructured qlm_coded objects
- qlm_coded objects now use a hierarchical attribute structure with a run list containing name, batch, call, codebook, chat_args, execution_args, metadata, and parent fields. This structure supports provenance tracking across replication chains and provides clearer organization of coding metadata (#26).
  - The batch flag indicates whether batch processing was used.
  - execution_args replaces pcs_args and stores all non-chat execution arguments for both parallel and batch processing. Old objects with pcs_args remain compatible.
Example codebooks
- New example codebook data object data_codebook_sentiment provides a ready-to-use codebook for sentiment analysis.
- All predefined task_*() functions are deprecated in favor of using the data objects or creating custom codebooks with qlm_codebook().
Deprecated and superseded functions
- task() is deprecated in favor of qlm_codebook() (#27).
- annotate() is deprecated in favor of qlm_code() (#27).
- validate() is superseded by qlm_compare() (for inter-rater reliability) and qlm_validate() (for gold standard validation). The function remains available but is marked with a lifecycle badge.
- Trail functions (trail_settings(), trail_record(), trail_compare(), trail_matrix(), trail_icr()) are deprecated. Use qlm_code() with model and temperature parameters directly, or qlm_replicate() for systematic comparisons across models.
Backward compatibility: Old code continues to work with deprecation warnings. New qlm_codebook objects work with old annotate(), and old task objects work with new qlm_code(). This is achieved through dual-class inheritance where qlm_codebook inherits from both "qlm_codebook" and "task".
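The dual-class mechanism can be illustrated with plain S3, independent of the package. In this sketch new_codebook(), describe(), and the object's fields are hypothetical stand-ins; only the class vector c("qlm_codebook", "task") reflects the description above.

```r
# Hypothetical constructor: the object carries both the new and the legacy class.
new_codebook <- function(name, instructions) {
  structure(
    list(name = name, instructions = instructions),
    class = c("qlm_codebook", "task")
  )
}

# A legacy generic with a method written only for the old "task" class.
describe <- function(x, ...) UseMethod("describe")
describe.task <- function(x, ...) paste("task:", x$name)

cb <- new_codebook("sentiment", "Rate the sentiment of each text.")

inherits(cb, "qlm_codebook")  # TRUE
inherits(cb, "task")          # TRUE

# No describe.qlm_codebook() exists, so dispatch falls through to describe.task().
result <- describe(cb)
result                        # "task: sentiment"
```

Because S3 dispatch walks the class vector from left to right, methods written for the new class take precedence where they exist, and legacy methods continue to work everywhere else.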
Package restructuring
- validate_app() has been extracted into the companion package quallmer.app. This reduces dependencies in the core quallmer package (removing shiny, bslib, and htmltools from Imports). Install quallmer.app separately for interactive validation functionality.
Other changes
- qlm_validate() now uses distinct, statistically appropriate metrics for each measurement level:
  - Nominal (level = "nominal"): accuracy, precision, recall, F1-score, Cohen’s kappa (unweighted)
  - Ordinal (level = "ordinal"): Spearman’s rho, Kendall’s tau, MAE (mean absolute error)
  - Interval/Ratio (level = "interval"): ICC (intraclass correlation), Pearson’s r, MAE, RMSE (root mean squared error)

  The measure argument has been removed entirely; all appropriate measures are now computed automatically based on the level parameter. Function signature changed: level now comes before average, and average only applies to nominal (multiclass) data. Return values renamed for consistency: spearman → rho, kendall → tau, pearson → r. Print output uses “levels” terminology for ordinal data and “classes” for nominal data. This change provides more statistically sound validation that respects the mathematical properties of each measurement scale.
qlm_compare()now computes all statistically appropriate measures for each measurement level:-
Nominal (
level = "nominal"): Krippendorff’s alpha (nominal), Cohen’s/Fleiss’ kappa, percent agreement -
Ordinal (
level = "ordinal"): Krippendorff’s alpha (ordinal), weighted kappa (2 raters only), Kendall’s W, Spearman’s rho, percent agreement -
Interval/Ratio (
level = "interval"): Krippendorff’s alpha (interval), ICC (intraclass correlation), Pearson’s r, percent agreement
The
measureargument has been removed entirely - all appropriate measures are now computed automatically and returned in the result object. The return structure changed from a single value to a list containing all computed measures for the specified level. Percent agreement is now computed for all levels; for ordinal/interval/ratio data, thetoleranceparameter controls what counts as agreement (e.g.,tolerance = 1means values within 1 unit are considered in agreement). -
Nominal (
- New qlm_humancoded() function converts human-coded data frames into qlm_humancoded objects (dual inheritance: qlm_humancoded + qlm_coded), enabling full provenance tracking for human coding alongside LLM results. Supports custom metadata for coder information, training details, and coding instructions (#43).
- qlm_validate() and qlm_compare() now accept plain data frames and automatically convert them to qlm_humancoded objects with an informational message. Users can call qlm_humancoded() directly to provide richer metadata (coder names, instructions, etc.) or use plain data frames for quick comparisons (#43).
- qlm_validate() and qlm_compare() now support non-standard evaluation (NSE) for the by argument, allowing both by = sentiment (unquoted) and by = "sentiment" (quoted) syntax. This provides a more natural, tidyverse-style interface while maintaining backward compatibility (#43).
- Print method for qlm_coded objects now distinguishes human from LLM coding, displaying “Source: Human coder” for qlm_humancoded objects instead of model information.
- Improved error messages in qlm_compare() and qlm_validate() now show which objects are missing the requested variable and list available alternatives.
- Adopted tidyverse-style error messaging via cli::cli_abort() and cli::cli_warn() throughout the package, replacing all stop(), stopifnot(), and warning() calls with structured, informative error messages.
- Documentation and CI notes refreshed.
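The tolerance rule described in the qlm_compare() entry above (values within tolerance units count as agreement) can be sketched for two raters in base R. The percent_agreement() helper and the ratings are hypothetical; the package's own computation may differ, for example in how it handles more than two raters or missing values.

```r
# Share of units on which two raters agree, where "agree" means the
# absolute difference between their codes is at most `tolerance`.
percent_agreement <- function(x, y, tolerance = 0) {
  mean(abs(x - y) <= tolerance)
}

# Hypothetical ordinal ratings on a 1-5 scale.
rater1 <- c(1, 2, 3, 4, 5, 3)
rater2 <- c(1, 3, 3, 2, 5, 4)

exact   <- percent_agreement(rater1, rater2)                 # exact matches only
within1 <- percent_agreement(rater1, rater2, tolerance = 1)  # off-by-one counts as agreement
```

With tolerance = 0 the measure reduces to ordinary exact-match percent agreement, which is the only sensible setting for nominal codes.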