
Convert coded data to qlm_coded format
Converts a data frame or quanteda corpus of coded data (human-coded or from
external sources) into a qlm_coded object. This enables provenance tracking
and integration with qlm_compare(), qlm_validate(), and qlm_trail() for
coded data alongside LLM-coded results.
Usage
as_qlm_coded(
  x,
  id,
  name = NULL,
  is_gold = FALSE,
  codebook = NULL,
  texts = NULL,
  notes = NULL,
  metadata = list()
)
# S3 method for class 'data.frame'
as_qlm_coded(
  x,
  id,
  name = NULL,
  is_gold = FALSE,
  codebook = NULL,
  texts = NULL,
  notes = NULL,
  metadata = list()
)
# Default S3 method
as_qlm_coded(
  x,
  id,
  name = NULL,
  is_gold = FALSE,
  codebook = NULL,
  texts = NULL,
  notes = NULL,
  metadata = list()
)
Arguments
- x
A data frame or quanteda corpus object containing coded data. For data frames: must include a column with unit identifiers (default ".id").
For corpus objects: document variables (docvars) are treated as coded variables, and document names are used as identifiers by default.
- id
For data frames: name of the column containing unit identifiers. Default is NULL, which looks for a column named ".id".
Can be an unquoted column name (id = doc_id) or a quoted string (id = "doc_id").
For corpus objects: NULL (default) uses document names from names(x); alternatively, specify a docvar name (quoted or unquoted) to use as identifiers.
- name
Character. A string identifying this coding run (e.g., "Coder_A", "expert_rater", "Gold_Standard"). Default is NULL.
- is_gold
Logical. If TRUE, marks this object as a gold standard for automatic detection by qlm_validate().
When a gold standard object is passed to qlm_validate(), the gold = parameter becomes optional. Default is FALSE.
- codebook
Optional list containing coding instructions. Can include:
  - name: Name of the coding scheme
  - instructions: Text describing coding instructions
  - schema: NULL (not used for human coding)
If NULL (default), a minimal placeholder codebook is created.
- texts
Optional vector of original texts or data that were coded. Should correspond to the .id values in data.
If provided, enables more complete provenance tracking.
- notes
Optional character string with descriptive notes about this coding. Useful for documenting details when viewing results in qlm_trail(). Default is NULL.
- metadata
Optional list of metadata about the coding process. Can include any relevant information such as:
  - coder_name: Name of the human coder
  - coder_id: Identifier for the coder
  - training: Description of coder training
  - date: Date of coding
The function automatically adds timestamp, n_units, notes, and source = "human".
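As a rough illustration of how an argument such as id can accept both a quoted string and an unquoted column name, the base R sketch below captures the argument with substitute() and turns it into a column name. The helper resolve_id_column() is hypothetical and written only for this example; it is not part of quallmer, and as_qlm_coded() may handle id differently internally.

```r
# Hypothetical helper (not part of quallmer): resolve an `id` argument that
# may be NULL, a quoted string, or an unquoted column name.
resolve_id_column <- function(x, id = NULL) {
  id_expr <- substitute(id)      # capture the argument without evaluating it
  if (is.null(id_expr)) {
    col <- ".id"                 # assumed default, as documented for `id`
  } else if (is.character(id_expr)) {
    col <- id_expr               # quoted form: id = "doc_id"
  } else {
    col <- deparse(id_expr)      # unquoted form: id = doc_id
  }
  if (!col %in% names(x)) {
    stop("Column '", col, "' not found in data", call. = FALSE)
  }
  col
}

df <- data.frame(doc_id = 1:3, sentiment = c("pos", "neg", "pos"))
resolve_id_column(df, id = doc_id)    # unquoted column name
resolve_id_column(df, id = "doc_id")  # quoted string
```

Both calls resolve to the same column. Calling the helper without id on a data frame that lacks a ".id" column raises an error, which mirrors the documented requirement that data frames include an identifier column.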
Value
A qlm_coded object (tibble with additional class and attributes)
for provenance tracking. When is_gold = TRUE, the object is marked as
a gold standard in its attributes.
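To make the idea of a table "with additional class and attributes" concrete, here is a small base R sketch that uses a plain data frame. The class name my_coded, the attribute names run_name and is_gold, and the helpers mark_coded() and find_gold() are assumptions made for illustration; this page does not show the attribute names quallmer actually uses.

```r
# Illustration only: a data frame carrying an extra class and attributes.
# `my_coded`, the attribute names, and both helpers are hypothetical.
mark_coded <- function(df, name = NULL, is_gold = FALSE) {
  attr(df, "run_name") <- name
  attr(df, "is_gold") <- is_gold
  class(df) <- c("my_coded", class(df))
  df
}

# Return the positions of the objects flagged as gold standards.
find_gold <- function(...) {
  objs <- list(...)
  which(vapply(objs, function(o) isTRUE(attr(o, "is_gold")), logical(1)))
}

ratings <- data.frame(.id = 1:3, sentiment = c("pos", "neg", "pos"))
coder <- mark_coded(ratings, name = "Coder_A")
gold <- mark_coded(ratings, name = "Expert", is_gold = TRUE)

inherits(gold, "data.frame")  # the object still behaves as a data frame
find_gold(coder, gold)        # position of the gold-standard object
```

A flag stored on the object is what lets a downstream function pick out the gold standard from its inputs without a separate argument, which is the kind of automatic detection described for qlm_validate().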
Details
When printed, objects created with as_qlm_coded() display "Source: Human coder"
instead of model information, clearly distinguishing human from LLM coding.
Gold Standards
Objects marked with is_gold = TRUE are automatically detected by
qlm_validate(), allowing simpler syntax:
# With is_gold = TRUE
gold <- as_qlm_coded(gold_data, name = "Expert", is_gold = TRUE)
qlm_validate(coded1, coded2, gold, by = "sentiment") # gold = not needed!
# Without is_gold (or explicit gold =)
gold <- as_qlm_coded(gold_data, name = "Expert")
qlm_validate(coded1, coded2, gold = gold, by = "sentiment")
See also
qlm_code() for LLM coding, qlm_compare() for inter-rater reliability,
qlm_validate() for validation against gold standards, qlm_trail() for
provenance tracking.
Examples
# Basic usage with data frame (default .id column)
human_data <- data.frame(
  .id = 1:10,
  sentiment = sample(c("pos", "neg"), 10, replace = TRUE)
)
coder_a <- as_qlm_coded(human_data, name = "Coder_A")
coder_a
#> # quallmer coded object
#> # Run: Coder_A
#> # Source: Human coder
#> # Units: 10
#>
#> # A tibble: 10 × 2
#> .id sentiment
#> * <int> <chr>
#> 1 1 pos
#> 2 2 pos
#> 3 3 pos
#> 4 4 neg
#> 5 5 neg
#> 6 6 pos
#> 7 7 pos
#> 8 8 pos
#> 9 9 pos
#> 10 10 neg
# Use custom id column with NSE (unquoted)
data_with_custom_id <- data.frame(
  doc_id = 1:10,
  sentiment = sample(c("pos", "neg"), 10, replace = TRUE)
)
coder_custom <- as_qlm_coded(data_with_custom_id, id = doc_id, name = "Coder_C")
# Or use quoted string
coder_custom2 <- as_qlm_coded(data_with_custom_id, id = "doc_id", name = "Coder_D")
# Create a gold standard from data frame
gold <- as_qlm_coded(
  human_data,
  name = "Expert",
  is_gold = TRUE
)
# Validate with automatic gold detection
coder_b_data <- data.frame(
  .id = 1:10,
  sentiment = sample(c("pos", "neg"), 10, replace = TRUE)
)
coder_b <- as_qlm_coded(coder_b_data, name = "Coder_B")
# `gold` is marked with is_gold = TRUE, so gold = is optional here (NSE works for 'by' too)
qlm_validate(coder_a, coder_b, gold = gold, by = sentiment, level = "nominal")
#>
#> ── quallmer validation ──
#>
#> n: 10
#>
#>
#> ── sentiment (nominal)
#> By class:
#> <macro>:
#> accuracy: 1.0000
#> precision: 1.0000
#> recall: 1.0000
#> F1: 1.0000
#> Cohen's kappa: 1.0000
#> accuracy: 0.3000
#> precision: 0.3333
#> recall: 0.3095
#> F1: 0.2929
#> Cohen's kappa: -0.2963
#>
# Create from corpus object (simplified workflow)
data("data_corpus_manifsentsUK2010sample")
crowd <- as_qlm_coded(
  data_corpus_manifsentsUK2010sample,
  is_gold = TRUE
)
# Document names automatically become .id, all docvars included
# Use a docvar as identifier with NSE (unquoted)
crowd_party <- as_qlm_coded(
  data_corpus_manifsentsUK2010sample,
  id = party,
  is_gold = TRUE
)
# Or use quoted string
crowd_party2 <- as_qlm_coded(
  data_corpus_manifsentsUK2010sample,
  id = "party",
  is_gold = TRUE
)
# With complete metadata
expert <- as_qlm_coded(
  human_data,
  name = "expert_rater",
  is_gold = TRUE,
  codebook = list(
    name = "Sentiment Analysis",
    instructions = "Code overall sentiment as positive or negative"
  ),
  metadata = list(
    coder_name = "Dr. Smith",
    coder_id = "EXP001",
    training = "5 years experience",
    date = "2024-01-15"
  )
)