
Code qualitative data with an LLM
qlm_code.Rd
Applies a codebook to input data using a large language model, returning a rich object that includes the codebook, execution settings, results, and metadata for reproducibility.
Arguments
- x
Input data: a character vector of texts (for text codebooks) or file paths to images (for image codebooks). Named vectors will use names as identifiers in the output; unnamed vectors will use sequential integers.
- codebook
A codebook object created with
qlm_codebook(). Also accepts deprecated task() objects for backward compatibility.
- model
Provider (and optionally model) name in the form
"provider/model"or"provider"(which will use the default model for that provider). Passed to thenameargument ofellmer::chat(). Examples:"openai/gpt-4o-mini","anthropic/claude-3-5-sonnet-20241022","ollama/llama3.2","openai"(uses default OpenAI model).- ...
Additional arguments passed to
ellmer::chat(), ellmer::parallel_chat_structured(), or ellmer::batch_chat_structured(), based on argument name. Arguments recognized by ellmer::parallel_chat_structured() take priority when there are overlaps. Batch-specific arguments (path, wait, ignore_hash) are only used when batch = TRUE. Arguments not recognized by any function will generate a warning.
- batch
Logical. If
TRUE, uses ellmer::batch_chat_structured() instead of ellmer::parallel_chat_structured(). Batch processing is more cost-effective for large jobs but may have longer turnaround times. Default is FALSE. See ellmer::batch_chat_structured() for details.
- name
Character string identifying this coding run. Default is
NULL.
- notes
Optional character string with descriptive notes about this coding run. Useful for documenting the purpose or rationale when viewing results in
qlm_trail(). Default is NULL.
Value
A qlm_coded object (a tibble with additional attributes):
- Data columns
The coded results, with a .id column for identifiers.
- Attributes
data, input_type, and run (a list containing name, batch, call, codebook, chat_args, execution_args, metadata, parent).
The object prints as a tibble and can be used directly in data manipulation workflows.
The batch flag in the run attribute indicates whether batch processing was used.
The execution_args element contains all non-chat execution arguments (for either parallel or batch processing).
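As a minimal sketch of inspecting these attributes (assuming `coded` is a qlm_coded object returned by a successful qlm_code() call):

```r
# Retrieve the run metadata stored on a qlm_coded object
run_info <- attr(coded, "run")

run_info$name            # name given to this coding run (NULL by default)
run_info$batch           # TRUE if batch processing was used
run_info$execution_args  # non-chat execution arguments

attr(coded, "input_type")  # type of input that was coded
```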
Details
Arguments in ... are dynamically routed to ellmer::chat(),
ellmer::parallel_chat_structured(), or ellmer::batch_chat_structured()
based on their names.
Progress indicators and error handling are provided by the underlying
ellmer::parallel_chat_structured() or ellmer::batch_chat_structured()
function. Set verbose = TRUE to see progress messages during coding.
Retry logic for API failures should be configured through ellmer's options.
When batch = TRUE, the function uses ellmer::batch_chat_structured()
which submits jobs to the provider's batch API. This is typically more
cost-effective but has longer turnaround times. The path argument specifies
where batch results are cached, wait controls whether to wait for completion,
and ignore_hash can force reprocessing of cached results.
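A sketch of a batch run, using only the arguments documented above (the model string and cache path are illustrative, and the call requires valid API credentials):

```r
# Batch coding: more cost-effective for large jobs, longer turnaround
coded_batch <- qlm_code(
  texts,
  data_codebook_sentiment,
  model = "openai/gpt-4o-mini",
  batch = TRUE,
  path  = "sentiment_batch.json",  # where batch results are cached
  wait  = TRUE,                    # block until the batch completes
  name  = "sentiment-batch-1",
  notes = "Initial batch run for the sentiment pilot"
)
```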
See also
qlm_codebook() for creating codebooks, qlm_replicate() for replicating
coding runs, qlm_compare() and qlm_validate() for assessing reliability.
Examples
# \donttest{
# Basic sentiment analysis
texts <- c("I love this product!", "Terrible experience.", "It's okay.")
coded <- qlm_code(texts, data_codebook_sentiment, model = "openai/gpt-4o-mini")
#> Error in openai_key(): Can't find env var `OPENAI_API_KEY`.
coded
#> Error: object 'coded' not found
# With named inputs (names become IDs in output)
texts_named <- c(review1 = "Great service!", review2 = "Very disappointing.")
coded2 <- qlm_code(texts_named, data_codebook_sentiment, model = "openai/gpt-4o-mini")
#> Error in openai_key(): Can't find env var `OPENAI_API_KEY`.
coded2
#> Error: object 'coded2' not found
# }