Applies a codebook to input data using a large language model, returning a rich object that includes the codebook, execution settings, results, and metadata for reproducibility.

Usage

qlm_code(x, codebook, model, ..., batch = FALSE, name = NULL, notes = NULL)

Arguments

x

Input data: a character vector of texts (for text codebooks) or file paths to images (for image codebooks). Named vectors will use names as identifiers in the output; unnamed vectors will use sequential integers.
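A small sketch of the naming behavior described above (base R only; the `.id` values shown follow this page's description of the output):

```r
# Named inputs: the names carry through as identifiers (.id) in the output
texts_named <- c(review1 = "Great service!", review2 = "Very disappointing.")
names(texts_named)
#> [1] "review1" "review2"

# Unnamed inputs: identifiers fall back to sequential integers (1, 2, ...)
texts_unnamed <- c("Great service!", "Very disappointing.")
names(texts_unnamed)
#> NULL
```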

codebook

A codebook object created with qlm_codebook(). Also accepts deprecated task() objects for backward compatibility.

model

Provider (and optionally model) name in the form "provider/model" or "provider" (which will use the default model for that provider). Passed to the name argument of ellmer::chat(). Examples: "openai/gpt-4o-mini", "anthropic/claude-3-5-sonnet-20241022", "ollama/llama3.2", "openai" (uses default OpenAI model).

...

Additional arguments passed to ellmer::chat(), ellmer::parallel_chat_structured(), or ellmer::batch_chat_structured(), based on argument name. Arguments recognized by ellmer::parallel_chat_structured() take priority when there are overlaps. Batch-specific arguments (path, wait, ignore_hash) are only used when batch = TRUE. Arguments not recognized by any function will generate a warning.

batch

Logical. If TRUE, uses ellmer::batch_chat_structured() instead of ellmer::parallel_chat_structured(). Batch processing is more cost-effective for large jobs but may have longer turnaround times. Default is FALSE. See ellmer::batch_chat_structured() for details.

name

Character string identifying this coding run. Default is NULL.

notes

Optional character string with descriptive notes about this coding run. Useful for documenting the purpose or rationale when viewing results in qlm_trail(). Default is NULL.

Value

A qlm_coded object (a tibble with additional attributes):

Data columns

The coded results with a .id column for identifiers.

Attributes

data, input_type, and run (list containing name, batch, call, codebook, chat_args, execution_args, metadata, parent).

The object prints as a tibble and can be used directly in data-manipulation workflows. The batch flag in the run attribute records whether batch processing was used, and execution_args holds all non-chat execution arguments (for either parallel or batch processing).
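As a sketch (assuming a `coded` object returned by `qlm_code()` and the attribute layout described above), the run metadata can be inspected directly:

```r
# Assumes `coded` is a qlm_coded object returned by qlm_code()
run_info <- attr(coded, "run")

run_info$name            # label supplied via the `name` argument
run_info$batch           # TRUE if batch processing was used
run_info$execution_args  # non-chat execution arguments

# The object is a tibble, so ordinary data-manipulation verbs apply,
# e.g. filtering by the identifier column:
subset(coded, .id == "review1")
```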

Details

Arguments in ... are dynamically routed to ellmer::chat(), ellmer::parallel_chat_structured(), or ellmer::batch_chat_structured() based on their names.
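A hedged sketch of that routing. Here `system_prompt` is assumed to be accepted by ellmer::chat() and `max_active` by ellmer::parallel_chat_structured(); check the signatures in your installed ellmer version before relying on either name:

```r
coded <- qlm_code(
  texts,
  data_codebook_sentiment,
  model = "openai/gpt-4o-mini",
  system_prompt = "You are a careful qualitative coder.", # routed to ellmer::chat()
  max_active = 5                                          # routed to ellmer::parallel_chat_structured()
)
```

An argument whose name is recognized by none of the three functions would trigger the warning described above rather than being silently dropped.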

Progress indicators and error handling are provided by the underlying ellmer::parallel_chat_structured() or ellmer::batch_chat_structured() function. Set verbose = TRUE to see progress messages during coding. Retry logic for API failures should be configured through ellmer's options.

When batch = TRUE, the function uses ellmer::batch_chat_structured() which submits jobs to the provider's batch API. This is typically more cost-effective but has longer turnaround times. The path argument specifies where batch results are cached, wait controls whether to wait for completion, and ignore_hash can force reprocessing of cached results.
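A minimal batch-mode sketch, using the batch-specific arguments named above (the cache file name is illustrative; consult ellmer::batch_chat_structured() for the expected `path` format):

```r
# Submit to the provider's batch API; results are cached at `path`.
# With wait = FALSE the call can return before the provider finishes;
# rerunning the same call later retrieves completed results from the cache.
coded_batch <- qlm_code(
  texts,
  data_codebook_sentiment,
  model = "openai/gpt-4o-mini",
  batch = TRUE,
  path = "sentiment-batch.json",
  wait = FALSE
)
```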

See also

qlm_codebook() for creating codebooks, qlm_replicate() for replicating coding runs, qlm_compare() and qlm_validate() for assessing reliability.

Examples

# \donttest{
# Basic sentiment analysis
# (the errors below arise because OPENAI_API_KEY is not set in this environment)
texts <- c("I love this product!", "Terrible experience.", "It's okay.")
coded <- qlm_code(texts, data_codebook_sentiment, model = "openai/gpt-4o-mini")
#> Error in openai_key(): Can't find env var `OPENAI_API_KEY`.
coded
#> Error: object 'coded' not found

# With named inputs (names become IDs in output)
texts_named <- c(review1 = "Great service!", review2 = "Very disappointing.")
coded2 <- qlm_code(texts_named, data_codebook_sentiment, model = "openai/gpt-4o-mini")
#> Error in openai_key(): Can't find env var `OPENAI_API_KEY`.
coded2
#> Error: object 'coded2' not found
# }