Applies a codebook to input data using a large language model, returning a rich object that includes the codebook, execution settings, results, and metadata for reproducibility.

Usage

qlm_code(x, codebook, model, ..., batch = FALSE, name = NULL, notes = NULL)

Arguments

x

Input data: a character vector of texts (for text codebooks) or file paths to images (for image codebooks). Named vectors will use names as identifiers in the output; unnamed vectors will use sequential integers.
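A small sketch of the naming behavior described above (base R only; the `.id` values shown follow this page's description of the output):

```r
# Named inputs: the names carry through as identifiers (.id) in the output
texts_named <- c(review1 = "Great service!", review2 = "Very disappointing.")
names(texts_named)
#> [1] "review1" "review2"

# Unnamed inputs: identifiers fall back to sequential integers (1, 2, ...)
texts_unnamed <- c("Great service!", "Very disappointing.")
names(texts_unnamed)
#> NULL
```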

codebook

A codebook object created with qlm_codebook(). Also accepts deprecated task() objects for backward compatibility.

model

Provider (and optionally model) name in the form "provider/model" or "provider" (which will use the default model for that provider). Passed to the name argument of ellmer::chat(). Examples: "openai/gpt-4o-mini", "anthropic/claude-3-5-sonnet-20241022", "ollama/llama3.2", "openai" (uses default OpenAI model).

...

Additional arguments passed to ellmer::chat(), ellmer::parallel_chat_structured(), or ellmer::batch_chat_structured(), based on argument name. Arguments recognized by ellmer::parallel_chat_structured() take priority when there are overlaps. Batch-specific arguments (path, wait, ignore_hash) are only used when batch = TRUE. Arguments not recognized by any function will generate a warning.

batch

Logical. If TRUE, uses ellmer::batch_chat_structured() instead of ellmer::parallel_chat_structured(). Batch processing is more cost-effective for large jobs but may have longer turnaround times. Default is FALSE. See ellmer::batch_chat_structured() for details.

name

Character string identifying this coding run. Default is NULL.

notes

Optional character string with descriptive notes about this coding run. Useful for documenting the purpose or rationale when viewing results in qlm_trail(). Default is NULL.

Value

A qlm_coded object (a tibble with additional attributes):

Data columns

The coded results with a .id column for identifiers.

Attributes

data, input_type, and run (list containing name, batch, call, codebook, chat_args, execution_args, metadata, parent).

The object prints as a tibble and can be used directly in data-manipulation workflows. The batch flag in the run attribute records whether batch processing was used, and execution_args holds all non-chat execution arguments (for either parallel or batch processing).
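As a sketch (assuming a `coded` object returned by `qlm_code()` and the attribute layout described above), the run metadata can be inspected directly:

```r
# Assumes `coded` is a qlm_coded object returned by qlm_code()
run_info <- attr(coded, "run")

run_info$name            # label supplied via the `name` argument
run_info$batch           # TRUE if batch processing was used
run_info$execution_args  # non-chat execution arguments

# The object is a tibble, so ordinary data-manipulation verbs apply,
# e.g. filtering by the identifier column:
subset(coded, .id == "review1")
```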

Details

Arguments in ... are dynamically routed to ellmer::chat(), ellmer::parallel_chat_structured(), or ellmer::batch_chat_structured() based on their names.
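A hedged sketch of that routing. Here `system_prompt` is assumed to be accepted by ellmer::chat() and `max_active` by ellmer::parallel_chat_structured(); check the signatures in your installed ellmer version before relying on either name:

```r
coded <- qlm_code(
  texts,
  data_codebook_sentiment,
  model = "openai/gpt-4o-mini",
  system_prompt = "You are a careful qualitative coder.", # routed to ellmer::chat()
  max_active = 5                                          # routed to ellmer::parallel_chat_structured()
)
```

An argument whose name is recognized by none of the three functions would trigger the warning described above rather than being silently dropped.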

Progress indicators and error handling are provided by the underlying ellmer::parallel_chat_structured() or ellmer::batch_chat_structured() function. Set verbose = TRUE to see progress messages during coding. Retry logic for API failures should be configured through ellmer's options.

When batch = TRUE, the function uses ellmer::batch_chat_structured() which submits jobs to the provider's batch API. This is typically more cost-effective but has longer turnaround times. The path argument specifies where batch results are cached, wait controls whether to wait for completion, and ignore_hash can force reprocessing of cached results.
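A minimal batch-mode sketch, using the batch-specific arguments named above (the cache file name is illustrative; consult ellmer::batch_chat_structured() for the expected `path` format):

```r
# Submit to the provider's batch API; results are cached at `path`.
# With wait = FALSE the call can return before the provider finishes;
# rerunning the same call later retrieves completed results from the cache.
coded_batch <- qlm_code(
  texts,
  data_codebook_sentiment,
  model = "openai/gpt-4o-mini",
  batch = TRUE,
  path = "sentiment-batch.json",
  wait = FALSE
)
```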

See also

qlm_codebook() for creating codebooks, qlm_replicate() for replicating coding runs, qlm_compare() and qlm_validate() for assessing reliability.

Examples

# \donttest{
# Basic sentiment analysis
# (the errors below arise because OPENAI_API_KEY is not set in this environment)
texts <- c("I love this product!", "Terrible experience.", "It's okay.")
coded <- qlm_code(texts, data_codebook_sentiment, model = "openai/gpt-4o-mini")
#> Error in openai_key(): Can't find env var `OPENAI_API_KEY`.
coded
#> Error: object 'coded' not found

# With named inputs (names become IDs in output)
texts_named <- c(review1 = "Great service!", review2 = "Very disappointing.")
coded2 <- qlm_code(texts_named, data_codebook_sentiment, model = "openai/gpt-4o-mini")
#> Error in openai_key(): Can't find env var `OPENAI_API_KEY`.
coded2
#> Error: object 'coded2' not found
# }