Defining custom tasks • quallmer

In this tutorial, we will explore how to create custom annotation tasks using the quallmer package. Custom tasks allow you to tailor the LLM’s output to your specific research questions and data types using the task() function, providing greater flexibility and control over the annotation process.

In the following example, we will demonstrate how to define a custom task for scoring documents based on their alignment with political left ideologies. For this, we formulate a prompt that asks the LLM to score documents on a scale of political left alignment. We then define the expected response structure using the task() function. Finally, we will use the annotate() function to apply this custom task to a sample corpus of inaugural speeches from US presidents.

Loading packages and data

# We will use the quanteda package
# to load a sample corpus of inaugural speeches
# If you have not yet installed the quanteda package, you can do so by:
# install.packages("quanteda")
library(quanteda)

## Package version: 4.3.1
## Unicode version: 15.1
## ICU version: 74.2

## Parallel computing: disabled

## See https://quanteda.io for tutorials and examples.

library(quallmer)

## Loading required package: ellmer

# For educational purposes,
# we will use a subset of the inaugural speeches corpus:
# the four most recent speeches in the corpus (documents 57 to 60)
data_corpus_inaugural <- quanteda::data_corpus_inaugural[57:60]
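Positional indices such as 57:60 depend on how many speeches the installed version of quanteda ships. As a minimal sketch, a comparable subset could be selected by the Year document variable instead. The name recent_speeches and the cutoff year 2013 are illustrative assumptions and not part of the original tutorial.

```r
library(quanteda)

# Select speeches by year rather than by position, so the subset does not
# shift when speeches are added to the corpus.
# `recent_speeches` is an illustrative name, not part of the tutorial.
recent_speeches <- corpus_subset(quanteda::data_corpus_inaugural, Year >= 2013)

# Inspect which documents were selected
docnames(recent_speeches)
ndoc(recent_speeches)
```

With the corpus version shown above, this should pick out the same documents as the positional subset, but it is worth checking docnames() before annotating.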

Defining a custom prompt

Defining the prompt is a crucial step in creating a custom task. The prompt tells the LLM how to interpret the input data and what kind of output to generate. In this example, the prompt instructs the LLM to score documents on their alignment with the political left. Prompts can be much longer and more complex, depending on the task at hand, but they should always be clear and specific so that the LLM understands what the task requires.

prompt <- "Score the following document on a scale of how much it aligns
with the political left. The political left is defined as groups which
advocate for social equality, government intervention in the economy,
and progressive policies. Use the following metrics:
SCORING METRIC:
3 : extremely left
2 : very left
1 : slightly left
0 : not at all left"
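When a prompt contains a scoring scale, it can help to keep the scale in one place and build the prompt text from it, so the same definition can be reused later, for example when checking the returned scores. The following is a sketch using base R only. The names scale_labels, intro, and prompt_from_parts are illustrative and not part of quallmer.

```r
# Scale labels taken from the prompt above; the names are the numeric codes
scale_labels <- c(
  "3" = "extremely left",
  "2" = "very left",
  "1" = "slightly left",
  "0" = "not at all left"
)

# Build one "code : label" line per scale point
scale_lines <- paste(names(scale_labels), ":", scale_labels)

# Task description, identical in wording to the prompt above
intro <- paste(
  "Score the following document on a scale of how much it aligns",
  "with the political left. The political left is defined as groups which",
  "advocate for social equality, government intervention in the economy,",
  "and progressive policies. Use the following metrics:",
  sep = "\n"
)

# Assemble the full prompt as a single string
prompt_from_parts <- paste(c(intro, "SCORING METRIC:", scale_lines),
                           collapse = "\n")

# Print the prompt to check the wording before using it
cat(prompt_from_parts)
```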

Defining the structure of the response with task()

The task() function specifies the expected structure of the LLM’s response. Its most important arguments are:

name: A descriptive name for the task.
system_prompt: The prompt that guides the LLM on how to perform the task.
type_def: Defines the expected structure of the response using ellmer’s type specifications such as type_object(), type_array(), etc.
input_type: The kind of input the task expects; the example in this tutorial sets it to "text".

For more information on how to use ellmer’s type specifications, please refer to the ellmer documentation on type specifications.

# Define the custom task using task()
ideology_scores <- task(
  name = "Score Political Left Alignment",
  system_prompt = prompt,
  type_def = type_object(
    score = type_number("Score"),
    explanation = type_string("Explanation")
  ),
  input_type = "text"
)
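The scale in the prompt uses whole numbers from 0 to 3, while type_number() also permits fractional values. As a hedged sketch, the response structure could be written with ellmer's type_integer() and with more detailed field descriptions. The name ideology_type and the description texts are illustrative assumptions, not part of the original tutorial.

```r
library(ellmer)

# Variant of the response structure with a whole-number score and
# more descriptive field descriptions for the model.
# `ideology_type` is an illustrative name, not part of the tutorial.
ideology_type <- type_object(
  score = type_integer(
    description = "Left alignment score from 0 (not at all left) to 3 (extremely left)"
  ),
  explanation = type_string(
    description = "One or two sentences justifying the score"
  )
)

# Printing the object shows the structure that would be requested
ideology_type
```

An object like this could be passed as type_def in the task() call in place of the inline definition.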

Applying the custom task to the corpus

This step works the same way as applying a predefined task. We pass the sample corpus of inaugural speeches and our custom task to the annotate() function, choose the model via model_name (in this case, "openai/gpt-4o"), and supply any additional parameters through the params argument. Here we set the temperature to 0 to make the outputs more deterministic, which improves the consistency of the scores across repeated runs and therefore their reliability.

# Apply the custom task to the inaugural speeches corpus
result <- annotate(data_corpus_inaugural, task = ideology_scores,
                   model_name = "openai/gpt-4o",
                   params = list(temperature = 0))

## [working] (0 + 0) -> 3 -> 1 | ■■■■■■■■■                         25%

## [working] (0 + 0) -> 0 -> 4 | ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■  100%

id	score	explanation
2013-Obama	2	The document aligns very well with the political left, emphasizing social equality, government intervention, and progressive policies. It advocates for collective action, economic equality, climate change response, and social justice, all of which are key tenets of leftist ideology. However, it also acknowledges skepticism of central authority and the importance of personal responsibility, which slightly moderates its alignment.
2017-Trump	0	The document emphasizes nationalism, protectionism, and a focus on American interests, which are not typically aligned with the political left. It lacks advocacy for social equality, government intervention in the economy, or progressive policies, which are key aspects of leftist ideology. Therefore, it scores 0 for alignment with the political left.
2021-Biden	2	The document aligns very well with the political left, emphasizing themes of social equality, racial justice, and government intervention in addressing economic challenges. It calls for unity, healing, and addressing systemic racism and climate change, which are typically progressive priorities. However, it also emphasizes unity and bipartisanship, which slightly moderates its alignment with the extreme left.
2025-Trump	0	The document emphasizes nationalism, border security, military strength, and economic independence, which are typically associated with right-wing ideologies. It criticizes government intervention and progressive policies like the Green New Deal, and promotes traditional values and a merit-based society. These elements do not align with the political left’s focus on social equality, government intervention in the economy, and progressive policies.
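Before using the scores in further analysis, it is worth confirming that every returned value lies on the scale defined in the prompt. The sketch below uses a small stand-in data frame with the ids and scores printed above, and assumes that the object returned by annotate() exposes id and score columns as in that output. The names scores and valid_scores are illustrative.

```r
# Stand-in for the annotate() result, using the ids and scores printed above.
# In practice the same checks would be run on `result` directly, assuming it
# has `id` and `score` columns as in the printed output.
scores <- data.frame(
  id = c("2013-Obama", "2017-Trump", "2021-Biden", "2025-Trump"),
  score = c(2, 0, 2, 0)
)

# The prompt defines four valid scale points
valid_scores <- 0:3

# Flag missing values, fractional values, and values outside the scale
scores$valid <- !is.na(scores$score) & scores$score %in% valid_scores

# Stop early if any score falls outside the scale
stopifnot(all(scores$valid))

# Tabulate how often each scale point was assigned
table(factor(scores$score, levels = valid_scores))
```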

Now you have successfully created and applied a custom annotation task using the quallmer package! You can further modify the prompt and response structure to suit your specific research needs.