| Title: | Qualitative Analysis with Large Language Models |
|---|---|
| Description: | Tools for AI-assisted qualitative data coding using large language models ('LLMs') via the 'ellmer' package, supporting providers including 'OpenAI', 'Anthropic', 'Google', 'Azure', and local models via 'Ollama'. Provides a 'codebook'-based workflow for defining coding instructions and applying them to texts, images, and other data. Includes built-in 'codebooks' for common applications such as sentiment analysis and policy coding, and functions for creating custom 'codebooks' for specific research questions. Supports systematic replication across models and settings, computing inter-coder reliability statistics including Krippendorff's alpha (Krippendorff 2019, <doi:10.4135/9781071878781>) and Fleiss' kappa (Fleiss 1971, <doi:10.1037/h0031619>), as well as gold-standard validation metrics including accuracy, precision, recall, and F1 scores following Sokolova and Lapalme (2009, <doi:10.1016/j.ipm.2009.03.002>). Provides audit trail functionality for documenting coding workflows following Lincoln and Guba's (1985, ISBN:0803924313) framework for establishing trustworthiness in qualitative research. |
| Authors: | Seraphine F. Maerz [aut, cre] (ORCID: <https://orcid.org/0000-0002-7173-9617>), Kenneth Benoit [aut] (ORCID: <https://orcid.org/0000-0002-0797-564X>) |
| Maintainer: | Seraphine F. Maerz <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.4.0.9000 |
| Built: | 2026-06-03 08:32:13 UTC |
| Source: | https://github.com/quallmer/quallmer |
Functions to safely access and modify metadata from quallmer objects
(qlm_coded, qlm_comparison, qlm_validation, qlm_codebook). These
functions provide a stable API for accessing object metadata without
directly manipulating internal attributes.
quallmer objects store metadata in three categories:
User metadata (type = "user"):
name: Run identifier (settable)
notes: Descriptive notes (settable)
Plus any custom fields added via as_qlm_coded(..., metadata = list(...))
Object metadata (type = "object"):
call: Function call that created the object
parent: Parent run name (for replications)
batch: Whether batch processing was used
chat_args: Arguments passed to the LLM chat
execution_args: Arguments for parallel/batch execution
n_units: Number of coded units
input_type: Type of input ("text", "image", or "human")
source: Coding source ("human" or "llm")
is_gold: Whether this is a gold standard
System metadata (type = "system"):
timestamp: When the object was created
ellmer_version: Version of ellmer package
quallmer_version: Version of quallmer package
R_version: Version of R
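The three categories can be pictured as nested lists stored on the object, with accessor functions controlling what may be changed. The sketch below is a toy illustration in base R and not quallmer's implementation: the functions `new_toy_coded()`, `toy_meta()`, and `set_toy_meta()` and the `"meta"` attribute name are hypothetical.

```r
# Toy illustration only: all function names and the "meta" attribute are
# hypothetical and do not reflect quallmer's internals.
new_toy_coded <- function(data, name, notes = NULL) {
  attr(data, "meta") <- list(
    user = list(name = name, notes = notes),
    system = list(
      timestamp = Sys.time(),
      R_version = as.character(getRversion())
    )
  )
  data
}

# Read one field by name, or every field in one category.
toy_meta <- function(x, field = NULL, type = c("user", "system")) {
  meta <- attr(x, "meta")
  if (!is.null(field)) {
    return(c(meta$user, meta$system)[[field]])
  }
  meta[[match.arg(type)]]
}

# Only the user fields "name" and "notes" may be changed.
set_toy_meta <- function(x, field, value) {
  if (!field %in% c("name", "notes")) {
    stop("Only 'name' and 'notes' can be modified.")
  }
  meta <- attr(x, "meta")
  meta$user[[field]] <- value
  attr(x, "meta") <- meta
  x
}

toy <- new_toy_coded(data.frame(.id = 1:2), name = "run1")
toy <- set_toy_meta(toy, "name", "updated_run1")
toy_meta(toy, "name")
names(toy_meta(toy, type = "system"))
```

Routing every read and write through accessors like these is what allows the internal layout to change without breaking user code.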
qlm_meta(): Get metadata fields
qlm_meta<-(): Set user metadata fields (only name and notes)
codebook(): Extract codebook from coded objects
inputs(): Extract original input data
qlm_code() for creating coded objects
as_qlm_coded() for converting human-coded data
qlm_trail() for viewing coding history
# Create a coded object
texts <- c("I love this!", "Terrible.", "It's okay.")
coded <- qlm_code(
  texts,
  data_codebook_sentiment,
  model = "openai/gpt-4o-mini",
  name = "run1",
  notes = "Initial coding run"
)

# Access metadata
qlm_meta(coded, "name")           # Get run name
qlm_meta(coded, type = "user")    # Get all user metadata
qlm_meta(coded, type = "system")  # Get system metadata

# Modify user metadata
qlm_meta(coded, "name") <- "updated_run1"
qlm_meta(coded, "notes") <- "Revised notes"

# Extract components
codebook(coded)  # Get the codebook
inputs(coded)    # Get original texts

# Custom metadata from human coding
human_data <- data.frame(
  .id = 1:5,
  sentiment = c("pos", "neg", "pos", "neg", "pos")
)
human_coded <- as_qlm_coded(
  human_data,
  name = "coder_A",
  metadata = list(
    coder_name = "Dr. Smith",
    experience = "5 years"
  )
)

# Access custom metadata
qlm_meta(human_coded, "coder_name")    # "Dr. Smith"
qlm_meta(human_coded, type = "user")   # All user fields
Converts a data frame or quanteda corpus of coded data (human-coded or from
external sources) into a qlm_coded object. This enables provenance tracking
and integration with qlm_compare(), qlm_validate(), and qlm_trail() for
coded data alongside LLM-coded results.
as_qlm_coded(
  x,
  id,
  name = NULL,
  is_gold = FALSE,
  codebook = NULL,
  texts = NULL,
  notes = NULL,
  metadata = list(),
  qlm_segment = FALSE,
  source_text = NULL
)

## S3 method for class 'data.frame'
as_qlm_coded(
  x,
  id,
  name = NULL,
  is_gold = FALSE,
  codebook = NULL,
  texts = NULL,
  notes = NULL,
  metadata = list(),
  qlm_segment = FALSE,
  source_text = NULL
)

## Default S3 method:
as_qlm_coded(
  x,
  id,
  name = NULL,
  is_gold = FALSE,
  codebook = NULL,
  texts = NULL,
  notes = NULL,
  metadata = list(),
  qlm_segment = FALSE,
  source_text = NULL
)
x |
A data frame or quanteda corpus object containing coded data.
For data frames: Must include a column with unit identifiers (default
|
id |
For data frames: Name of the column containing unit identifiers
(supports both quoted and unquoted). Default is |
name |
Character. A string identifying this coding run (e.g., "Coder_A",
"expert_rater", "Gold_Standard"). Default is |
is_gold |
Logical. If |
codebook |
Optional list containing coding instructions. Can include:
If |
texts |
Optional vector of original texts or data that were coded.
Should correspond to the |
notes |
Optional character string with descriptive notes about this
coding. Useful for documenting details when viewing results in
|
metadata |
Optional list of metadata about the coding process. Can include any relevant information such as:
The function automatically adds |
qlm_segment |
Logical. If |
source_text |
A named character vector of source texts. Required when
|
When printed, objects created with as_qlm_coded() display "Source: Human coder"
instead of model information, clearly distinguishing human from LLM coding.
Objects marked with is_gold = TRUE are automatically detected by
qlm_validate(), allowing simpler syntax:
# With is_gold = TRUE
gold <- as_qlm_coded(gold_data, name = "Expert", is_gold = TRUE)
qlm_validate(coded1, coded2, gold, by = "sentiment")  # gold = not needed!

# Without is_gold (or explicit gold =)
gold <- as_qlm_coded(gold_data, name = "Expert")
qlm_validate(coded1, coded2, gold = gold, by = "sentiment")
A qlm_coded object (tibble with additional class and attributes)
for provenance tracking. When is_gold = TRUE, the object is marked as
a gold standard in its attributes.
qlm_code() for LLM coding, qlm_compare() for inter-rater reliability,
qlm_validate() for validation against gold standards, qlm_trail() for
provenance tracking.
# Basic usage with data frame (default .id column)
human_data <- data.frame(
  .id = 1:10,
  sentiment = sample(c("pos", "neg"), 10, replace = TRUE)
)
coder_a <- as_qlm_coded(human_data, name = "Coder_A")
coder_a

# Use custom id column with NSE (unquoted)
data_with_custom_id <- data.frame(
  doc_id = 1:10,
  sentiment = sample(c("pos", "neg"), 10, replace = TRUE)
)
coder_custom <- as_qlm_coded(data_with_custom_id, id = doc_id, name = "Coder_C")

# Or use quoted string
coder_custom2 <- as_qlm_coded(data_with_custom_id, id = "doc_id", name = "Coder_D")

# Create a gold standard from data frame
gold <- as_qlm_coded(
  human_data,
  name = "Expert",
  is_gold = TRUE
)

# Validate against the gold standard
coder_b_data <- data.frame(
  .id = 1:10,
  sentiment = sample(c("pos", "neg"), 10, replace = TRUE)
)
coder_b <- as_qlm_coded(coder_b_data, name = "Coder_B")

# gold = is optional when the gold object is marked with is_gold = TRUE;
# it is passed explicitly here (NSE works for 'by' too)
qlm_validate(coder_a, coder_b, gold = gold, by = sentiment, level = "nominal")

# Create from corpus object (simplified workflow)
data("data_corpus_manifsentsUK2010sample")
crowd <- as_qlm_coded(
  data_corpus_manifsentsUK2010sample,
  is_gold = TRUE
)
# Document names automatically become .id, all docvars included

# Use a docvar as identifier with NSE (unquoted)
crowd_party <- as_qlm_coded(
  data_corpus_manifsentsUK2010sample,
  id = party,
  is_gold = TRUE
)

# Or use quoted string
crowd_party2 <- as_qlm_coded(
  data_corpus_manifsentsUK2010sample,
  id = "party",
  is_gold = TRUE
)

# With complete metadata
expert <- as_qlm_coded(
  human_data,
  name = "expert_rater",
  is_gold = TRUE,
  codebook = list(
    name = "Sentiment Analysis",
    instructions = "Code overall sentiment as positive or negative"
  ),
  metadata = list(
    coder_name = "Dr. Smith",
    coder_id = "EXP001",
    training = "5 years experience",
    date = "2024-01-15"
  )
)
A qlm_codebook object defining instructions for annotating whether a text
pertains to immigration policy and, if so, the stance toward immigration
openness. This codebook replicates the crowd-sourced annotation task from
Benoit et al. (2016) and is designed to work with
data_corpus_manifsentsUK2010sample.
data_codebook_immigration
A qlm_codebook object containing:
Task name: "Immigration policy coding from Benoit et al. (2016)"
Coding instructions for identifying whether sentences from UK 2010 election manifestos pertain to immigration policy, and if so, rating the policy position expressed
Response schema with two fields: llm_immigration_label
(Enum: "Not immigration" or "Immigration" indicating whether the sentence
relates to immigration policy), and llm_immigration_position (Integer
from -1 to 1, where -1 = pro-immigration, 0 = neutral, and 1 =
anti-immigration)
"text"
Named character vector: llm_immigration_label = "nominal", llm_immigration_position = "ordinal"
Benoit, K., Conway, D., Lauderdale, B.E., Laver, M., & Mikhaylov, S. (2016). Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data. American Political Science Review, 110(2), 278–295. doi:10.1017/S0003055416000058
qlm_codebook(), qlm_code(), data_corpus_manifsentsUK2010sample
# View the codebook
data_codebook_immigration

## Not run:
# Use with UK manifesto sentences (requires API key)
if (requireNamespace("quanteda", quietly = TRUE)) {
  coded <- qlm_code(data_corpus_manifsentsUK2010sample,
                    data_codebook_immigration,
                    model = "openai/gpt-4o-mini")

  # Compare with crowd-sourced annotations
  # (quanteda:: prefixes are needed because quanteda is not attached here)
  crowd <- as_qlm_coded(
    data.frame(
      .id = quanteda::docnames(data_corpus_manifsentsUK2010sample),
      quanteda::docvars(data_corpus_manifsentsUK2010sample)
    ),
    is_gold = TRUE
  )
  qlm_validate(coded, gold = crowd)
}
## End(Not run)
A qlm_codebook object defining instructions for sentiment analysis of movie
reviews. Designed to work with data_corpus_LMRDsample but with an expanded
polarity scale that includes a "mixed" category.
data_codebook_sentiment
A qlm_codebook object containing:
Task name: "Movie Review Sentiment"
Coding instructions for analyzing movie review sentiment
Response schema with two fields: polarity (Enum of "neg", "mixed", or "pos") and rating (Integer from 1 to 10)
Expert film critic persona
"text"
qlm_codebook(), qlm_code(), qlm_compare(), data_corpus_LMRDsample
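To make the response schema described above concrete, the following base R sketch checks whether a single parsed response fits the two fields (a `polarity` enum and an integer `rating` from 1 to 10). The helper `is_valid_sentiment()` is hypothetical and not part of quallmer; in practice the structured output is presumably enforced by the schema itself.

```r
# Hypothetical helper: checks a parsed response against the two fields
# described above (polarity enum, integer rating from 1 to 10).
is_valid_sentiment <- function(response) {
  polarity_ok <- is.character(response$polarity) &&
    length(response$polarity) == 1 &&
    response$polarity %in% c("neg", "mixed", "pos")
  rating_ok <- is.numeric(response$rating) &&
    length(response$rating) == 1 &&
    response$rating == round(response$rating) &&
    response$rating >= 1 && response$rating <= 10
  polarity_ok && rating_ok
}

is_valid_sentiment(list(polarity = "mixed", rating = 6))    # TRUE
is_valid_sentiment(list(polarity = "neutral", rating = 6))  # FALSE: not in the enum
is_valid_sentiment(list(polarity = "pos", rating = 11))     # FALSE: rating out of range
```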
# View the codebook
data_codebook_sentiment

# Use with movie review corpus (requires API key)
coded <- qlm_code(data_corpus_LMRDsample[1:10],
                  data_codebook_sentiment,
                  model = "openai")

# Create multiple coded versions for comparison
coded1 <- qlm_code(data_corpus_LMRDsample[1:20],
                   data_codebook_sentiment,
                   model = "openai/gpt-4o-mini")
coded2 <- qlm_code(data_corpus_LMRDsample[1:20],
                   data_codebook_sentiment,
                   model = "openai/gpt-4o")

# Compare inter-rater reliability
comparison <- qlm_compare(coded1, coded2, by = "rating", level = "interval")
print(comparison)
A sample of 100 positive and 100 negative reviews from the Maas et al. (2011) dataset for sentiment classification. The original dataset contains 50,000 highly polar movie reviews.
data_corpus_LMRDsample
The corpus docvars consist of:
serial (within set and polarity) document number
user-assigned movie rating on a 1-10 point integer scale
either neg or pos to indicate whether the
movie review was negative or positive. See Maas et al. (2011) for the
cut-off values that governed this assignment.
http://ai.stanford.edu/~amaas/data/sentiment/
Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). "Learning Word Vectors for Sentiment Analysis". The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).
data_codebook_sentiment for an example codebook and usage with this corpus
if (requireNamespace("quanteda", quietly = TRUE)) {
  # Inspect the corpus
  summary(data_corpus_LMRDsample)

  # Sample a few reviews
  head(data_corpus_LMRDsample, 3)
}
A corpus of sentences sampled from publicly available party manifestos from the 2010 United Kingdom election. Each sentence has been rated on whether it pertains to immigration and, if so, on its favorability toward open immigration policy, as the mean score of crowd coders on a scale of -1 (favours open immigration policy), 0 (neutral), or 1 (anti-immigration).
The sentences were sampled from the corpus used in Benoit et al. (2016) doi:10.1017/S0003055416000058, which contains more information on the crowd-sourced annotation approach.
data_corpus_manifsentsUK2010sample
A corpus object. The corpus consists of 155 sentences randomly sampled from the party manifestos, with an attempt to balance the sentences according to their categorisation as pertaining to immigration or not, as well as by party. The corpus contains the following document-level variables:
factor; abbreviation of the party that wrote the manifesto.
factor; party that wrote the manifesto.
integer; 4-digit year of the election.
Factor indicating whether the majority of
crowd workers labelled a sentence as referring to immigration or not. The
variable has missing values (NA) for all non-annotated manifestos.
numeric; the direction of statements coded as "Immigration" based on the aggregated crowd codings. The variable is the mean of the scores assigned by workers who coded a sentence and who allocated the sentence to the "Immigration" category. The variable ranges from -1 ("Favorable and open immigration policy") to +1 ("Negative and closed immigration policy").
integer; the number of coders who
contributed to the mean score immigration_mean.
integer; a thresholded version of immigration_mean
coded as -1 (pro-immigration, mean < -0.5), 0 (neutral, -0.5 <= mean <= 0.5),
or 1 (anti-immigration, mean > 0.5). Set to NA for non-immigration sentences.
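The thresholding rule for the last variable can be written out directly. This is a minimal sketch of the cut-offs stated above, assuming non-immigration sentences arrive with a missing mean; `threshold_position()` is a hypothetical name, not a function exported by the package.

```r
# Hypothetical helper reproducing the stated cut-offs:
#   mean < -0.5          -> -1 (pro-immigration)
#   -0.5 <= mean <= 0.5  ->  0 (neutral)
#   mean > 0.5           ->  1 (anti-immigration)
# NA input (assumed for non-immigration sentences) stays NA.
threshold_position <- function(immigration_mean) {
  ifelse(immigration_mean < -0.5, -1L,
         ifelse(immigration_mean > 0.5, 1L, 0L))
}

threshold_position(c(-0.8, -0.5, 0.2, 0.5, 0.9, NA))
# -1  0  0  0  1 NA
```

Note that the boundary values -0.5 and 0.5 both fall in the neutral category.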
Benoit, K., Conway, D., Lauderdale, B.E., Laver, M., & Mikhaylov, S. (2016). Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data. American Political Science Review, 110(2), 278–295. doi:10.1017/S0003055416000058
if (requireNamespace("quanteda", quietly = TRUE)) {
  # Inspect the corpus
  summary(data_corpus_manifsentsUK2010sample)
}
Two datasets derived from Appendix 2 of Klingemann et al. (2006), which provides worked examples of the Manifesto Project quasi-sentence coding scheme.
data_corpus_MPexamples is a two-document corpus containing the full source
texts of the Liberal-SDP Alliance 1983 UK election manifesto and the New
Zealand National Party 1972 election manifesto, reconstructed by joining
the quasi-sentences from the gold-standard annotation.
data_corpus_MPexamplesseg is the corresponding gold-standard segmented
corpus, produced by converting the Manifesto Project's human-coded
quasi-sentences via as_qlm_coded() with qlm_segment = TRUE. It is marked
as a gold standard (is_gold = TRUE) and can be passed directly to
qlm_compare() alongside output from qlm_segment() to compute
Krippendorff's alpha for unitizing.
data_corpus_MPexamples
data_corpus_MPexamplesseg
data_corpus_MPexamples: A corpus with 2 documents
and the following document-level variables:
Character. Country of origin: "UK" or "NZ".
Character. Party name: "Liberal-SDP Alliance" or
"National Party".
Integer. Election year: 1983 or 1972.
data_corpus_MPexamplesseg: A segmented corpus with
178 quasi-sentences (107 Liberal-SDP, 71 NZ National Party) and the
following document-level variables:
Character. Source document identifier ("Liberal_SDP_1983"
or "NZ_NP_1972").
Integer. Quasi-sentence index within the source document.
Integer. Start character position in the source text.
Integer. End character position in the source text.
Character. Manifesto Project manifesto label
("Liberal-SDP 1983" or "NP 1972").
Character. Country of origin: "UK" or "NZ".
Integer. Manifesto Project policy category code.
An object of class corpus (inherits from character) of length 178.
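The relationship between the joined source text and the start and end positions of each quasi-sentence can be illustrated with a small base R sketch. It assumes segments are joined with a single space and that positions are 1-based and inclusive; the documentation above does not state either convention, and the example sentences are invented.

```r
# Invented example segments; the separator and the 1-based inclusive
# positions are assumptions made for illustration.
segments <- c("We will cut taxes.", "Schools matter.", "Build more homes.")
sep <- " "

# Reconstruct a source text by joining the segments
source_text <- paste(segments, collapse = sep)

# Character positions of each segment within the joined text
lens  <- nchar(segments)
end   <- cumsum(lens + nchar(sep)) - nchar(sep)
start <- end - lens + 1L

data.frame(segid = seq_along(segments), start = start, end = end)

# Extracting by position recovers the original segments
substring(source_text, start, end)
```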
Klingemann, H. D., Volkens, A., Bara, J., Budge, I., & McDonald, M. D. (2006). Mapping Policy Preferences II: Estimates for Parties, Electors, and Governments in Eastern Europe, European Union, and OECD 1990–2003. Oxford University Press.
qlm_segment(), as_qlm_coded(), qlm_compare()
if (requireNamespace("quanteda", quietly = TRUE)) {
  # Inspect the source texts
  summary(data_corpus_MPexamples)

  # Subset to one manifesto
  quanteda::corpus_subset(data_corpus_MPexamples, country == "NZ")

  # Gold-standard segmentation for the NZ manifesto
  quanteda::corpus_subset(
    data_corpus_MPexamplesseg,
    quanteda::docvars(data_corpus_MPexamplesseg, "docid") == "NZ_NP_1972"
  )
}
A corpus of 100 speeches from the Maerz & Schneider (2020) corpus, balanced across regime types (50 autocracies, 50 democracies). This sample is included in the package for demos and testing. The full corpus of 4,740 speeches is available in the package's pkgdown examples folder.
data_corpus_ms2020sample
A corpus object. The corpus consists of 100 speeches randomly sampled from 40 heads of government across 27 countries, balanced by regime type. The corpus contains the following document-level variables:
Character. Name of the head of government.
Character. Country name.
Factor. Regime type: "Democracy" or "Autocracy".
Numeric. Original dictionary-based liberal-illiberal score.
Date. Date of the speech.
Character. Title of the speech.
Maerz, S. F., & Schneider, C. Q. (2020). Comparing public communication in democracies and autocracies: Automated text analyses of speeches by heads of government. Quality & Quantity, 54, 517-545. doi:10.1007/s11135-019-00885-7
if (requireNamespace("quanteda", quietly = TRUE)) {
  # Inspect the corpus
  summary(data_corpus_ms2020sample, n = 10)

  # Regime distribution
  table(data_corpus_ms2020sample$regime)

  # View a sample speech
  cat(data_corpus_ms2020sample[1])
}
Applies a codebook to input data using a large language model, returning a rich object that includes the codebook, execution settings, results, and metadata for reproducibility.
qlm_code(x, codebook, model, ..., batch = FALSE, name = NULL, notes = NULL)
x |
Input data: a character vector of texts (for text codebooks) or file paths to images (for image codebooks). Named vectors will use names as identifiers in the output; unnamed vectors will use sequential integers. |
codebook |
A codebook object created with |
model |
Provider (and optionally model) name in the form
|
... |
Additional arguments passed to |
batch |
Logical. If |
name |
Character string identifying this coding run. Default is |
notes |
Optional character string with descriptive notes about this
coding run. Useful for documenting the purpose or rationale when viewing
results in |
Arguments in ... are dynamically routed to either ellmer::chat(),
ellmer::parallel_chat_structured(), or ellmer::batch_chat_structured()
based on their names.
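Name-based routing of this kind can be sketched in base R by matching argument names against the formals of each target function. The example is an illustration under assumptions, not the package's code: `chat_stub()` and `exec_stub()` stand in for the ellmer functions, their parameter lists are placeholders, and `route_args()` is a hypothetical helper.

```r
# Stand-ins for the target functions; their parameters are placeholders
# chosen for illustration only.
chat_stub <- function(model, system_prompt = NULL, echo = "none") NULL
exec_stub <- function(max_active = 10, rpm = 500) NULL

# Hypothetical helper: split a named list of arguments by which target
# function declares a parameter of that name.
route_args <- function(args, chat_fun, exec_fun) {
  chat_names <- names(formals(chat_fun))
  exec_names <- names(formals(exec_fun))
  list(
    chat_args      = args[names(args) %in% chat_names],
    execution_args = args[names(args) %in% exec_names],
    unmatched      = setdiff(names(args), c(chat_names, exec_names))
  )
}

routed <- route_args(
  list(echo = "none", max_active = 5, foo = 1),
  chat_stub, exec_stub
)
names(routed$chat_args)       # "echo"
names(routed$execution_args)  # "max_active"
routed$unmatched              # "foo"
```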
Progress indicators and error handling are provided by the underlying
ellmer::parallel_chat_structured() or ellmer::batch_chat_structured()
function. Set verbose = TRUE to see progress messages during coding.
Retry logic for API failures should be configured through ellmer's options.
When batch = TRUE, the function uses ellmer::batch_chat_structured()
which submits jobs to the provider's batch API. This is typically more
cost-effective but has longer turnaround times. The path argument specifies
where batch results are cached, wait controls whether to wait for completion,
and ignore_hash can force reprocessing of cached results.
A qlm_coded object (a tibble with additional attributes):
The coded results with a .id column for identifiers.
data, input_type, and run (list containing name, batch, call, codebook, chat_args, execution_args, metadata, parent).
The object prints as a tibble and can be used directly in data manipulation workflows.
The batch flag in the run attribute indicates whether batch processing was used.
The execution_args contains all non-chat execution arguments (for either parallel or batch processing).
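The identifier rule for the .id column (names when the input is named, sequential integers otherwise) amounts to the following base R sketch. `make_ids()` is a hypothetical name used for illustration and is not part of the package.

```r
# Hypothetical helper mirroring the documented rule for identifiers.
make_ids <- function(x) {
  if (is.null(names(x))) seq_along(x) else names(x)
}

make_ids(c("Great service!", "Very disappointing."))
# 1 2
make_ids(c(review1 = "Great service!", review2 = "Very disappointing."))
# "review1" "review2"
```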
qlm_codebook() for creating codebooks, qlm_replicate() for replicating
coding runs, qlm_compare() and qlm_validate() for assessing reliability.
# Requires API credentials and internet access; not run in package checks.
## Not run:
# Basic sentiment analysis
texts <- c("I love this product!", "Terrible experience.", "It's okay.")
coded <- qlm_code(texts, data_codebook_sentiment, model = "openai/gpt-4o-mini")
coded

# With named inputs (names become IDs in output)
texts_named <- c(review1 = "Great service!", review2 = "Very disappointing.")
coded2 <- qlm_code(texts_named, data_codebook_sentiment, model = "openai/gpt-4o-mini")
coded2
## End(Not run)
Creates a codebook definition for use with qlm_code(). A codebook specifies
what information to extract from input data, including the instructions
that guide the LLM and the structured output schema.
qlm_codebook(
  name,
  instructions,
  schema,
  role = NULL,
  input_type = c("text", "image"),
  levels = NULL
)
name |
Name of the codebook (character). |
instructions |
Instructions to guide the model in performing the coding task. |
schema |
Structured output definition, e.g., created by
|
role |
Optional role description for the model (e.g., "You are an expert annotator"). If provided, this will be prepended to the instructions when creating the system prompt. |
input_type |
Type of input data: |
levels |
Optional named list specifying measurement levels for each
variable in the schema. Names should match schema property names. Values
should be one of |
This function replaces task(), which is now deprecated. The returned object
has dual class inheritance (c("qlm_codebook", "task")) to maintain
backward compatibility.
A codebook object (a list with class c("qlm_codebook", "task"))
containing the codebook definition. Use with qlm_code() to apply the
codebook to data.
qlm_code() for applying codebooks to data,
data_codebook_sentiment for a predefined codebook example,
task() for the deprecated function.
# Define a custom codebook
my_codebook <- qlm_codebook(
  name = "Sentiment",
  instructions = "Rate the sentiment from -1 (negative) to 1 (positive).",
  schema = type_object(
    score = type_number("Sentiment score from -1 to 1"),
    explanation = type_string("Brief explanation")
  )
)

# With a role
my_codebook_role <- qlm_codebook(
  name = "Sentiment",
  instructions = "Rate the sentiment from -1 (negative) to 1 (positive).",
  schema = type_object(
    score = type_number("Sentiment score from -1 to 1"),
    explanation = type_string("Brief explanation")
  ),
  role = "You are an expert sentiment analyst."
)

# With explicit measurement levels
my_codebook_levels <- qlm_codebook(
  name = "Sentiment",
  instructions = "Rate the sentiment from -1 (negative) to 1 (positive).",
  schema = type_object(
    score = type_number("Sentiment score from -1 to 1"),
    explanation = type_string("Brief explanation")
  ),
  levels = list(score = "interval", explanation = "nominal")
)

# Use with qlm_code() (requires API key)
texts <- c("I love this!", "This is terrible.")
coded <- qlm_code(texts, my_codebook, model = "openai/gpt-4o-mini")
coded
Compares two or more coded objects to assess inter-rater reliability or
agreement. For predefined-unit data (data frames or qlm_coded objects),
computes standard reliability statistics. For segmented corpora from
qlm_segment(), computes Krippendorff's alpha for unitizing (see Details).
qlm_compare(
  ...,
  by,
  level = NULL,
  tolerance = 0,
  ci = c("none", "analytic", "bootstrap"),
  bootstrap_n = 1000,
  by_category = FALSE
)
... |
Two or more data frames, |
by |
Optional. Name of the variable(s) to compare across raters (supports
both quoted and unquoted). If |
level |
Optional. Measurement level(s) for the variable(s). Can be:
Valid levels are |
tolerance |
Numeric. Tolerance for agreement with numeric data. Default is 0 (exact agreement required). Used for percent agreement calculation. |
ci |
Confidence interval method:
|
bootstrap_n |
Number of bootstrap resamples when |
by_category |
Logical. If |
The function merges the coded objects by their .id column and only includes
units that are present in all objects. Missing values in any rater will
exclude that unit from analysis.
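The effect of this merging rule can be reproduced with base R on two toy raters. This is a sketch of the described behaviour (inner join on .id, then dropping units with any missing value), not the package's internal code; the data are invented.

```r
# Invented example data for two raters with partially overlapping units
rater_a <- data.frame(.id = 1:5,
                      sentiment = c("pos", "neg", "pos", NA, "neg"))
rater_b <- data.frame(.id = 2:6,
                      sentiment = c("neg", "neg", "neg", "neg", "pos"))

# Keep only units present in both objects
merged <- merge(rater_a, rater_b, by = ".id", suffixes = c("_a", "_b"))

# Drop units where any rater has a missing value
usable <- merged[complete.cases(merged), ]
usable$.id
# 2 3 5

# Simple percent agreement on the remaining units
mean(usable$sentiment_a == usable$sentiment_b)
```

Units 1 and 6 are dropped because only one rater coded them, and unit 4 is dropped because rater A's value is missing, leaving three units for analysis.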
Measurement levels and statistics:
Nominal: For unordered categories. Computes Krippendorff's alpha, Cohen's/Fleiss' kappa, and percent agreement.
Ordinal: For ordered categories. Computes Krippendorff's alpha (ordinal), weighted kappa (2 raters only), Kendall's W, Spearman's rho, and percent agreement.
Interval: For continuous data with meaningful intervals. Computes Krippendorff's alpha (interval), ICC, Pearson's r, and percent agreement.
Ratio: For continuous data with a true zero point. Computes the same measures as interval level, but Krippendorff's alpha uses the ratio-level formula which accounts for proportional differences.
Kendall's W, ICC, and percent agreement are computed using all raters simultaneously. For 3 or more raters, Spearman's rho and Pearson's r are computed as the mean of all pairwise correlations between raters.
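As an illustration of the "mean of all pairwise correlations" rule for three or more raters, the following base R sketch uses made-up ratings. The matrix `ratings` and the rater names are hypothetical, and the package may compute the same quantity differently.

```r
# Hypothetical ratings: one row per unit, one column per rater.
ratings <- cbind(
  r1 = c(1, 2, 3, 4, 5),
  r2 = c(2, 2, 3, 5, 5),
  r3 = c(1, 3, 2, 4, 5)
)

# Correlation matrices across raters.
pearson_mat <- cor(ratings, method = "pearson")
spearman_mat <- cor(ratings, method = "spearman")

# Each rater pair appears once in the upper triangle; average those entries.
mean_r <- mean(pearson_mat[upper.tri(pearson_mat)])
mean_rho <- mean(spearman_mat[upper.tri(spearman_mat)])

c(r = mean_r, rho = mean_rho)
```

With three raters there are three pairs (r1-r2, r1-r3, r2-r3), so each mean is taken over three correlations.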
Per-category statistics. When by_category = TRUE and level = "nominal",
the result also includes one row per category for alpha_per_value (from
Krippendorff's alpha) and kappa_per_value (per-category kappa via
dichotomisation for Cohen's, Fleiss' eq. 20–21 for Fleiss'). The marginal
count n for each category is carried in the docid column. Per-category
rows are not produced for ordinal, interval, or ratio levels.
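Per-category kappa via dichotomisation can be sketched in base R for the two-rater (Cohen's) case. The helper `cohen_kappa()` and the example codes are hypothetical and written for illustration only; the Fleiss variant and Krippendorff's per-value alpha are not shown.

```r
# Cohen's kappa for two raters (illustrative helper, not a package function).
cohen_kappa <- function(a, b) {
  lev <- union(a, b)
  tab <- table(factor(a, lev), factor(b, lev))
  p_o <- sum(diag(tab)) / sum(tab)                       # observed agreement
  p_e <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2   # chance agreement
  (p_o - p_e) / (1 - p_e)
}

a <- c("pos", "pos", "neg", "neu", "neg", "pos", "neu", "neg")
b <- c("pos", "neg", "neg", "neu", "neg", "pos", "pos", "neg")

# Dichotomise: category k versus everything else, then compute kappa.
categories <- sort(union(a, b))
kappa_per_value <- vapply(categories, function(k) {
  cohen_kappa(ifelse(a == k, "k", "other"), ifelse(b == k, "k", "other"))
}, numeric(1))

kappa_per_value
```

Each entry answers "how reliably do the raters agree on whether a unit belongs to this category?", which can differ considerably from the overall kappa.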
Unitizing (segmentation) reliability
When all inputs are segmented corpora – created by qlm_segment() or
as_qlm_coded() with qlm_segment = TRUE – agreement is measured at
the character level using Krippendorff's alpha for unitizing continua
(Krippendorff, 2019, section 12.6). This accounts for segments of
unequal length and partial overlaps between coders' unitizations. The
observed and expected coincidence matrices are constructed from the
lengths of pairwise segment intersections across all observer pairs.
The output includes a docid column with per-document and overall
results. Segmented corpora must reference the same source text.
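To make "lengths of pairwise segment intersections" concrete, here is a small base R sketch for two coders segmenting the same 100-character text. The segment boundaries are invented, positions are treated as inclusive character offsets (an assumption), and the sketch stops at the intersection lengths: it does not build the coincidence matrices or compute alpha.

```r
# Hypothetical unitizations of one 100-character document.
coder_a <- data.frame(start = c(1, 41), end = c(40, 100))
coder_b <- data.frame(start = c(1, 31, 81), end = c(30, 80, 100))

# Length of the overlap between two inclusive character ranges (0 if disjoint).
overlap <- function(s1, e1, s2, e2) {
  pmax(0, pmin(e1, e2) - pmax(s1, s2) + 1)
}

# Rows index coder A's segments, columns index coder B's segments.
intersections <- outer(
  seq_len(nrow(coder_a)), seq_len(nrow(coder_b)),
  function(i, j) overlap(coder_a$start[i], coder_a$end[i],
                         coder_b$start[j], coder_b$end[j])
)

intersections
```

Because both coders partition the whole text, the intersection lengths sum to the document length. Partial overlaps, such as coder A's first segment spilling 10 characters into coder B's second, are what allow the measure to handle segments of unequal length.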
Four members of the unitizing alpha family are supported:
alpha_u_binary (${}_{|u}\alpha$): Computed when by is omitted.
Measures agreement on which character spans are identified as segments
versus gaps (irrelevant matter). Collapses all segment values to a
binary distinction. Use this for pure boundary agreement when segments
carry no codes (section 12.6.4, eq. 35).
alpha_u_nominal (${}_{u}\alpha_{\text{nominal}}$): Computed when by
names a docvar. Measures agreement on both boundary placement and the
value (code) assigned to each segment. This is the most comprehensive
measure: low values can reflect boundary disagreement, coding
disagreement, or both (section 12.6.3, eq. 34).
alpha_cu_nominal (${}_{cu}\alpha_{\text{nominal}}$): Computed alongside
alpha_u_nominal when by is specified. Measures coding agreement
conditional on unitization, restricting the coincidence matrix to
intersections of non-gap segments only. This isolates "do the coders
agree on the codes?" from "do they agree on the boundaries?"
(section 12.6.5, eqs. 36–37).
alpha_u_per_value[k] (${}_{(k)u}\alpha_{\text{nominal}}$): Computed
alongside alpha_u_nominal when by is specified and
by_category = TRUE. Reports the reliability of each individual
value k, showing which codes are
applied reliably and which are not. Coverage (the percentage of all
k-valued matter found in valued intersections) is reported in the
docid column (section 12.6.6, eq. 38).
A qlm_comparison object (a tibble/data frame) with the following columns:
variable: Name of the compared variable
level: Measurement level used
measure: Name of the reliability metric
value: Computed value of the metric
docid: Per-row context. Source document identifier and overall
indicator for unitizing comparisons; marginal (n=X) for nominal
per-category alpha rows; NA otherwise.
rater1, rater2, ...: Names of the compared objects (one column per rater)
ci_lower: Lower bound of confidence interval (only if ci != "none")
ci_upper: Upper bound of confidence interval (only if ci != "none")
The object has class c("qlm_comparison", "tbl_df", "tbl", "data.frame") and
attributes containing metadata (raters, n, call).
Metrics by measurement level (predefined-unit comparisons):
Nominal: alpha_nominal, kappa (Cohen's/Fleiss'), percent_agreement
Ordinal: alpha_ordinal, kappa_weighted (2 raters only), w (Kendall's W), rho (Spearman's), percent_agreement
Interval/Ratio: alpha_interval/alpha_ratio, icc, r (Pearson's), percent_agreement
For unitizing measures (segmented corpora), see Details.
Confidence intervals:
ci = "analytic": Provides analytic CIs for ICC and Pearson's r only
ci = "bootstrap": Provides bootstrap CIs for all metrics via resampling
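A bootstrap confidence interval of this kind can be sketched in base R by resampling units with replacement and recomputing the metric each time. The example uses percent agreement and a percentile interval; both the data and the percentile method are assumptions for illustration, and the package's own resampling scheme may differ.

```r
set.seed(42)

# Hypothetical codes from two raters for ten units.
a <- c("pos", "neg", "pos", "neu", "neg", "pos", "neg", "neu", "pos", "neg")
b <- c("pos", "neg", "neg", "neu", "neg", "pos", "pos", "neu", "pos", "neg")

percent_agreement <- function(x, y) mean(x == y)
estimate <- percent_agreement(a, b)

# Resample units (not raters) with replacement and recompute the metric.
bootstrap_n <- 1000
boot <- replicate(bootstrap_n, {
  idx <- sample(seq_along(a), replace = TRUE)
  percent_agreement(a[idx], b[idx])
})

# 95% percentile interval from the bootstrap distribution.
ci <- quantile(boot, c(0.025, 0.975), names = FALSE)

list(estimate = estimate, ci_lower = ci[1], ci_upper = ci[2])
```

Resampling whole units keeps each unit's ratings together, so the dependence between raters on the same unit is preserved in every resample.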
Krippendorff, K. (2019). Content Analysis: An Introduction to Its Methodology (4th ed.). Sage. doi:10.4135/9781071878781
Related workflow functions: qlm_validate() for validation of
coding against gold standards, qlm_code() for LLM coding,
as_qlm_coded() for human coding, qlm_segment() for LLM-powered
text segmentation.
Underlying reliability calculations (internal): reliability_alpha()
and reliability_alpha_u() for Krippendorff's alpha;
reliability_kappa() (Cohen) and reliability_kappa_fleiss();
reliability_kendall_w(); reliability_icc().
    # Load example coded objects
    examples <- readRDS(system.file("extdata", "example_objects.rds", package = "quallmer"))

    # Compare two coding runs
    comparison <- qlm_compare(
      examples$example_coded_sentiment,
      examples$example_coded_mini,
      by = "sentiment",
      level = "nominal"
    )
    print(comparison)

    # Compare a specific variable without specifying the measurement level
    qlm_compare(
      examples$example_coded_sentiment,
      examples$example_coded_mini,
      by = "sentiment"
    )
Get or set metadata from qlm_coded, qlm_codebook, qlm_comparison, and
qlm_validation objects. Metadata is organized into three types: user,
object, and system. Only user metadata can be modified.
    qlm_meta(x, field = NULL, type = c("user", "object", "system", "all"))

    qlm_meta(x, field = NULL) <- value
| Argument | Description |
|---|---|
| x | A quallmer object ( |
| field | Optional character string specifying a single metadata field to extract or set. If |
| type | Character string specifying the type of metadata to extract: |
| value | For |
Metadata is stratified into three types following the quanteda convention:
User metadata (type = "user", default): User-specified descriptive information
that can be modified via qlm_meta<-(). Fields: name, notes.
Object metadata (type = "object"): Parameters and intrinsic properties set
at object creation time. Read-only. Fields vary by object type but typically include:
batch, call, chat_args, execution_args, parent, n_units, input_type.
System metadata (type = "system"): Automatically captured environment and
version information. Read-only. Fields: timestamp, ellmer_version,
quallmer_version, R_version.
For qlm_codebook objects, user metadata includes name and instructions
(the codebook instructions text), both of which can be modified.
Modification via qlm_meta<-() (assignment):
Only user metadata can be modified. For qlm_coded, qlm_comparison, and
qlm_validation objects, modifiable fields are name and notes. For
qlm_codebook objects, modifiable fields are name and instructions.
Object and system metadata are read-only and set at creation time. Attempting to modify these will produce an informative error.
qlm_meta() returns the requested metadata (a named list or single value).
qlm_meta<-() returns the modified object (invisibly).
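The read-only rule can be illustrated with a self-contained base R sketch of a getter and a replacement function. Everything here (`demo_meta()`, `demo_meta<-`, the `meta` attribute and the `demo_coded` class) is hypothetical and only mimics the behaviour described above; it says nothing about how quallmer stores metadata internally.

```r
# Hypothetical stand-in for a quallmer object: metadata kept in an attribute.
obj <- structure(
  list(),
  class = "demo_coded",
  meta = list(
    user = list(name = "run1", notes = NULL),
    system = list(timestamp = Sys.time())
  )
)

# Getter: whole metadata type, or a single field of it.
demo_meta <- function(x, field = NULL, type = "user") {
  m <- attr(x, "meta")[[type]]
  if (is.null(field)) m else m[[field]]
}

# Setter: only user fields may be changed; anything else is rejected.
`demo_meta<-` <- function(x, field = NULL, value) {
  user_fields <- c("name", "notes")
  # Normalise both call styles to a named list of new values.
  new_values <- if (is.null(field)) value else stats::setNames(list(value), field)
  bad <- setdiff(names(new_values), user_fields)
  if (length(bad) > 0) {
    stop("Cannot modify read-only or unknown field(s): ",
         paste(bad, collapse = ", "))
  }
  meta <- attr(x, "meta")
  meta$user[names(new_values)] <- new_values
  attr(x, "meta") <- meta
  x
}

demo_meta(obj, "name") <- "updated_run"
demo_meta(obj) <- list(name = "final_run", notes = "Final analysis")

# Attempting to change system metadata raises an error.
attempt <- tryCatch({
  demo_meta(obj, "timestamp") <- Sys.time()
  "modified"
}, error = function(e) "rejected")
```

Funnelling every modification through one setter keeps the whitelist of modifiable fields in a single place, which is the main benefit of an accessor API over editing attributes directly.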
accessors for an overview of the accessor function system
codebook() for extracting the codebook component
inputs() for extracting input data
    # Load example objects
    examples <- readRDS(system.file("extdata", "example_objects.rds", package = "quallmer"))
    coded <- examples$example_coded_sentiment

    # User metadata (default)
    qlm_meta(coded)
    qlm_meta(coded, "name")

    # Object metadata
    qlm_meta(coded, type = "object")
    qlm_meta(coded, "call", type = "object")
    qlm_meta(coded, "n_units", type = "object")

    # System metadata
    qlm_meta(coded, type = "system")
    qlm_meta(coded, "timestamp", type = "system")

    # All metadata
    qlm_meta(coded, type = "all")

    # Modify user metadata
    qlm_meta(coded, "name") <- "updated_run"
    qlm_meta(coded, "notes") <- "Analysis notes"

    # Set multiple fields at once
    qlm_meta(coded) <- list(name = "final_run", notes = "Final analysis")

    ## Not run: 
    # This will error - object and system metadata are read-only
    qlm_meta(coded, "timestamp") <- Sys.time()
    ## End(Not run)
Re-executes a coding task from a qlm_coded object, optionally with
modified settings. If no overrides are provided, uses identical settings
to the original coding.
    qlm_replicate(
      x,
      ...,
      codebook = NULL,
      model = NULL,
      batch = NULL,
      name = NULL,
      notes = NULL
    )
| Argument | Description |
|---|---|
| x | A |
| ... | Optional overrides passed to |
| codebook | Optional replacement codebook. If |
| model | Optional replacement model (e.g., |
| batch | Optional logical to override batch processing setting. If |
| name | Optional name for this run. If |
| notes | Optional character string with descriptive notes about this replication. Useful for documenting why this replication was run or what differs from the original. Default is NULL. |
A qlm_coded object with run$parent set to the parent's run name.
qlm_code() for initial coding, qlm_compare() for comparing
replicated results.
    # First create a coded object
    texts <- c("I love this!", "Terrible.", "It's okay.")
    coded <- qlm_code(texts, data_codebook_sentiment, model = "openai/gpt-4o-mini", name = "run1")

    # Replicate with same model
    coded2 <- qlm_replicate(coded, name = "run2")

    # Compare results
    qlm_compare(coded, coded2, by = "sentiment", level = "nominal")
Applies a codebook to input texts to segment them into thematic or conceptual
units, returning a quanteda::corpus() where each segment is a document.
This is the LLM-powered analogue of quanteda::corpus_segment().
    qlm_segment(x, codebook, model, ..., name = NULL, notes = NULL)
| Argument | Description |
|---|---|
| x | A character vector of texts or a |
| codebook | A codebook object created with |
| model | Provider (and optionally model) name in the form |
| ... | Additional arguments passed to |
| name | Character string identifying this coding run. Default is NULL. |
| notes | Optional character string with descriptive notes about this segmentation run. Default is NULL. |
The codebook schema defines additional document-level variables (docvars)
for each segment. A text field (the verbatim segment text) is always added
automatically and must not appear in the schema. Measurement levels defined
in the codebook are not applicable to segmentation and are silently ignored.
A quanteda::corpus() where each segment is a document. Document
names follow the {source}.{i} convention of quanteda::corpus_segment().
Docvars include:
docid: Name of the source document.
segid: Integer segment index within the source document.
Any fields defined in the codebook schema.
Original docvars inherited from the input (if x is a
corpus).
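The `{source}.{i}` naming convention and the `docid`/`segid` docvars can be mimicked in base R. The `segments` data frame below is invented, and the sketch only shows how the names relate to the two docvars; it does not call `qlm_segment()` or quanteda.

```r
# Hypothetical segments in document order, tagged with their source document.
segments <- data.frame(
  docid = c("text1", "text1", "text1", "text2", "text2"),
  text = c("The room was clean", "rather basic", "great location",
           "Friendly staff", "but overpriced")
)

# segid: running index of each segment within its source document.
segments$segid <- ave(seq_along(segments$docid), segments$docid,
                      FUN = seq_along)

# Document name: source name, a dot, then the segment index.
segments$docname <- paste0(segments$docid, ".", segments$segid)

segments[, c("docname", "docid", "segid")]
```

Keeping `docid` and `segid` as separate docvars means segments can be regrouped by source document without parsing the document names.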
qlm_code() for document-level coding, qlm_codebook() for
creating codebooks, quanteda::corpus_segment() for pattern-based
segmentation.
    ## Not run: 
    # Aspect-based segmentation of a hotel review (character vector input)
    review <- paste(
      "The room was clean and tidy, despite being rather basic in its furnishings.",
      "The location of the hotel was really great, however.",
      "We loved the proximity to both public transport and to the city's main attractions."
    )

    cb_absa <- qlm_codebook(
      name = "Aspect-based segmentation",
      instructions = paste(
        "Segment the text according to the distinct aspects (topics or features).",
        "Each segment will continue as long as it is part of the same aspect.",
        "An aspect-based segment may be more than one sentence or may be just a",
        "part of a sentence.",
        "",
        "Aspects in hotel reviews include: cleanliness, features, location, service,",
        "and value. Return each aspect segment with its verbatim text and a short",
        "aspect label."
      ),
      schema = type_object(
        aspect = type_string("Short aspect label"),
        sentiment = type_enum(c("negative", "neutral", "positive"),
                              "Sentiment toward this aspect")
      )
    )

    segs <- qlm_segment(review, cb_absa, model = "anthropic")
    quanteda::docvars(segs)
    #   docid segid      aspect sentiment
    # 1 text1     1 cleanliness  positive
    # 2 text1     2    features  negative
    # 3 text1     3    location  positive

    # Corpus input preserves existing docvars
    reviews_corp <- quanteda::corpus(
      c(hotel_a = review),
      docvars = data.frame(city = "London", stars = 4L)
    )
    segs_corp <- qlm_segment(reviews_corp, cb_absa, model = "anthropic")
    quanteda::docvars(segs_corp)
    ## End(Not run)
Creates a complete audit trail documenting your qualitative coding workflow. Following Lincoln and Guba's (1985) concept of the audit trail for establishing trustworthiness in qualitative research, this function captures the full decision history of your AI-assisted coding process.
    qlm_trail(..., path = NULL)
| Argument | Description |
|---|---|
| ... | One or more quallmer objects ( |
| path | Optional base path for saving the audit trail. When provided, creates |
Lincoln and Guba (1985, pp. 319-320) describe six categories of audit trail materials for establishing trustworthiness in qualitative research. The quallmer package operationalizes these for LLM-assisted text analysis:
Original texts stored in coded objects
Coded results from each run
Comparisons and validations
Model parameters, timestamps, decision history
Function calls documenting intent
Codebook with instructions and schema
When path is provided, the function creates:
{path}.rds: Complete trail object for R (reloadable with readRDS())
{path}.qmd: Quarto document with full audit trail documentation
A qlm_trail object containing:
List of run information with coded data, ordered from oldest to newest
Logical indicating whether all parent references were resolved
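One way to picture "ordered from oldest to newest" and "all parent references were resolved" is a small base R sketch over run names and parent links. The `runs` list and its field names are hypothetical, and ordering by the number of ancestors is an assumption made for illustration; the package may well order runs differently, for example by timestamp.

```r
# Hypothetical run records: each run names its parent (NA for an original run).
runs <- list(
  list(name = "run3", parent = "run2"),
  list(name = "run1", parent = NA_character_),
  list(name = "run2", parent = "run1")
)

run_names <- vapply(runs, function(r) r$name, character(1))
run_parents <- vapply(runs, function(r) r$parent, character(1))

# The trail is complete if every stated parent is itself among the runs.
all_resolved <- all(is.na(run_parents) | run_parents %in% run_names)

# Count ancestors by walking parent links (assumes the links contain no cycles).
n_ancestors <- function(run_name) {
  count <- 0
  parent <- run_parents[match(run_name, run_names)]
  while (!is.na(parent) && parent %in% run_names) {
    count <- count + 1
    parent <- run_parents[match(parent, run_names)]
  }
  count
}

# Runs with fewer ancestors come first.
ordered_names <- run_names[order(vapply(run_names, n_ancestors, numeric(1)))]
ordered_names
```

If `run2` were left out of the call, `run3` would point to a parent that is not in the trail and `all_resolved` would be `FALSE`, signalling a gap in the documented decision history.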
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic Inquiry. Sage.
qlm_code(), qlm_replicate(), qlm_compare(), qlm_validate()
    # Load example coded objects
    examples <- readRDS(system.file("extdata", "example_objects.rds", package = "quallmer"))

    # View audit trail from two coding runs
    trail <- qlm_trail(
      examples$example_coded_sentiment,
      examples$example_coded_mini
    )
    print(trail)

    # Save complete audit trail (creates .rds and .qmd files)
    qlm_trail(
      examples$example_coded_sentiment,
      examples$example_coded_mini,
      path = tempfile("my_analysis")
    )
Validates LLM-coded results from one or more qlm_coded objects against a
gold standard (typically human annotations) using appropriate metrics based
on measurement level. For nominal data, computes accuracy, precision, recall,
F1-score, and Cohen's kappa. For ordinal data, computes Spearman's rho,
Kendall's tau, and mean absolute error (MAE), which account for the ordering
of categories. For interval and ratio data, computes ICC, Pearson's r, MAE,
and RMSE.
    qlm_validate(
      ...,
      gold,
      by,
      level = NULL,
      average = c("macro", "micro", "weighted", "none"),
      ci = c("none", "analytic", "bootstrap"),
      bootstrap_n = 1000
    )
| Argument | Description |
|---|---|
| ... | One or more data frames, |
| gold | A data frame, |
| by | Optional. Name of the variable(s) to validate (supports both quoted and unquoted). If |
| level | Optional. Measurement level(s) for the variable(s). Can be: Valid levels are |
| average | Character scalar. Averaging method for multiclass metrics (nominal level only): |
| ci | Confidence interval method: |
| bootstrap_n | Number of bootstrap resamples when ci = "bootstrap". |
The function performs an inner join between each coded object and gold using the .id
column, so only units present in both datasets are included in validation.
Missing values (NA) in either predictions or gold standard are excluded with
a warning.
Measurement levels:
Nominal: Categories with no inherent ordering (e.g., topics, sentiment polarity). Metrics: accuracy, precision, recall, F1-score, Cohen's kappa (unweighted).
Ordinal: Categories with a meaningful ordering but not necessarily equal intervals
(e.g., ratings 1-5, Likert scales). Metrics: Spearman's rho (rho, rank
correlation), Kendall's tau (tau, rank correlation), and MAE (mae, mean
absolute error). These measures account for the ordering of categories
without assuming equal intervals.
Interval/Ratio: Numeric data with equal intervals (e.g., counts, continuous measurements). Metrics: ICC (intraclass correlation), Pearson's r (linear correlation), MAE (mean absolute error), and RMSE (root mean squared error).
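The ordinal and interval metrics listed above are straightforward to compute in base R, which can help when sanity-checking results. The `gold` and `pred` vectors are hypothetical, and ICC is left out because it needs a variance-components calculation that does not fit in a short sketch.

```r
# Hypothetical gold-standard and predicted ratings on a 1-5 scale.
gold <- c(1, 2, 3, 4, 5, 3, 2)
pred <- c(1, 3, 3, 4, 4, 3, 1)

# Rank-based measures: use ordering only, no equal-interval assumption.
rho <- cor(pred, gold, method = "spearman")
tau <- cor(pred, gold, method = "kendall")

# Linear correlation and error measures: assume meaningful distances.
r <- cor(pred, gold, method = "pearson")
mae <- mean(abs(pred - gold))
rmse <- sqrt(mean((pred - gold)^2))

c(rho = rho, tau = tau, r = r, mae = mae, rmse = rmse)
```

Here every error has size 1, so MAE is 3/7 and RMSE is sqrt(3/7); RMSE rises above MAE more sharply when a few errors are large, which is why the two are reported together for interval data.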
For multiclass problems with nominal data, the average parameter controls
how per-class metrics are aggregated:
Macro averaging computes metrics for each class independently and takes the unweighted mean. This treats all classes equally regardless of size.
Micro averaging aggregates all true positives, false positives, and false negatives globally before computing metrics. This weights classes by their prevalence.
Weighted averaging computes metrics for each class and takes the mean weighted by class size.
No averaging (average = "none") returns global macro-averaged metrics
plus per-class breakdown.
Note: The average parameter only affects precision, recall, and F1 for
nominal data. For ordinal data, these metrics are not computed.
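The difference between the averaging methods is easiest to see on a small confusion matrix. The base R sketch below uses invented labels and computes per-class precision, recall, and F1 before aggregating them three ways; it illustrates the definitions above rather than reproducing the package's internal metric functions.

```r
classes <- c("neg", "neu", "pos")
gold <- factor(c("pos", "pos", "pos", "pos", "neg", "neg", "neu", "neu", "neu", "neg"),
               levels = classes)
pred <- factor(c("pos", "pos", "neg", "pos", "neg", "neu", "neu", "neu", "pos", "neg"),
               levels = classes)

# Confusion matrix: rows are predictions, columns are gold labels.
cm <- table(pred = pred, gold = gold)
tp <- diag(cm)
fp <- rowSums(cm) - tp   # predicted as the class but gold differs
fn <- colSums(cm) - tp   # gold is the class but prediction differs

precision <- tp / (tp + fp)
recall <- tp / (tp + fn)
f1 <- 2 * precision * recall / (precision + recall)
support <- colSums(cm)   # number of gold units per class

# Macro: unweighted mean of per-class scores.
macro_f1 <- mean(f1)

# Weighted: per-class scores weighted by class size.
weighted_f1 <- sum(f1 * support) / sum(support)

# Micro: pool the counts first, then compute the metric once.
micro_precision <- sum(tp) / sum(tp + fp)
micro_recall <- sum(tp) / sum(tp + fn)
micro_f1 <- 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

c(macro = macro_f1, weighted = weighted_f1, micro = micro_f1)
```

For single-label multiclass data, micro-averaged precision, recall, and F1 all equal overall accuracy, so macro averaging is usually the more informative choice when small classes matter.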
A qlm_validation object (a tibble/data frame) with the following columns:
variable: Name of the validated variable
level: Measurement level used
measure: Name of the validation metric
value: Computed value of the metric
class: For nominal data: averaging method used (e.g., "macro", "micro",
"weighted") or class label (when average = "none"). For ordinal/interval
data: NA (averaging not applicable).
rater: Name of the object being validated (from input names)
ci_lower: Lower bound of confidence interval (only if ci != "none")
ci_upper: Upper bound of confidence interval (only if ci != "none")
The object has class c("qlm_validation", "tbl_df", "tbl", "data.frame") and
attributes containing metadata (n, call).
Metrics computed by measurement level:
Nominal: accuracy, precision, recall, f1, kappa
Ordinal: rho (Spearman's), tau (Kendall's), mae
Interval: icc, r (Pearson's), mae, rmse
Confidence intervals:
ci = "analytic": Provides analytic CIs for ICC and Pearson's r only
ci = "bootstrap": Provides bootstrap CIs for all metrics via resampling
Precision, recall, and F-measure (confusion-matrix definitions and micro / macro averaging): Sokolova, M., & Lapalme, G. (2009). A systematic analysis of performance measures for classification tasks. Information Processing & Management, 45(4), 427-437. doi:10.1016/j.ipm.2009.03.002
Macro F-measure as the arithmetic mean of per-class F-scores (the convention used here, matching yardstick and scikit-learn): Manning, C. D., Raghavan, P., & Schutze, H. (2008). Introduction to Information Retrieval, Chapter 13. Cambridge University Press. Free online: https://nlp.stanford.edu/IR-book/
Cohen's kappa: Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46. doi:10.1177/001316446002000104
Intraclass correlation coefficient: Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420-428. doi:10.1037/0033-2909.86.2.420
McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30-46. doi:10.1037/1082-989X.1.1.30
Related workflow functions: qlm_compare() for inter-rater
reliability between coded objects, qlm_code() for LLM coding,
as_qlm_coded() for converting human-coded data.
Underlying classification metrics (internal):
metric_precision(), metric_recall(), metric_f_meas();
Cohen's kappa is computed via reliability_kappa() and the ICC
via reliability_icc().
    # Load example coded objects
    examples <- readRDS(system.file("extdata", "example_objects.rds", package = "quallmer"))

    # Validate against gold standard (auto-detected)
    validation <- qlm_validate(
      examples$example_coded_mini,
      examples$example_gold_standard,
      by = "sentiment",
      level = "nominal"
    )
    print(validation)

    # Explicit gold parameter (backward compatible)
    validation2 <- qlm_validate(
      examples$example_coded_mini,
      gold = examples$example_gold_standard,
      by = "sentiment",
      level = "nominal"
    )
    print(validation2)