| Title: | Regularized Survey Raking |
|---|---|
| Description: | Calibrates survey weights to known population targets using regularized raking. Constraints are specified with a formula interface (for example, rr_exact(), rr_l2(), rr_range(), rr_mean(), rr_var(), and rr_quantile()). Supports common target formats including autumn-style proportions tables, raw or weighted population microdata, named-list targets (as in 'anesrake'), and 'survey' package design objects. Optimization follows Barratt et al. (2021) <https://web.stanford.edu/~boyd/papers/pdf/optimal_representative_sampling.pdf> and returns calibrated weights with balance and convergence diagnostics. |
| Authors: | Andy Timm [aut, cre, cph] |
| Maintainer: | Andy Timm <[email protected]> |
| License: | Apache License (== 2.0) \| file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-05-14 09:05:04 UTC |
| Source: | https://github.com/andytimm/regrake |
Displays a human-readable representation of a raking formula. This function shows each term in the formula in a structured format, making it easy to understand complex formulas with multiple constraints.

    ## S3 method for class 'raking_formula'
    print(x, ...)

| x | A raking_formula object |
|---|---|
| ... | Additional arguments passed to other methods |

Invisibly returns the object
Print method for regrake objects

    ## S3 method for class 'regrake'
    print(x, ...)

| x | A regrake object |
|---|---|
| ... | Additional arguments (ignored) |

Invisibly returns the object
Selects the k largest weights and assigns each of them the equal weight 1/k. All other weights are set to zero. This is used for representative sample selection, where exactly k samples must be selected.

    prox_boolean_reg(w, lam, k)

| w | Input vector of weights |
|---|---|
| lam | Regularization parameter (unused, kept for interface consistency) |
| k | Number of samples to select |

Vector with k non-zero entries, each equal to 1/k
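
The selection rule described above can be sketched in base R. This is a minimal illustration, not the package's source: the helper name `top_k_equal_weights` is hypothetical, and ties are resolved however `order()` happens to resolve them.

```r
# Hypothetical stand-in for the top-k rule: keep the k largest entries
# of w and give each selected entry the weight 1/k.
top_k_equal_weights <- function(w, k) {
  stopifnot(is.numeric(w), k >= 1, k <= length(w))
  # Positions of the k largest values (tie handling is left to order()).
  keep <- order(w, decreasing = TRUE)[seq_len(k)]
  out <- numeric(length(w))  # all other entries stay at zero
  out[keep] <- 1 / k
  out
}

w <- c(0.05, 0.40, 0.10, 0.30, 0.15)
selected <- top_k_equal_weights(w, k = 2)
print(selected)
```

With `k = 2`, only the second and fourth entries are non-zero, and the result sums to 1.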
Proximal operator for equality constraints

    prox_equality(x, target, rho)

| x | Input vector |
|---|---|
| target | Target vector |
| rho | Proximal parameter (unused for equality constraints) |

Projected vector equal to target
Proximal operator for equality regularizer

    prox_equality_reg(w, lam)

| w | Input vector |
|---|---|
| lam | Regularization parameter |

Original vector (identity operation)
Proximal operator for inequality constraints

    prox_inequality(x, target, rho, lower, upper)

| x | Input vector |
|---|---|
| target | Target vector (used for offset) |
| rho | Proximal parameter (unused for inequality constraints) |
| lower | Lower bound |
| upper | Upper bound |

Clipped vector within bounds relative to target
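
One plausible reading of "clipped within bounds relative to target" is that each element is forced into the interval from `target + lower` to `target + upper`. The sketch below assumes that reading. The helper name `clip_relative_to_target` is hypothetical, and the package's own function may treat the offset differently.

```r
# Assumption of this sketch: lower and upper are offsets around target,
# so each element of x is clipped to [target + lower, target + upper].
clip_relative_to_target <- function(x, target, lower, upper) {
  stopifnot(all(lower <= upper))
  pmin(pmax(x, target + lower), target + upper)
}

x <- c(0.20, 0.50, 0.90)
target <- c(0.50, 0.50, 0.50)
clipped <- clip_relative_to_target(x, target, lower = -0.1, upper = 0.1)
print(clipped)
```

Values already inside the band are left unchanged; values outside it move to the nearest edge.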
Proximal operator for KL divergence loss

    prox_kl(x, target, rho, scale = 0.5)

| x | Input vector |
|---|---|
| target | Target vector |
| rho | Proximal parameter |
| scale | Scale factor for KL divergence (default 0.5, matching the Python reference) |

Updated vector minimizing KL divergence plus proximal term
Proximal operator for KL regularizer

    prox_kl_reg(w, lam, prior = NULL, limit = NULL)

| w | Input vector |
|---|---|
| lam | Regularization parameter |
| prior | Prior weights (default uniform) |
| limit | Optional upper bound on weight magnitudes |

Updated vector minimizing KL divergence plus proximal term
Proximal operator for least squares loss

    prox_least_squares(x, target, tau, diag_weight = 1)

| x | Input vector |
|---|---|
| target | Target vector |
| tau | Proximal parameter (1/rho) |
| diag_weight | Numeric scalar or vector of weights for each element (default 1) |

Updated vector minimizing weighted quadratic plus proximal term
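
For intuition, a weighted quadratic plus a proximal term has an elementwise closed form. The sketch below assumes the objective `0.5 * sum(diag_weight * (z - target)^2) + sum((z - x)^2) / (2 * tau)`. That scaling is an assumption rather than something this page states, and the helper name `prox_ls_sketch` is hypothetical.

```r
# Hypothetical helper. Under the assumed objective, setting the gradient
#   diag_weight * (z - target) + (z - x) / tau
# to zero gives the closed form returned below.
prox_ls_sketch <- function(x, target, tau, diag_weight = 1) {
  (tau * diag_weight * target + x) / (tau * diag_weight + 1)
}

x <- c(2, 0, -1)
target <- c(1, 1, 1)
z <- prox_ls_sketch(x, target, tau = 0.5, diag_weight = 3)
print(z)

# Numerical cross-check of the first element with a 1-D optimizer.
objective <- function(v) 0.5 * 3 * (v - 1)^2 + (v - 2)^2 / (2 * 0.5)
z_numeric <- optimize(objective, interval = c(-10, 10))$minimum
```

As `tau` grows the result moves toward `target`; as `tau` shrinks it stays close to `x`.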
Proximal operator for sum squares regularizer

    prox_sum_squares_reg(w, lam)

| w | Input vector |
|---|---|
| lam | Regularization parameter |

Updated vector minimizing sum squares plus proximal term
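
If the regularizer is taken to be `lam * sum(z^2)` with proximal term `0.5 * sum((z - w)^2)`, the minimizer is a uniform shrinkage of the input. Both the scaling convention and the helper name `prox_sum_squares_sketch` are assumptions of this sketch, so the package's constant may differ.

```r
# Assumed objective: lam * sum(z^2) + 0.5 * sum((z - w)^2).
# The gradient 2 * lam * z + (z - w) vanishes at z = w / (1 + 2 * lam).
prox_sum_squares_sketch <- function(w, lam) {
  stopifnot(lam >= 0)
  w / (1 + 2 * lam)
}

w <- c(0.5, 1.5, 1.0)
shrunk <- prox_sum_squares_sketch(w, lam = 0.5)
print(shrunk)
```

With `lam = 0` the input is returned unchanged, and larger values of `lam` pull every entry toward zero.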
Optimal representative sample weighting

    regrake(
      data,
      formula,
      population_data,
      pop_type = c("raw", "weighted", "proportions", "anesrake", "survey", "survey_design"),
      pop_weights = NULL,
      regularizer = "entropy",
      lambda = 1,
      prior = NULL,
      k = NULL,
      bounds = c(0.1, 10),
      bounds_method = c("soft", "hard"),
      exact_tol = NULL,
      normalize = TRUE,
      control = list(),
      verbose = FALSE,
      ...
    )

| data | A data.frame or tibble containing the sample data |
|---|---|
| formula | A formula specifying the raking constraints (e.g., |
| population_data | Population data: a data.frame, list, or survey.design object (see |
| pop_type | How population data is specified: |
| pop_weights | Column name in population_data containing weights (if pop_type = "weighted") |
| regularizer | Regularization method ("entropy", "zero", "kl", or "boolean") |
| lambda | Regularization strength (default = 1) |
| prior | Optional prior weights used when |
| k | Number of samples to select (required for regularizer = "boolean") |
| bounds | Numeric vector of length 2 specifying (min, max) allowed weight values. Weights returned sum to n (sample size), so |
| bounds_method | How to enforce bounds: |
| exact_tol | Optional tolerance for exact constraints. When non-NULL, all |
| normalize | Logical. If TRUE (default), continuous variables are automatically scaled by their target value for numerical stability. The achieved values are reported in original units. Set to FALSE to disable this behavior. |
| control | List of control parameters for the ADMM solver: |
| verbose | Whether to print progress information |
| ... | Additional arguments passed to methods |

An object of class "regrake" containing:

| weights | The optimal weights (sum to n) |
|---|---|
| balance | Data frame comparing achieved vs target values with columns: constraint (e.g., "exact_sex"), type ("exact" or "l2"), variable, level, achieved, target, residual |
| solution | Full solution details from solver |
| diagnostics | Weight, convergence, and margin matching diagnostics |


    set.seed(42)
    sample_data <- data.frame(
      sex = sample(c("M", "F"), 200, replace = TRUE, prob = c(0.6, 0.4)),
      age = sample(c("young", "old"), 200, replace = TRUE, prob = c(0.7, 0.3))
    )
    pop_targets <- data.frame(
      variable = c("sex", "sex", "age", "age"),
      level = c("M", "F", "young", "old"),
      target = c(0.49, 0.51, 0.45, 0.55)
    )
    result <- regrake(
      data = sample_data,
      formula = ~ rr_exact(sex) + rr_exact(age),
      population_data = pop_targets,
      pop_type = "proportions"
    )
    result
    result$balance

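
To make the achieved, target, and residual columns of the balance table concrete, the base-R sketch below computes them by hand for a single variable. It does not call regrake(): the weights come from simple post-stratification (target share divided by sample share), used here only as a stand-in. The name `balance_sketch` and the sign convention `residual = achieved - target` are assumptions.

```r
# Five respondents, 60% "M" in the sample, with targets of 49% / 51%.
sex <- c("M", "M", "M", "F", "F")
n <- length(sex)
targets <- c(M = 0.49, F = 0.51)

# Stand-in weights (not regrake's solver): target share / sample share.
sample_share <- c(M = mean(sex == "M"), F = mean(sex == "F"))
w <- unname(targets[sex] / sample_share[sex])

# Achieved margin = weighted share of each level.
achieved <- vapply(split(w, sex), sum, numeric(1)) / sum(w)
achieved <- achieved[names(targets)]

balance_sketch <- data.frame(
  variable = "sex",
  level = names(targets),
  achieved = unname(achieved),
  target = unname(targets),
  residual = unname(achieved - targets)  # sign convention is an assumption
)
print(balance_sketch)
```

These stand-in weights also sum to the sample size, which matches the scaling documented for the returned weights.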
These functions specify constraint types for raking formulas. They are used
within formula specifications passed to regrake().

    rr_l2(x)
    rr_kl(x)
    rr_exact(x)
    rr_mean(x)
    rr_var(x)
    rr_quantile(x, p)
    rr_range(x, ...)
    rr_between(x, ...)

| x | Variable name (unquoted) to apply the constraint to |
|---|---|
| p | For |
| ... | For |

rr_exact(): Exact equality constraint (weighted sum equals target exactly)
rr_l2(): Soft L2/least squares constraint (penalizes deviation from target)
rr_kl(): KL divergence constraint
rr_mean(): Match the mean of a continuous variable (alias for rr_exact() on continuous variables)
rr_var(): Match the variance of a continuous variable
rr_quantile(): Match a specific quantile of a continuous variable
The input variable (these functions are markers for the formula parser)

    # Match sex proportions exactly, age proportions with soft constraint
    formula <- ~ rr_exact(sex) + rr_l2(age)

    # Match categorical variable and continuous mean
    formula <- ~ rr_exact(region) + rr_mean(income)

    # Match mean and variance of a continuous variable
    formula <- ~ rr_mean(income) + rr_var(income)

    # Match median income
    formula <- ~ rr_quantile(income, 0.5)

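
Because the constraint functions act as markers, a formula like the ones above can be taken apart without evaluating them. The following base-R sketch does this with `terms()` and `str2lang()`. It only illustrates the idea, is not the package's own parser, and needs no package to be loaded.

```r
f <- ~ rr_exact(sex) + rr_l2(age) + rr_quantile(income, 0.5)

# terms() keeps each call as an unevaluated label such as "rr_exact(sex)".
term_labels <- attr(terms(f), "term.labels")

# Split each label into the marker function and its first argument.
parsed <- lapply(term_labels, function(lbl) {
  cl <- str2lang(lbl)
  list(marker = as.character(cl[[1]]), variable = as.character(cl[[2]]))
})

markers <- vapply(parsed, `[[`, character(1), "marker")
variables <- vapply(parsed, `[[`, character(1), "variable")
print(data.frame(marker = markers, variable = variables))
```

Each row pairs a constraint type with the variable it applies to, which mirrors the structured display that printing a raking formula is described as giving.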
Summary method for regrake objects

    ## S3 method for class 'regrake'
    summary(object, ...)

| object | A regrake object |
|---|---|
| ... | Additional arguments (ignored) |

Invisibly returns the object