Introduction

Generalized Linear Models (GLMs) are widely used for analyzing multivariate data with non-normal responses. The Iteratively Reweighted Least Squares (IRLS) algorithm is the main workhorse for fitting these models in small to medium-scale problems because it reduces the estimation problem to a series of Weighted Least Squares (WLS) subproblems, which are computationally less expensive than solving the full maximum likelihood optimization directly.

However, IRLS can be sensitive to convergence issues—especially when using the standard Ordinary Least Squares (OLS) update in each iteration. In cases of model mis-specification or high variability in the data, the OLS estimator may yield significant estimation error. The glm2 package improves upon the standard glm by incorporating a step-halving strategy to better handle convergence challenges. This robust framework is implemented in its glm.fit2 function.

In the savvyGLM package, we build on the robust framework provided by the glm2 package (and its function glm.fit2) by replacing the standard OLS update in the IRLS algorithm with a shrinkage estimator. The rationale is that shrinkage estimation introduces a small bias to substantially reduce the Mean Squared Error (MSE) of the OLS estimates, resulting in more reliable parameter estimates and enhanced convergence stability. The corresponding shrinkage paper can be find in Slab and Shrinkage Linear Regression Estimation.

Simulation studies and real-data applications demonstrate that replacing the OLS update in IRLS with a shrinkage estimator improves the convergence and overall performance of GLM estimation compared to the traditional non-penalized implementation.

In summary, our package modifies glm.fit2 from glm2 to:

Leverage robust features, such as step-halving, already present in glm2.
Replace the OLS update with one of several shrinkage estimators (e.g., St_ost, DSh_ost, SR_ost, GSR_ost, and optionally Sh_ost).
Employ parallelization when multiple candidate methods are specified.

Main Features

savvyGLM provides five classes of shrinkage methods for GLMs, although by default four are evaluated unless the user explicitly includes Sh(since Sh is time-consuming):

Custom Optimization Methods: Implements shrinkage estimators (St_ost, DSh_ost, SR_ost, GSR_ost, and optionally Sh_ost) that replace the standard OLS update in IRLS.
Flexible Model Choice: The model_class argument allows users to choose one or more shrinkage methods. By default, only "St", "DSh", "SR", and "GSR" are evaluated. Including "Sh" is optional because solving the Sylvester equation required by Sh_ost is computationally intensive.

Parallel Evaluation: When multiple candidate methods are specified, the function evaluates them in parallel (if use_parallel = TRUE) and selects the final model based on the lowest Akaike Information Criterion (AIC).

Key Argument and Output Summary

Argument	Description	Default
`x`	A numeric matrix of predictors. As for `glm.fit`.	No default; user must supply.
`y`	A numeric vector of responses. As for `glm.fit`.	No default; user must supply.
`weights`	An optional vector of weights to be used in the fitting process.	`rep(1, nobs)`
`model_class`	A character vector specifying the shrinkage model(s) to be used. Allowed values are `"St"`, `"DSh"`, `"SR"`, `"GSR"`, and `"Sh"`. If multiple values are given, they are run in parallel and the best is chosen by AIC. If `"Sh"` is not included, it is omitted from the evaluation.	`c("St", "DSh", "SR", "GSR")`
`start`, `etastart`, `mustart`	Optional starting values for the parameters, the linear predictor, and the mean, respectively.	`NULL`
`offset`	An optional offset to be included in the model.	`rep(0, nobs)`
`family`	A function specifying the conditional distribution of \(Y \mid X\), e.g., \(Y \mid X \sim \text{family}(g^{-1}(X\beta))\). (e.g., `gaussian()`, `binomial()`, `poisson()`).	`gaussian()`
`control`	A list of parameters for controlling the fitting process (e.g., `maxit`, `epsilon`). Similar to `glm.control`.	`list()`
`intercept`	A logical value indicating whether an intercept should be included in the model.	`TRUE`
`use_parallel`	Logical. If `TRUE`, multiple methods in `model_class` are evaluated in parallel. If only one method is specified, it is run directly. The best model is chosen by the lowest AIC among the methods tested.	`FALSE`
Key Output	Description	Notes
`chosen_fit`	The name of the shrinkage method selected based on the lowest AIC. If only one method is specified in `model_class`, then `chosen_fit` will simply return that method’s name.	Returned in the final model object.
`coefficients`, `residuals`, etc.	The usual GLM components, similar to those returned by `glm.fit`.	See documentation for full details.

This vignette provides an overview of the methods implemented in savvyGLM and demonstrates how to use them on simulated data and real data. We begin with a theoretical overview of each shrinkage class and GLM with IRLS, then walk through simulation examples that illustrate how to apply savvyGLM for each method.

Theoretical Overview

Shrinkage estimators

The underlying optimization methods in savvyGLM are designed to reduce the theoretical mean square error by introducing a small bias that yields a substantial reduction in variance. This trade-off leads to more reliable parameter estimates and improved convergence properties in Generalized Linear Models.

Here, we just provide a table summary since for a detailed theoretical overview of the shrinkage estimators applied to GLMs, please refer to the savvySh package, which covers many of the same shrinkage estimators used here, but in the context of linear regression. The methods in savvyGLM extend these ideas to generalized linear models, ensuring that the IRLS updates incorporate shrinkage at each iteration.

Method	Shrinkage Approach	Simple Description	Extra Overhead?	Included by Default?
OLS	No shrinkage	Baseline estimator without any regularization.	No	No (baseline method)
St	Scalar shrinkage	Scales all OLS coefficients by a common factor to reduce MSE.	No	Yes
DSh	Diagonal matrix	Shrinks each OLS coefficient separately using individual factors.	No	Yes
SR	Slab penalty (1 direction)	Adds a penalty that shrinks the estimate along a fixed direction (e.g., all-ones vector).	No	Yes
GSR	Slab penalty (multiple directions)	Generalizes SR by penalizing multiple directions, often from eigenvectors.	No	Yes
LW	Linear shrinkage	Shrinks towards a structured target matrix (equal variances, zero covariances) to handle large-dimensional data.	No	Yes
QIS	Nonlinear shrinkage	A quadratic shrinkage estimator optimized under various loss functions, highly effective for large covariance matrices.	No	Yes
Sh	Full matrix (Sylvester equation)	Applies a matrix transformation to OLS using a solution to the Sylvester equation.	Yes (computational)	No (only if specified in `model_class`)

Generalized Linear Model and its IRLS Implementation

A GLM assumes that a response variable \(Y\), defined on some set \(\mathcal{Y} \subseteq \Re\), is related to a set of covariates \(\mathbf{X}\) in \(\mathcal{X} \subseteq \Re{^d}\). The conditional distribution of \(Y\) belongs to the exponential dispersion family, with probability density or mass function

\[ f_Y(y; \theta, \phi) \;=\; \exp\left\{\frac{\theta\,y - b(\theta)}{a(\phi)} \;+\; c(y, \phi)\right\}, \]

where \(\theta\) is the canonical parameter, \(\phi\) is the dispersion parameter, and \(a\), \(b\), \(c\) are known functions. The mean of \(Y\) is linked to a linear predictor \(\eta_i = \mathbf{x}_i^\top \boldsymbol{\beta}\) through a link function \(g\), so that

\[ \mathbb{E}[Y_i \mid \mathbf{X}_i = \mathbf{x}_i] \;=\; h\left(\mathbf{x}_i^\top \boldsymbol{\beta}\right), \]

where \(h = g^{-1}\) is the inverse of the link. Under a canonical link, \(h(\cdot) = b^\prime(\cdot)\).

For an independent sample \(\left\{(y_i, \mathbf{x}_i)\right\}_{i=1}^n\), the log-likelihood for \(\boldsymbol{\beta}\) can be written as

\[ \ell(\boldsymbol{\beta}) \;=\; \sum_{i=1}^n \frac{\theta_i\,y_i \;-\; b(\theta_i)}{a(\phi)} \;+\; c\left(y_i, \phi\right), \]

where \(\theta_i = (b^\prime)^{-1}\circ\, h\left(\mathbf{x}_i^\top \boldsymbol{\beta}\right)\). Maximizing this log-likelihood is equivalent to minimizing the following objective function,

\[ \mathcal{C}(\boldsymbol{\beta}) \;=\; -\sum_{i=1}^n \left(\theta_i\,y_i \;-\; b(\theta_i)\right). \]

Taking the derivative of \(\mathcal{C}(\boldsymbol{\beta})\) with respect to \(\boldsymbol{\beta}\) leads to the so-called normal equations, which can be solved iteratively. In particular, the IRLS algorithm reformulates each iteration as a WLS problem:

\[ \widehat{\boldsymbol{\beta}}^{(t+1)} \;=\; \underset{\boldsymbol{\beta}}{\mathrm{arg\,min}} \;\left(\mathbf{z}^{(t)} - \mathbf{X}\,\boldsymbol{\beta}\right)^\top \mathbf{W}^{(t)} \left(\mathbf{z}^{(t)} - \mathbf{X}\,\boldsymbol{\beta}\right), \]

where \(\mathbf{W}^{(t)}\) is the diagonal matrix of weights and \(\mathbf{z}^{(t)}\) is the pseudo-response at iteration \(t\). Specifically,

\[ \mathbf{W}^{(t)} \;=\; \mathrm{diag}\left(\left(h^\prime(\eta_i^{(t)})\right)^2 / V\left(\mu_i^{(t)}\right)\right), \quad \mathbf{z}_i^{(t)} \;=\; \eta_i^{(t)} \;+\; \frac{y_i - \mu_i^{(t)}}{h^\prime(\eta_i^{(t)})}, \]

with \(\mu_i^{(t)} = h(\eta_i^{(t)})\) and \(\eta_i^{(t)} = \mathbf{x}_i^\top \widehat{\boldsymbol{\beta}}^{(t)}\). Here, \(V(\mu_i)\) is the variance function determined by the exponential family, and \(h^\prime\) is the derivative of the inverse link.

By iterating this WLS problem until convergence (or until a maximum number of iterations is reached), IRLS provides the maximum likelihood estimates for the GLM parameters. In savvyGLM, these WLS updates are further enhanced by applying shrinkage estimators (St_ost, DSh_ost, SR_ost, GSR_ost}, orSh_ost`) at each iteration, thereby improving estimation accuracy and convergence stability.

Simulation Examples

This section presents simulation studies to compare the performance of the shrinkage estimators implemented in the savvyGLM package. We compare the standard IRLS update (as implemented in glm.fit2) with various shrinkage methods implemented in savvyGLM by measuring the \(L_2\) distance between the estimated and true coefficients. Lower \(L_2\) values indicate better estimation accuracy.

The following data-generating processes (DGPs) are considered for logistic regression (LR), Poisson, and Gamma GLMs with log link and sqrt link:

Covariate Matrix: The covariate matrix \(\mathbf{X} \in \Re^{n \times p}\) is drawn from a multivariate normal distribution with mean zero, unit variances, and a Toeplitz covariance matrix \(\Sigma\) where \(\text{Cov}(X_{i,j}, X_{k,j}) = \rho^{|i-k|}\) with \(\rho \in (-1,1)\).
True Coefficients: The true regression coefficients \(\beta_j\) are set to alternate in sign and increase in magnitude (e.g., \(1, -1, 2, -2, \dots\) for LR with the logit link). For Poisson and Gamma GLMs with the log link, a smaller scaling is used to avoid numerical issues.
Response Generation:
- LR: Compute the linear predictor: \(\eta_i = \beta_0 + \sum_{j=1}^p \beta_j x_{ij}\) and generate \(Y_i \sim \text{Binomial}\left(1, \frac{1}{1+e^{-\eta_i}}\right)\).
- Poisson GLM:Two link functions are considered:For the log link: \(\mu_i = e^{\eta_i}\); for the sqrt link: \(\mu_i = \eta_i^2\). Then, generate \(Y_i \sim \text{Poisson}(\mu_i)\).
- Gamma GLM: Similarly, use \(\mu_i = e^{\eta_i}\) (log link) or \(\mu_i = \eta_i^2\) (sqrt link), and generate \(Y_i\) from a Gamma distribution with mean \(\mu_i\).

For each scenario, we fit the GLM using the standard IRLS update (i.e., OLS-based) and the shrinkage-based IRLS update implemented in savvyGLM. The performance is assessed using the \(L_2\) distance between the estimated and true coefficients. Tables comparing these distances and other relevant diagnostics are provided to illustrate the benefits of the shrinkage approach. These simulation studies aims to demonstrate that replacing the OLS update with shrinkage estimators in the IRLS algorithm improves convergence and estimation accuracy, especially in challenging settings with multicollinearity or extreme response values.

Note: Due to differences in underlying libraries and random number generation implementations (e.g., BLAS/LAPACK and RNG behavior), the data generated by functions like mvrnorm may differ slightly between Windows and macOS/Linux, even with the same seed.

Logistic Regression

Balanced Data

In this simulation, balanced binary responses are generated using the logit link. Covariates are sampled from a multivariate normal distribution with a Toeplitz covariance matrix determined by various correlation coefficients. To ensure a balanced dataset, a larger dataset is first generated, and then a subset is selected to achieve roughly equal proportions of 0’s and 1’s. We fit the models using the standard GLM (via glm.fit2) and the shrinkage-based GLM (via savvy_glm.fit2in savvyGLM package).

# Load packages
library(savvyGLM)
library(MASS)
library(glm2)
#> 
#> Attaching package: 'glm2'
#> The following object is masked from 'package:MASS':
#> 
#>     crabs
library(CVXR)
#> 
#> Attaching package: 'CVXR'
#> The following object is masked from 'package:MASS':
#> 
#>     huber
#> The following objects are masked from 'package:stats':
#> 
#>     power, sd, var
#> The following objects are masked from 'package:base':
#> 
#>     diag, norm, outer
library(knitr)

set.seed(123)
n_val <- 500
p_val <- 25
rho_vals <- c(-0.75, -0.5, 0, 0.5, 0.75)
mu_val <- 0
target_proportion <- 0.5
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- binomial(link = "logit")

sigma.rho <- function(rho_val, p_val) {
  rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}

theta_func <- function(p_val) {
  sgn <- rep(c(1, -1), length.out = p_val)
  mag <- ceiling(seq_len(p_val) / 2)
  sgn * mag
}

model_names <- c("OLS", "SR", "GSR", "St", "DSh", "LW", "QIS", "Sh")
l2_matrix <- matrix(NA, nrow = length(model_names), ncol = length(rho_vals),
                    dimnames = list(model_names, paste0("rho=", rho_vals)))
for (j in seq_along(rho_vals)) {
  rho_val <- rho_vals[j]
  Sigma <- sigma.rho(rho_val, p_val)
  n_val_large <- n_val * 10
  
  X_large <- mvrnorm(n_val_large, mu = rep(mu_val, p_val), Sigma = Sigma)
  X_large_intercept <- cbind(1, X_large)
  beta_true <- theta_func(p_val + 1)
  mu_y_large <- as.vector(1 / (1 + exp(-X_large_intercept %*% beta_true)))
  y_large <- rbinom(n = n_val_large, size = 1, prob = mu_y_large)

  y_zero_indices <- which(y_large == 0)[1:round(n_val * target_proportion)]
  y_one_indices <- which(y_large == 1)[1:(n_val - round(n_val * target_proportion))]
  final_indices <- c(y_zero_indices, y_one_indices)
  X_final <- X_large_intercept[final_indices, ]
  y_final <- y_large[final_indices]
  
  fit_ols <- glm.fit2(X_final, y_final, control = control_list, family = family_type)
  l2_matrix["OLS", j] <- norm(fit_ols$coefficients - beta_true, type = "2")

  for (m in model_names[-1]) {
    fit <- savvy_glm.fit2(X_final, y_final, model_class = m, control = control_list, family = family_type)
    l2_matrix[m, j] <- norm(fit$coefficients - beta_true, type = "2")
  }
}

A summary table here is produced to compare these distances across different \(\rho\) values.

l2_table <- as.data.frame(l2_matrix)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients (Balanced LR)")

L2 Distance Between Estimated and True Coefficients (Balanced LR)
	rho=-0.75	rho=-0.5	rho=0	rho=0.5	rho=0.75
OLS	118.4198	138.5361	414.2164	5084.7451	26.1614
SR	117.8333	138.5864	413.9950	5082.0885	26.1597
GSR	117.7572	138.3772	413.5489	5235.2437	23.1458
St	119.5847	136.3671	412.7503	5068.1288	24.5283
DSh	121.0566	136.6320	412.7369	5069.2734	23.6872
LW	35.5669	36.9323	38.5387	36.5738	34.2470
QIS	25.2448	33.1878	146.1587	2284.2926	18.4463
Sh	12.1582	10.0166	73.5201	1190.1148	22.7176

Imbalanced Data

For imbalanced LR, the simulation settings are similar except that the target proportion for the negative class is reduced (for example, only 5% of the responses are 0’s). This results in a skewed binary response, which allows us to evaluate the robustness of the shrinkage estimators under class imbalance. As before, models are fit using both the standard GLM and the various shrinkage-based GLM.

# Load packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(knitr)

set.seed(123)
n_val <- 500
p_val <- 25
rho_vals <- c(-0.75, -0.5, 0, 0.5, 0.75)
mu_val <- 0
target_proportion <- 0.05
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- binomial(link = "logit")

sigma.rho <- function(rho_val, p_val) {
  rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}

theta_func <- function(p_val) {
  sgn <- rep(c(1, -1), length.out = p_val)
  mag <- ceiling(seq_len(p_val) / 2)
  sgn * mag
}

model_names <- c("OLS", "SR", "GSR", "St", "DSh", "LW", "QIS", "Sh")
l2_matrix <- matrix(NA, nrow = length(model_names), ncol = length(rho_vals),
                    dimnames = list(model_names, paste0("rho=", rho_vals)))
for (j in seq_along(rho_vals)) {
  rho_val <- rho_vals[j]
  Sigma <- sigma.rho(rho_val, p_val)
  n_val_large <- n_val * 10
  
  X_large <- mvrnorm(n_val_large, mu = rep(mu_val, p_val), Sigma = Sigma)
  X_large_intercept <- cbind(1, X_large)
  beta_true <- theta_func(p_val + 1)
  mu_y_large <- as.vector(1 / (1 + exp(-X_large_intercept %*% beta_true)))
  y_large <- rbinom(n = n_val_large, size = 1, prob = mu_y_large)

  y_zero_indices <- which(y_large == 0)[1:round(n_val * target_proportion)]
  y_one_indices <- which(y_large == 1)[1:(n_val - round(n_val * target_proportion))]
  final_indices <- c(y_zero_indices, y_one_indices)
  X_final <- X_large_intercept[final_indices, ]
  y_final <- y_large[final_indices]
  
  fit_ols <- glm.fit2(X_final, y_final, control = control_list, family = family_type)
  l2_matrix["OLS", j] <- norm(fit_ols$coefficients - beta_true, type = "2")

  for (m in model_names[-1]) {
    fit <- savvy_glm.fit2(X_final, y_final, model_class = m, control = control_list, family = family_type)
    l2_matrix[m, j] <- norm(fit$coefficients - beta_true, type = "2")
  }
}

A summary table here is produced to compare these distances across different \(\rho\) values.

l2_table <- as.data.frame(l2_matrix)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients (Imbalanced LR)")

L2 Distance Between Estimated and True Coefficients (Imbalanced LR)
	rho=-0.75	rho=-0.5	rho=0	rho=0.5	rho=0.75
OLS	46.5948	87.9700	99.5634	571.7420	340.9713
SR	46.0122	87.8380	100.1734	569.5226	341.3195
GSR	44.9937	87.4731	99.1064	571.4744	342.1765
St	44.8112	86.6661	99.1358	569.1595	339.4084
DSh	44.1640	86.4792	97.9790	568.3849	341.3638
LW	36.2126	37.3790	38.4147	37.3230	35.5573
QIS	22.4423	18.9188	20.2855	233.9737	108.4859
Sh	26.5751	17.7852	16.7885	94.8718	50.0001

Poisson GLMs

Log Link Function

For Poisson GLMs with the log link, the mean response is modeled as \(\mu = \exp(\eta)\) with \(\eta = X_{\text{intercept}} \, \beta_{\text{true}}\). In this simulation, covariates are generated from a multivariate normal distribution with a Toeplitz covariance matrix. The responses are simulated from a Poisson distribution with rate \(\mu = \exp(\eta)\). We fit the models using the standard GLM (via glm.fit2) and the shrinkage-based GLM (via savvy_glm.fit2in savvyGLM package).

# Load packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(knitr)

set.seed(123)
n_val <- 500
p_val <- 25
rho_vals <- c(-0.75, -0.5, 0, 0.5, 0.75)
mu_val <- 0
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- poisson(link = "log")

sigma.rho <- function(rho_val, p_val) {
  rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}

theta_func <- function(p_val) {
  base_increment <- 0.1
  growth_rate <- 0.95 
  betas <- base_increment * (growth_rate ^ seq(from = 0, length.out = ceiling(p_val / 2)))
  betas <- rep(betas, each = 2)[1:p_val]
  signs <- rep(c(1, -1), length.out = p_val)
  betas <- betas * signs
  return(betas)
}

model_names <- c("OLS", "SR", "GSR", "St", "DSh", "LW", "QIS", "Sh")
l2_matrix <- matrix(NA, nrow = length(model_names), ncol = length(rho_vals),
                    dimnames = list(model_names, paste0("rho=", rho_vals)))
for (j in seq_along(rho_vals)) {
  rho_val <- rho_vals[j]
  Sigma <- sigma.rho(rho_val, p_val)
  
  X <- mvrnorm(n_val, mu = rep(mu_val, p_val), Sigma = Sigma)
  X_intercept <- cbind(1, X)
  beta_true <- theta_func(p_val + 1)
  mu_y <- as.vector(exp(X_intercept %*% beta_true))
  y <- rpois(n_val, lambda = mu_y)
  
  fit_ols <- glm.fit2(X_intercept, y, control = control_list, family = family_type)
  l2_matrix["OLS", j] <- norm(fit_ols$coefficients - beta_true, type = "2")

  for (m in model_names[-1]) {
    fit <- savvy_glm.fit2(X_intercept, y, model_class = m, control = control_list, family = family_type)
    l2_matrix[m, j] <- norm(fit$coefficients - beta_true, type = "2")
  }
}

A summary table here is produced to compare these distances across different \(\rho\) values.

l2_table <- as.data.frame(l2_matrix)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients (log link for Poisson GLM)")

L2 Distance Between Estimated and True Coefficients (log link for Poisson GLM)
	rho=-0.75	rho=-0.5	rho=0	rho=0.5	rho=0.75
OLS	0.2715	0.2909	0.2035	0.2385	0.2992
SR	0.2519	0.2721	0.1864	0.2382	0.2993
GSR	0.1821	0.2078	0.1868	0.1926	0.2771
St	0.2274	0.2464	0.1659	0.1809	0.2544
DSh	0.2780	0.2757	0.2089	0.2369	0.2962
LW	0.2004	0.2499	0.5471	0.2115	0.2859
QIS	0.2527	0.2666	0.2090	0.2165	0.2963
Sh	0.2625	0.2869	0.2011	0.2335	0.2921

sqrt Link Function

For the Poisson GLM with the sqrt link, the mean response is defined as \(\mu = \eta^2\) with \(\eta = X_{\text{intercept}} \, \beta_{\text{true}}\). The simulation procedure is similar to the log link case; however, the responses are generated with Poission distribution with \(\lambda\) \(\mu=\eta^2\). To address potential numerical issues (such as non-positive responses). Models are then fit using both the standard GLM (via glm.fit2) and the shrinkage-based GLM (via savvy_glm.fit2in savvyGLM package).

# Load packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(knitr)

set.seed(123)
n_val <- 500
p_val <- 25
rho_vals <- c(-0.75, -0.5, 0, 0.5, 0.75)
mu_val <- 0
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- poisson(link = "sqrt")

sigma.rho <- function(rho_val, p_val) {
  rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}

theta_func <- function(p_val) {
  sgn <- rep(c(1, -1), length.out = p_val)
  mag <- ceiling(seq_len(p_val) / 2)
  sgn * mag
}

model_names <- c("OLS", "SR", "GSR", "St", "DSh", "LW", "QIS", "Sh")
l2_matrix <- matrix(NA, nrow = length(model_names), ncol = length(rho_vals),
                    dimnames = list(model_names, paste0("rho=", rho_vals)))
for (j in seq_along(rho_vals)) {
  rho_val <- rho_vals[j]
  Sigma <- sigma.rho(rho_val, p_val)
  
  X <- mvrnorm(n_val, mu = rep(mu_val, p_val), Sigma = Sigma)
  X_intercept <- cbind(1, X)
  beta_true <- theta_func(p_val + 1)
  mu_y <- as.vector((X_intercept %*% beta_true)^2)
  y <- rpois(n_val, lambda = mu_y)
  
  fit_ols <- glm.fit2(X_intercept, y, control = control_list, family = family_type)
  l2_matrix["OLS", j] <- norm(fit_ols$coefficients - beta_true, type = "2")

  for (m in model_names[-1]) {
    fit <- savvy_glm.fit2(X_intercept, y, model_class = m, control = control_list, family = family_type)
    l2_matrix[m, j] <- norm(fit$coefficients - beta_true, type = "2")
  }
}

A summary table here is produced to compare these distances across different \(\rho\) values.

l2_table <- as.data.frame(l2_matrix)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients (sqrt link for Poisson GLM)")

L2 Distance Between Estimated and True Coefficients (sqrt link for Poisson GLM)
	rho=-0.75	rho=-0.5	rho=0	rho=0.5	rho=0.75
OLS	105.8302	78.9315	55.8439	48.2180	38.8785
SR	106.9639	79.4258	55.4904	47.7886	38.7200
GSR	103.4377	77.6867	55.0924	47.6944	39.2449
St	99.4780	77.2330	55.0576	47.5351	38.3803
DSh	104.1511	78.5923	55.8674	47.8408	39.9734
LW	104.9087	78.2375	56.3488	48.0411	39.2115
QIS	104.6722	78.3262	56.3039	48.1902	39.1141
Sh	105.8302	78.9315	55.8439	48.2180	38.8785

Gamma GLMs

log Link Function

In Gamma GLMs with the log link, the mean response is given by \(\mu = \exp(\eta)\) with \(\eta = X_{\text{intercept}} \, \beta_{\text{true}}\). Covariates are generated as before, and responses are drawn from a Gamma distribution with shape \(\mu =\exp(\eta)\). A retry mechanism is used to ensure that all simulated responses are positive, which is required by the Gamma model. Models are fit using both the standard GLM (via glm.fit2) and the shrinkage-based GLM (via savvy_glm.fit2in savvyGLM package).

# Load packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(knitr)

set.seed(123)
n_val <- 500
p_val <- 25
rho_vals <- c(-0.75, -0.5, 0, 0.5, 0.75)
mu_val <- 0
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- Gamma(link = "log")

sigma.rho <- function(rho_val, p_val) {
  rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}

theta_func <- function(p_val) {
  base_increment <- 0.1
  growth_rate <- 0.95 
  betas <- base_increment * (growth_rate ^ seq(from = 0, length.out = ceiling(p_val / 2)))
  betas <- rep(betas, each = 2)[1:p_val]
  signs <- rep(c(1, -1), length.out = p_val)
  betas <- betas * signs
  return(betas)
}

model_names <- c("OLS", "SR", "GSR", "St", "DSh", "LW", "QIS", "Sh")
l2_matrix <- matrix(NA, nrow = length(model_names), ncol = length(rho_vals),
                    dimnames = list(model_names, paste0("rho=", rho_vals)))
for (j in seq_along(rho_vals)) {
  rho_val <- rho_vals[j]
  Sigma <- sigma.rho(rho_val, p_val)
  
  X <- mvrnorm(n_val, mu = rep(mu_val, p_val), Sigma = Sigma)
  X_intercept <- cbind(1, X)
  beta_true <- theta_func(p_val + 1)
  mu_y <- as.vector(exp(X_intercept %*% beta_true))
  y <-  rgamma(n_val, shape = mu_y, scale = 1)
  retry_count <- 0
  while(any(y <= 0) && retry_count < 100) {
    bad_idx <- which(y <= 0)
    y[bad_idx] <- rgamma(length(bad_idx), shape = mu_y[bad_idx], scale = 1)
    retry_count <- retry_count + 1
  }
  
  fit_ols <- glm.fit2(X_intercept, y, control = control_list, family = family_type)
  l2_matrix["OLS", j] <- norm(fit_ols$coefficients - beta_true, type = "2")

  for (m in model_names[-1]) {
    fit <- savvy_glm.fit2(X_intercept, y, model_class = m, control = control_list, family = family_type)
    l2_matrix[m, j] <- norm(fit$coefficients - beta_true, type = "2")
  }
}

A summary table here is produced to compare these distances across different \(\rho\) values.

l2_table <- as.data.frame(l2_matrix)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients (log link for Gamma GLM)")

L2 Distance Between Estimated and True Coefficients (log link for Gamma GLM)
	rho=-0.75	rho=-0.5	rho=0	rho=0.5	rho=0.75
OLS	0.4067	0.3339	0.2600	0.2623	0.3082
SR	0.3957	0.3258	0.2600	0.2614	0.3079
GSR	0.2540	0.2633	0.2333	0.1989	0.2318
St	0.2406	0.2534	0.2022	0.1861	0.2557
DSh	0.2701	0.2912	0.2399	0.2381	0.3028
LW	0.3683	0.2895	0.3023	0.2252	0.2911
QIS	0.3691	0.2860	0.3001	0.2409	0.2990
Sh	0.3902	0.3260	0.2567	0.2548	0.2981

sqrt Link Function

For Gamma GLMs with the sqrt link, the mean response is modeled as \(\mu = \eta^2\) with \(\eta = X_{\text{intercept}} \, \beta_{\text{true}}\). Covariates are generated as before, and responses are drawn from a Gamma distribution with shape \(\mu =\eta^2\), including a retry mechanism to ensure all response values are positive. Fitting the models using both the standard GLM (via glm.fit2) and the shrinkage-based GLM (via savvy_glm.fit2in savvyGLM package).

library(savvyGLM)
library(MASS)
library(glm2)
library(knitr)

set.seed(123)
n_val <- 500
p_val <- 25
rho_vals <- c(-0.75, -0.5, 0, 0.5, 0.75)
mu_val <- 0
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- Gamma(link = "sqrt")

sigma.rho <- function(rho_val, p_val) {
  rho_val ^ abs(outer(1:p_val, 1:p_val, "-"))
}

theta_func <- function(p_val) {
  sgn <- rep(c(1, -1), length.out = p_val)
  mag <- ceiling(seq_len(p_val) / 2)
  sgn * mag
}

findStartingValues <- function(x, y, epsilon = 1e-6) {
  beta <- Variable(ncol(x))
  eta <- x %*% beta
  objective <- Minimize(sum_squares(sqrt(y + 0.1) - eta))
  constraints <- list(eta >= epsilon)
  problem <- Problem(objective, constraints)
  psolve(problem, silent = TRUE)
  starting_values <- as.numeric(value(beta))
  return(starting_values)
}

model_names <- c("OLS", "SR", "GSR", "St", "DSh", "LW", "QIS", "Sh")
l2_matrix <- matrix(NA, nrow = length(model_names), ncol = length(rho_vals),
                    dimnames = list(model_names, paste0("rho=", rho_vals)))

for (j in seq_along(rho_vals)) {
  rho_val <- rho_vals[j]
  Sigma <- sigma.rho(rho_val, p_val)
  
  X <- mvrnorm(n_val, mu = rep(mu_val, p_val), Sigma = Sigma)
  X_intercept <- cbind(1, X)
  beta_true <- theta_func(p_val + 1)
  mu_y <- as.vector((X_intercept %*% beta_true)^2)
  y <- rgamma(n_val, shape = mu_y, scale = 1)
  
  retry_count <- 0
  while(any(y <= 0) && retry_count < 100) {
    bad_idx <- which(y <= 0)
    y[bad_idx] <- rgamma(length(bad_idx), shape = mu_y[bad_idx], scale = 1)
    retry_count <- retry_count + 1
  }
  
  starting_values <- findStartingValues(X_intercept, y)
  
  fit_ols <- glm.fit2(X_intercept, y, start = starting_values,
                      control = control_list, family = family_type)
  l2_matrix["OLS", j] <- norm(fit_ols$coefficients - beta_true, type = "2")

  for (m in model_names[-1]) {
    # Added use_parallel = FALSE to dramatically speed up vignette building
    fit <- savvy_glm.fit2(X_intercept, y, model_class = m, control = control_list, 
                          family = family_type, use_robust_start = TRUE, 
                          use_parallel = FALSE)
    l2_matrix[m, j] <- norm(fit$coefficients - beta_true, type = "2")
  }
}

A summary table here is produced to compare these distances across different \(\rho\) values.

l2_table <- as.data.frame(l2_matrix)
kable(l2_table, digits = 4, caption = "L2 Distance Between Estimated and True Coefficients (sqrt link for Gamma GLM)")

L2 Distance Between Estimated and True Coefficients (sqrt link for Gamma GLM)
	rho=-0.75	rho=-0.5	rho=0	rho=0.5	rho=0.75
OLS	108.1097	78.2093	56.4163	46.8951	41.9629
SR	108.1052	78.2082	56.3660	46.8295	41.9684
GSR	104.5311	77.1232	55.9971	46.7169	42.1281
St	102.2954	76.2955	55.7526	46.3217	41.5515
DSh	105.6050	77.2780	56.6020	46.8909	42.1011
LW	105.0065	76.7574	56.2635	46.6498	41.9816
QIS	104.1095	77.2196	55.2965	47.2840	42.0942
Sh	107.9788	78.1691	56.4011	46.8882	41.9786

Real Data Analysis

In this section, we evaluate the performance of Gamma GLM estimation methods using real flood insurance data from three states: Florida (FL), Texas (TX), and Louisiana (LA). The dataset covers the period 2014 to 2023 and includes claims from the National Flood Insurance Programme (NFIP). For each state (e.g., Florida (FL) and Louisiana (LA)), the data are no provided here, but you can download from NFIP and follow the preprocess steps provided in the paper.

For each year, the dataset is split into a 70% training set and a 30% test set. Models are fitted using the standard GLM (via glm.fit2) and several shrinkage-based GLM (via savvy_glm.fit2 with model classes such as SR, GSR, St, DSh, and optionally Sh) using two different link functions:

For Gamma GLMs with the log link, the mean response is modeled as \(\mu = \exp(\eta)\).
For Gamma GLMs with the sqrt link, the mean response is modeled as \(\mu = \eta^2\).

For each replication, the MSE on the test set is computed. For each shrinkage-based GLM, we calculate the ratio

\[ \text{MSE Ratio} = \frac{\text{MSE}_{\text{Baseline (OLS)}}}{\text{MSE}_{\text{Shrinkage}}}. \]

A ratio greater than one indicates that the shrinkage-based GLM achieved a lower MSE (i.e., superior performance) compared to the standard update. This process is repeated multiple times (e.g., \(N=100\) replications per year), and the average ratio is computed for each model and each year.

Here, we just provide two cases and for a detailed account of the full data processing and evaluation procedures, please refer to the main paper: GLM Solutions via Shrinkage.

FL dataset with log Link Function

# Load required packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(caret)
library(knitr)

set.seed(1234)
years <- 2014:2015
model_names <- c("OLS", "SR", "GSR", "St", "DSh")
N <- 10
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- Gamma(link = "log")

Evaluation_results <- matrix(NA, nrow = length(model_names), ncol = length(years),
                             dimnames = list(model_names, years))

calculate_mse <- function(true_values, predicted_values) {
  mean((true_values - predicted_values)^2)
}

for (yr in years) {
  cat("Processing year:", yr, "\n")
  filename <- sprintf("FL_data_%d.csv", yr)
  data_year <- read.csv(filename, header = TRUE, stringsAsFactors = FALSE)
  
  ratio_mat <- matrix(NA, nrow = N, ncol = length(model_names)-1)
  colnames(ratio_mat) <- model_names[-1]
  for (i in 1:N) {
    set.seed(yr * 1000 + i)
    train_index <- createDataPartition(data_year[, 1], p = 0.7, list = FALSE)
    train_data <- data_year[train_index, ]
    test_data <- data_year[-train_index, ]
    
    X_train <- as.matrix(train_data[, -1])
    y_train <- train_data[, 1]
    X_test <- as.matrix(test_data[, -1])
    y_test <- test_data[, 1]
    X_train_int <- cbind(1, X_train)
    X_test_int <- cbind(1, X_test)
  
    model_glm2 <- glm.fit2(X_train_int, y_train, start = starting_values,
                           control = control_list, family = family_type)
    y_pred_glm2 <- exp(X_test_int %*% model_glm2$coefficients)
    mse_ols <- calculate_mse(y_test, y_pred_glm2)
    for (m in model_names[-1]) {
      model_savvy <- savvy_glm.fit2(X_train_int, y_train, model_class = m, control = control_list,
                                    family = family_type, use_robust_start = TRUE)
      y_pred <- exp(X_test_int %*% model_savvy$coefficients)
      mse_savvy <- calculate_mse(y_test, y_pred)
      ratio_mat[i, m] <- mse_ols / mse_savvy
    }
  }
  avg_ratios <- c(1, colMeans(ratio_mat, na.rm = TRUE))
  Evaluation_results[, as.character(yr)] <- avg_ratios
}

Evaluation_results_df <- as.data.frame(Evaluation_results)
kable(Evaluation_results_df, digits = 4,
      caption = "Average MSE Ratio (OLS / Shrinkage) for Each Year (Gamma GLM with Log Link)")

LA dataset with sqrt Link Function

# Load required packages
library(savvyGLM)
library(MASS)
library(glm2)
library(CVXR)
library(caret)
library(knitr)

set.seed(1234)
years <- 2014:2023
model_names <- c("OLS", "SR", "GSR", "St", "DSh")
N <- 100
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type <- Gamma(link = "sqrt")

Evaluation_results <- matrix(NA, nrow = length(model_names), ncol = length(years),
                             dimnames = list(model_names, years))

calculate_mse <- function(true_values, predicted_values) {
  mean((true_values - predicted_values)^2)
}

for (yr in years) {
  cat("Processing year:", yr, "\n")
  filename <- sprintf("LA_data_%d.csv", yr)
  data_year <- read.csv(filename, header = TRUE, stringsAsFactors = FALSE)
  
  ratio_mat <- matrix(NA, nrow = N, ncol = length(model_names)-1)
  colnames(ratio_mat) <- model_names[-1]
  for (i in 1:N) {
    set.seed(yr * 1000 + i)
    train_index <- createDataPartition(data_year[, 1], p = 0.7, list = FALSE)
    train_data <- data_year[train_index, ]
    test_data <- data_year[-train_index, ]
    
    X_train <- as.matrix(train_data[, -1])
    y_train <- train_data[, 1]
    X_test <- as.matrix(test_data[, -1])
    y_test <- test_data[, 1]
    X_train_int <- cbind(1, X_train)
    X_test_int <- cbind(1, X_test)
  
    model_glm2 <- glm.fit2(X_train_int, y_train, start = starting_values,
                           control = control_list, family = family_type)
    y_pred_glm2 <- (X_test_int %*% model_glm2$coefficients)^2
    mse_ols <- calculate_mse(y_test, y_pred_glm2)
    for (m in model_names[-1]) {
      model_savvy <- savvy_glm.fit2(X_train_int, y_train, model_class = m, control = control_list,
                                    family = family_type, use_robust_start = TRUE)
      y_pred <- exp(X_test_int %*% model_savvy$coefficients)
      mse_savvy <- calculate_mse(y_test, y_pred)
      ratio_mat[i, m] <- mse_ols / mse_savvy
    }
  }
  avg_ratios <- c(1, colMeans(ratio_mat, na.rm = TRUE))
  Evaluation_results[, as.character(yr)] <- avg_ratios
}

Evaluation_results_df <- as.data.frame(Evaluation_results)
kable(Evaluation_results_df, digits = 4,
      caption = "Average MSE Ratio (OLS / Shrinkage) for Each Year (Gamma GLM with Log Link)")

savvyGLM: Shrinkage Methods for Generalized Linear Models

Introduction

Main Features

Key Argument and Output Summary

Theoretical Overview

Shrinkage estimators

Generalized Linear Model and its IRLS Implementation

Simulation Examples

Logistic Regression

Balanced Data

Imbalanced Data

Poisson GLMs

Log Link Function

sqrt Link Function

Gamma GLMs

log Link Function

sqrt Link Function

Real Data Analysis

FL dataset with log Link Function

LA dataset with sqrt Link Function