Generalized Linear Model and its IRLS Implementation
A GLM assumes that a response variable \(Y\), defined on some set \(\mathcal{Y} \subseteq \Re\), is related to
a set of covariates \(\mathbf{X}\) in
\(\mathcal{X} \subseteq \Re^d\). The
conditional distribution of \(Y\)
belongs to the exponential dispersion family, with probability density
or mass function
\[
f_Y(y; \theta, \phi) \;=\; \exp\left\{\frac{\theta\,y -
b(\theta)}{a(\phi)} \;+\; c(y, \phi)\right\},
\]
where \(\theta\) is the canonical
parameter, \(\phi\) is the dispersion
parameter, and \(a\), \(b\), \(c\)
are known functions. The mean of \(Y\)
is linked to a linear predictor \(\eta_i =
\mathbf{x}_i^\top \boldsymbol{\beta}\) through a link function
\(g\), so that
\[
\mathbb{E}[Y_i \mid \mathbf{X}_i = \mathbf{x}_i] \;=\;
h\left(\mathbf{x}_i^\top \boldsymbol{\beta}\right),
\]
where \(h = g^{-1}\) is the inverse
of the link. Under a canonical link, \(h(\cdot) = b^\prime(\cdot)\).
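As a concrete illustration, the Poisson distribution with mean \(\mu\) takes this form with \(\theta = \log \mu\):
\[
f_Y(y; \theta) \;=\; \exp\left\{\theta\,y \;-\; e^{\theta} \;-\; \log y!\right\},
\]
so that \(b(\theta) = e^{\theta}\), \(a(\phi) = 1\), and \(c(y, \phi) = -\log y!\); the canonical link is \(g(\mu) = \log \mu\), whose inverse \(h(\eta) = e^{\eta}\) indeed equals \(b^\prime(\eta)\).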
For an independent sample \(\left\{(y_i,
\mathbf{x}_i)\right\}_{i=1}^n\), the log-likelihood for \(\boldsymbol{\beta}\) can be written as
\[
\ell(\boldsymbol{\beta})
\;=\; \sum_{i=1}^n \frac{\theta_i\,y_i \;-\; b(\theta_i)}{a(\phi)} \;+\;
c\left(y_i, \phi\right),
\]
where \(\theta_i = (b^\prime)^{-1}\!\left(
h\left(\mathbf{x}_i^\top \boldsymbol{\beta}\right)\right)\). Maximizing
this log-likelihood is equivalent to minimizing the following objective
function,
\[
\mathcal{C}(\boldsymbol{\beta})
\;=\; -\sum_{i=1}^n \left(\theta_i\,y_i \;-\; b(\theta_i)\right).
\]
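Writing \(\mu_i = h(\eta_i)\) with \(\eta_i = \mathbf{x}_i^\top \boldsymbol{\beta}\), the chain rule together with \(b^\prime(\theta_i) = \mu_i\) and \(\partial \theta_i / \partial \mu_i = 1 / V(\mu_i)\) gives
\[
\nabla_{\boldsymbol{\beta}}\, \mathcal{C}(\boldsymbol{\beta})
\;=\; -\sum_{i=1}^n \frac{y_i - \mu_i}{V(\mu_i)}\, h^\prime(\eta_i)\, \mathbf{x}_i,
\]
where \(V(\mu_i) = b^{\prime\prime}(\theta_i)\) is the variance function of the family.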
Setting the derivative of \(\mathcal{C}(\boldsymbol{\beta})\) with
respect to \(\boldsymbol{\beta}\) to zero
yields the score equations (the GLM analogue of the normal equations), which are solved iteratively. In
particular, the IRLS algorithm recasts each iteration as a weighted least-squares (WLS)
problem:
\[
\widehat{\boldsymbol{\beta}}^{(t+1)}
\;=\; \underset{\boldsymbol{\beta}}{\mathrm{arg\,min}}
\;\left(\mathbf{z}^{(t)} - \mathbf{X}\,\boldsymbol{\beta}\right)^\top
\mathbf{W}^{(t)}
\left(\mathbf{z}^{(t)} - \mathbf{X}\,\boldsymbol{\beta}\right),
\]
where \(\mathbf{W}^{(t)}\) is the
diagonal matrix of weights and \(\mathbf{z}^{(t)}\) is the pseudo-response
at iteration \(t\). Specifically,
\[
\mathbf{W}^{(t)}
\;=\; \mathrm{diag}\left(\left(h^\prime(\eta_i^{(t)})\right)^2 /
V\left(\mu_i^{(t)}\right)\right),
\quad
z_i^{(t)}
\;=\; \eta_i^{(t)} \;+\; \frac{y_i -
\mu_i^{(t)}}{h^\prime(\eta_i^{(t)})},
\]
with \(\mu_i^{(t)} =
h(\eta_i^{(t)})\) and \(\eta_i^{(t)} =
\mathbf{x}_i^\top \widehat{\boldsymbol{\beta}}^{(t)}\). Here,
\(V(\mu_i)\) is the variance function
determined by the exponential family, and \(h^\prime\) is the derivative of the inverse
link.
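As a minimal sketch (an illustration only, not savvyGLM's implementation), the update above can be coded directly. The example below fits a Poisson model with canonical log link, for which \(h = h^\prime = \exp\) and \(V(\mu) = \mu\); the function name `irls_glm` and the simulated data are hypothetical:

```python
import numpy as np

def irls_glm(X, y, h, h_prime, V, max_iter=25, tol=1e-8):
    """IRLS sketch for a GLM: h is the inverse link, h_prime its
    derivative, and V the variance function."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta                      # linear predictor eta^(t)
        mu = h(eta)                         # fitted means mu^(t)
        w = h_prime(eta) ** 2 / V(mu)       # diagonal of W^(t)
        z = eta + (y - mu) / h_prime(eta)   # pseudo-response z^(t)
        # WLS step: solve X^T W X beta = X^T W z
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Poisson regression with log link: h = h' = exp, V(mu) = mu.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
beta_true = np.array([0.5, -0.3])
y = rng.poisson(np.exp(X @ beta_true)).astype(float)
beta_hat = irls_glm(X, y, np.exp, np.exp, lambda m: m)
```

Note that for the canonical log link the weights simplify to \(w_i = e^{\eta_i} = \mu_i\), which is why Poisson IRLS is often written with \(\mathbf{W} = \mathrm{diag}(\boldsymbol{\mu})\).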
By iterating this WLS problem until convergence (or until a maximum
number of iterations is reached), IRLS provides the maximum likelihood
estimates for the GLM parameters. In savvyGLM, these WLS
updates are further enhanced by applying shrinkage estimators
(`St_ost`, `DSh_ost`, `SR_ost`,
`GSR_ost`, or `Sh_ost`) at each iteration, thereby improving
estimation accuracy and convergence stability.