This vignette demonstrates how to estimate spatial generalized linear mixed models (GLMMs) using the spCF package for smoothing/denoising, prediction, and multiscale analysis. The spatial process in the GLMMs is trained via a coarse-to-fine approach that sequentially learns spatial patterns from coarser to finer scales. This spatial GLMM framework is particularly suitable for moderate-to-large samples. See Murakami et al. (2026) for further details.
Let us load the required packages
library(spCF)
library(sf)
#> Linking to GEOS 3.14.1, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
library(CARBayesdata)The pollutionhealthdata dataset included in the CARBayesdata package is used here. This dataset comprises panel data on respiratory hospitalisations across 271 zones in Greater Glasgow. For simplicity, the center coordinates of each zone are used for spatial modeling.
In this example, the observed number of hospitalisations in 2011 (observed) is modeled for smoothing and predicting the latent risk of respiratory disease (i.e., disease mapping).
The response variable (observed), covariates (pm10, jsa, price), and an offset variable (expected) are specified as follows:
y <- dat[,"observed"] # count data
x <- dat[,c("pm10","jsa","price")]# covariates
offset <- log(dat[,"expected"]) # offset variableThe center coordinates are extracted as:
data(GGHB.IZ) # polygons of the 271 zones
coords <- st_coordinates(st_centroid(GGHB.IZ))# coordinates
#> Warning: st_centroid assumes attributes are constant over geometriesobserved is spatially plotted as follows:
In CF-GLM, the spatial process is defined as a sum of scalewise
processes, where the number of spatial scales, R, is optimized via
holdout validation. A smaller R, corresponding to early stopping, allows
the spatial process to capture only coarse-scale patterns, whereas a
larger R enables the spatial process to represent finer-scale patterns.
The cf_glm_hv function performs holdout validation as
follows:
mod_hv <- cf_glm_hv(y = y, x = x, offset=offset,
coords = coords, family=poisson())
#> [1] --- Deviance: Basic GLM ---
#> [1] 203.031
#> [1] --- Deviance: Learning multi-scale spatial process ---
#> [1] 202.0451 (Scale 1)
#> [1] 201.7257 (Scale 2)
#> [1] 201.414 (Scale 3)
#> [1] 201.0924 (Scale 4)
#> [1] 200.4269 (Scale 5)
#> [1] 199.5415 (Scale 6)
#> [1] 198.6156 (Scale 7)
#> [1] 197.7553 (Scale 8)
#> [1] 196.5995 (Scale 9)
#> [1] 195.2907 (Scale 10)
#> [1] 193.1986 (Scale 11)
#> [1] 191.2898 (Scale 12)
#> [1] 189.6105 (Scale 13)
#> [1] 188.632 (Scale 14)
#> [1] 188.0525 (Scale 15)
#> [1] 188.0141 (Scale 16)
#> [1] 188.0141 (Scale 17) no improvement
#> [1] 188.0141 (Scale 18) no improvement
#> [1] 188.0141 (Scale 19) no improvement
#> [1] 188.0141 (Scale 20) no improvement
#> [1] 187.1624 (Scale 21)
#> [1] 187.0825 (Scale 22)
#> [1] 186.7553 (Scale 23)
#> [1] 185.6057 (Scale 24)
#> [1] 185.6057 (Scale 25) no improvement
#> [1] 185.6057 (Scale 26) no improvement
#> [1] 185.6057 (Scale 27) no improvement
#> [1] 185.6057 (Scale 28) no improvement
#> [1]
#> [1] -> Selected finest scale: 24 (bandwidth: 1622.015)
#> [1]As shown in the output, the deviance loss for the validation samples gradually decreases as learning proceeds from the coarsest scale (Scale 1) to finer scales.
After holdout validation, the full model is trained using the
cf_glm function:
mod <- cf_glm(y = y, x = x, offset=offset,
coords = coords, mod_hv = mod_hv)
#> [1] --- Learning multi-scale spatial process ---
#> [1] Scale 1 (bandwidth:18301.1)
#> [1] Scale 2 (bandwidth:16470.99)
#> [1] Scale 3 (bandwidth:14823.89)
#> [1] Scale 4 (bandwidth:13341.5)
#> [1] Scale 5 (bandwidth:12007.35)
#> [1] Scale 6 (bandwidth:10806.62)
#> [1] Scale 7 (bandwidth:9725.956)
#> [1] Scale 8 (bandwidth:8753.36)
#> [1] Scale 9 (bandwidth:7878.024)
#> [1] Scale 10 (bandwidth:7090.222)
#> [1] Scale 11 (bandwidth:6381.2)
#> [1] Scale 12 (bandwidth:5743.08)
#> [1] Scale 13 (bandwidth:5168.772)
#> [1] Scale 14 (bandwidth:4651.895)
#> [1] Scale 15 (bandwidth:4186.705)
#> [1] Scale 16 (bandwidth:3768.035)
#> [1] Scale 17 (bandwidth:3391.231) no improvement (skipped)
#> [1] Scale 18 (bandwidth:3052.108) no improvement (skipped)
#> [1] Scale 19 (bandwidth:2746.897) no improvement (skipped)
#> [1] Scale 20 (bandwidth:2472.208) no improvement (skipped)
#> [1] Scale 21 (bandwidth:2224.987)
#> [1] Scale 22 (bandwidth:2002.488)
#> [1] Scale 23 (bandwidth:1802.239)
#> [1] Scale 24 (bandwidth:1622.015)The estimated regression coefficients, standard deviations of the scalewise components, and error statistics are displayed as follows:
mod
#> Call:
#> cf_glm(y = y, x = x, coords = coords, offset = offset, mod_hv = mod_hv)
#>
#> ---- Coefficients -------------------------------------
#> coef coef_se lower_95CI upper_95CI
#> x -0.55148999 0.063551384 -0.676050702 -0.42692928
#> xpm10 0.01555332 0.004185572 0.007349596 0.02375704
#> xjsa 0.06846717 0.003339543 0.061921662 0.07501267
#> xprice -0.15838146 0.018532479 -0.194705118 -0.12205780
#>
#> ---- Deviance losses (influential elements only) ------
#> elements standard_deviation
#> 1 xb 0.271748976
#> 2 spatial_scale1 0.005210782
#> 3 spatial_scale2 0.001539704
#> 4 spatial_scale3 0.001501339
#> 5 spatial_scale4 0.001565550
#> 6 spatial_scale5 0.002078805
#> 7 spatial_scale6 0.003046590
#> 8 spatial_scale7 0.003331620
#> 9 spatial_scale8 0.003634619
#> 10 spatial_scale9 0.004367815
#> 11 spatial_scale10 0.004858699
#> 12 spatial_scale11 0.006293534
#> 13 spatial_scale12 0.007053770
#> 14 spatial_scale13 0.007386425
#> 15 spatial_scale14 0.007554879
#> 16 spatial_scale15 0.008220176
#> 17 spatial_scale16 0.009258109
#> 18 spatial_scale21 0.029305289
#> 19 spatial_scale22 0.027139580
#> 20 spatial_scale23 0.022998427
#> 21 spatial_scale24 0.020260385
#>
#> ---- Error statistics ---------------------------------
#> stat value
#> 1 validation_Pseudo-R2 0.9271784
#> 2 validation_RMSE 8.5245368
#> 3 validation_MAE 0.3012617The predictive means, representing the denoised risk of respiratory disease, and their standard deviations (SDs) are mapped as follows:
# Predictive mean
GGHB.IZ$pred <- mod$pred$pred
plot(GGHB.IZ[,c("pred")],lwd=0.01,axes=TRUE, key.pos=4,nbreaks=50)
# Predictive SD
GGHB.IZ$pred_sd<- mod$pred$pred_sd
plot(GGHB.IZ[,c("pred_sd")],pal = hcl.colors(9, "Viridis"), lwd=0.01,
axes=TRUE, key.pos=4)The result suggests that both disease risk and their uncertainty increase in the central area.
The sp_scalewise function extracts scalewise spatial
processes with bandwidth values falling within a pre-specified range.
For example, the following commands extract the large- and small-scale
processes, corresponding to bandwidth ranges of 4000+ and 0–4000,
respectively.
mod_s1 <- sp_scalewise(mod,bw_range=c(4000,Inf)) # Large scale (4000 <= bandwidth)
mod_s2 <- sp_scalewise(mod,bw_range=c(0,4000)) # Small scale (bandwidth <= 4000)The extracted scalewise processes are mapped as follows:
GGHB.IZ$z1 <- mod_s1$pred$pred
GGHB.IZ$z2 <- mod_s2$pred$pred
plot(GGHB.IZ[,c("z1","z2")],lwd=0.01,axes=TRUE,key.pos=4, nbreaks=50)The sp_scalewise function is useful for multiscale
spatial pattern analysis (or feature extraction), which is commonly
conducted in ecological, epidemiological, and environmental studies.
Let us load the required packages
The meuse dataset from the sp package consists of
observations at 155 sample sites in a floodplain along the River Meuse.
In this example, we consider a binary response variable
(flood), which takes the value 1 if the flooding frequency
class is once every two years (i.e., flood-prone area) and 0 otherwise.
The binary response is predicted over 3,103 regularly spaced grid cells
(meuse.grid). The covariate considered is the distance to the
river (dist).
### Data at samples sites
data(meuse)
flood <- ifelse(meuse$ffreq==1, 1, 0 )# Binary response variable
coords <- meuse[,c("x","y")] # Coordinates
x <- meuse[,"dist"] # Covariate
### Data at prediction sites
data(meuse.grid)
coords0 <- meuse.grid[,c("x","y")] # Coordinates
x0 <- meuse.grid[,"dist"] # Covariateflood is spatially plotted as follows:
obs_s <- st_as_sf( data.frame(coords, flood), coords= c("x","y"), crs=28992)
plot(obs_s[,"flood"], pch = 20, key.pos=4, axes=TRUE)The code implementing the CF-GLM is essentially the same as in the
previous example, except that family is set to
binomial() for modeling binary data:
set.seed(1234) # For this vignette, training samples are fixed
mod_hv <- cf_glm_hv(y = flood, x = x, coords = coords, family=binomial())
#> [1] --- Deviance: Basic GLM ---
#> [1] 49.90513
#> [1] --- Deviance: Learning multi-scale spatial process ---
#> [1] 36.50819 (Scale 1)
#> [1] 32.74159 (Scale 2)
#> [1] 31.20849 (Scale 3)
#> [1] 30.50053 (Scale 4)
#> [1] 30.11569 (Scale 5)
#> [1] 29.93435 (Scale 6)
#> [1] 29.79527 (Scale 7)
#> [1] 29.63224 (Scale 8)
#> [1] 29.51661 (Scale 9)
#> [1] 29.4912 (Scale 10)
#> [1] 29.4912 (Scale 11) no improvement
#> [1] 29.4912 (Scale 12) no improvement
#> [1] 29.4912 (Scale 13) no improvement
#> [1] 29.4912 (Scale 14) no improvement
#> [1]
#> [1] -> Selected finest scale: 10 (bandwidth: 556.7079)
#> [1]In the subsequent full model training using the cf_glm
function, the covariates (x0) and coordinates
(coords0) at the prediction sites are also specified to
enable spatial prediction:
mod <- cf_glm(y = flood, x=x, coords = coords,
x0=x0, coords0 = coords0, mod_hv = mod_hv)
#> [1] --- Learning multi-scale spatial process ---
#> [1] Scale 1 (bandwidth:1436.96)
#> [1] Scale 2 (bandwidth:1293.264)
#> [1] Scale 3 (bandwidth:1163.938)
#> [1] Scale 4 (bandwidth:1047.544)
#> [1] Scale 5 (bandwidth:942.7897)
#> [1] Scale 6 (bandwidth:848.5107)
#> [1] Scale 7 (bandwidth:763.6596)
#> [1] Scale 8 (bandwidth:687.2937)
#> [1] Scale 9 (bandwidth:618.5643)
#> [1] Scale 10 (bandwidth:556.7079)The estimated regression coefficients, standard deviations of the scalewise components, and error statistics are displayed as follows:
mod
#> Call:
#> cf_glm(y = flood, x = x, coords = coords, x0 = x0, coords0 = coords0,
#> mod_hv = mod_hv)
#>
#> ---- Coefficients -------------------------------------
#> coef coef_se lower_95CI upper_95CI
#> x1 1.130556 0.3449302 0.4544933 1.806620
#> x2 -3.676506 1.1658584 -5.9615887 -1.391424
#>
#> ---- Deviance losses (influential elements only) ------
#> elements standard_deviation
#> 1 xb 0.72685318
#> 2 spatial_scale1 0.76519927
#> 3 spatial_scale2 0.35405089
#> 4 spatial_scale3 0.18224002
#> 5 spatial_scale4 0.10319421
#> 6 spatial_scale5 0.08257470
#> 7 spatial_scale6 0.07569099
#> 8 spatial_scale7 0.07704193
#> 9 spatial_scale8 0.09548835
#> 10 spatial_scale9 0.10727839
#> 11 spatial_scale10 0.12784213
#>
#> ---- Error statistics ---------------------------------
#> stat value
#> 1 validation_Pseudo-R2 0.53751623
#> 2 validation_RMSE 0.30721854
#> 3 validation_MAE 0.06389095The predictive values and their standard deviations at the grid cells are mapped as follows:
### Convert gridded points to gridded polygons (for clear visualization)
meuse.grid_sp <- meuse.grid
coordinates(meuse.grid_sp)<- c("x", "y")
gridded(meuse.grid_sp) <- TRUE
meuse.grid_sf <- st_as_sf(as(meuse.grid_sp, "SpatialPolygons"))
st_crs(meuse.grid_sf) <- 28992
### Mapping predictive mean and standard deviations
meuse.grid_sf$pred <- mod$pred0$pred # Predictive mean
meuse.grid_sf$pred_sd <- mod$pred0$pred_sd# Predictive standard deviations
plot(meuse.grid_sf[,"pred"], border = NA, nbreaks = 20, key.pos=4,axes=TRUE)Let us extract the large- and small-scale processes, corresponding to bandwidth ranges of 1000+ and 0–1000, respectively.
mod_s1<- sp_scalewise(mod,bw_range=c(1000,Inf)) # Large scale (1000 <= bandwidth)
mod_s2<- sp_scalewise(mod,bw_range=c(0,1000)) # Small scale (0 <= bandwidth <= 1000)The extracted scalewise processes are mapped as follows:
meuse.grid_sf$z1 <- mod_s1$pred0$pred
meuse.grid_sf$z2 <- mod_s2$pred0$pred
plot(meuse.grid_sf[,c("z1","z2")], border=NA, nbreaks=20, key.pos=4, axes=TRUE)