Introduction

In this vignette we fit a Bayesian latent class analysis model with a prior on the number of components (classes) \(K\) to the fear data set. This multivariate data set contains three categorical variables with 3, 3, and 4 levels and latent class analysis was previously performed to extract two groups (Stern et al. 1994). We use the prior specification and the telescoping sampler for performing MCMC sampling as proposed in Frühwirth-Schnatter, Malsiner-Walli, and Grün (2021).

First, we load the package.

library("telescope")

The fear data set

We create the fear data set and define the profile of success probabilities pi_stern for the two groups identified by Stern et al. (1994) for benchmarking purposes.

freq <- c(5, 15, 3, 2, 4, 4, 3, 1, 1, 2, 4, 2, 0, 2, 0, 0, 1, 3, 2, 1,
          2, 1, 3, 3, 2, 4, 1, 0, 0, 4, 1, 3, 2, 2, 7, 3)
pattern <- cbind(F = rep(rep(1:3, each = 4), 3),
                 C = rep(1:3, each = 3 * 4),
                 M = rep(1:4, 9))
fear <- pattern[rep(seq_along(freq), freq),]
pi_stern <- matrix(c(0.74, 0.26, 0.0, 0.71, 0.08, 0.21, 0.22, 0.6, 0.12, 0.06, 
                     0.00, 0.32, 0.68, 0.28, 0.31, 0.41, 0.14, 0.19, 0.40, 0.27), 
                   ncol = 10, byrow = TRUE)

We store the dimension of the data set.

N <- nrow(fear)
r <- ncol(fear)

We visualize the data set.

plotBubble(fear)

Model specification

For multivariate categorical observations \(\mathbf{y}_1,\ldots,\mathbf{y}_N\) the following model with hierachical prior structure is assumed:

\[\begin{aligned} \mathbf{y}_i \sim \sum_{k=1}^K \eta_k \prod_{j=1}^r \prod _{d=1}^{D_j} \pi_{k,jd}^{I\{y_{ij}=d\}}, & \qquad \text{ where } \pi_{k,jd} = Pr(Y_{ij}=d|S_i=k)\\ K \sim p(K)&\\ \boldsymbol{\eta} \sim Dir(e_0)&, \qquad \text{with } e_0 \text{ fixed, } e_0\sim p(e_0) \text { or } e_0=\frac{\alpha}{K}, \text{ with } \alpha \text{ fixed or } \alpha \sim p(\alpha),\\ \boldsymbol{\pi}_{k,j} \sim Dir(a_0).& \end{aligned}\]

Specification of the MCMC simulation and prior parameters

For MCMC sampling we need to specify Mmax, the maximum number of iterations, thin, the thinning imposed to reduce auto-correlation in the chain by only recording every thined observation, and burnin, the number of burn-in iterations not recorded.

Mmax <- 5000
thin <- 10
burnin <- 1000

The specifications of Mmax and thin imply M, the number of recorded observations.

M <- Mmax/thin

We specify with Kmax the maximum number of components possible during sampling. Kinit denotes the initial number of filled components.

Kmax <- 50  
Kinit <- 10

We fit a dynamic specification on the weights.

G <- "MixDynamic"

For a static specific one would need to use "MixStatic".

For the dynamic setup, we specify the F- distribution \(F(6,3)\) as the prior on alpha.

priorOnAlpha <- priorOnAlpha_spec("F_6_3")

We need to select the prior on K. We use the prior \(K-1 \sim BNB(1, 4, 3)\) as suggested in Frühwirth-Schnatter, Malsiner-Walli, and Grün (2021) for a model-based clustering context.

priorOnK <- priorOnK_spec("BNB_143")

Now, we specify the component-specific priors. We use a symmetric Dirichlet prior for the success probabilities \(\boldsymbol{\pi}_{k,j} \sim \mathcal{Dir}(a_0)\) for each variable with \(a_0=1\) inducing a uniform prior.

cat <- apply(fear, 2, max)
a0 <- rep(1, sum(cat))

We obtain an initial partition S_0 and initial component weights eta_0 using either kmodes() or kmeans().

set.seed(1234)
if (requireNamespace("klaR", quietly = TRUE)) {
    cl_y <- klaR::kmodes(data = fear, modes = Kinit,
                         iter.max = 20, weighted = FALSE)
} else {
    cl_y <- kmeans(fear, centers = Kinit, iter.max = 20)
}
S_0 <- cl_y$cluster
eta_0 <- cl_y$size/N

We determine initial values pi_0 for the success probabilities based on the initial partition.

pi_0 <- do.call("cbind", lapply(1:r, function(j) {
    prop.table(table(S_0, fear[, j]), 1)
}))
round(pi_0, 2)

##       1    2    3    1    2    3    1    2    3    4
## 1  0.00 0.00 1.00 0.00 1.00 0.00 0.40 0.20 0.00 0.40
## 2  0.00 0.00 1.00 0.00 0.15 0.85 0.15 0.08 0.69 0.08
## 3  0.00 0.33 0.67 0.89 0.11 0.00 0.00 0.11 0.89 0.00
## 4  0.70 0.20 0.10 0.00 0.00 1.00 0.20 0.70 0.10 0.00
## 5  0.00 0.86 0.14 0.86 0.14 0.00 0.86 0.14 0.00 0.00
## 6  0.29 0.14 0.57 0.86 0.14 0.00 0.00 0.14 0.00 0.86
## 7  0.00 1.00 0.00 0.17 0.83 0.00 0.00 0.67 0.33 0.00
## 8  1.00 0.00 0.00 1.00 0.00 0.00 0.71 0.00 0.29 0.00
## 9  0.00 0.78 0.22 0.00 0.11 0.89 0.00 0.22 0.11 0.67
## 10 0.90 0.10 0.00 0.90 0.10 0.00 0.00 0.95 0.05 0.00

MCMC sampling

Using this prior specification as well as initialization and MCMC settings, we draw samples from the posterior using the telescoping sampler.

The first argument of the sampling function is the data followed by the initial partition and the initial parameter values for component-specific success probabilities and weights. The next argument corresponds to the hyperparameter of the prior setup (a0). Then the setting for the MCMC sampling is specified using M, burnin, thin and Kmax. Finally the prior specification for the weights and the prior on the number of components are given (G, priorOnK, priorOnAlpha).

result <- sampleLCA(
    fear, S_0, pi_0, eta_0, a0, 
    M, burnin, thin, Kmax, 
    G, priorOnK, priorOnAlpha)

The sampling function returns a named list where the sampled parameters and latent variables are contained. The list includes the component-specific success probabilities Pi, the weights Eta, the assignments S, the number of observations Nk assigned to components, the number of components K, the number of filled components Kplus, parameter values corresponding to the mode of the nonnormalized posterior nonnormpost_mode, the acceptance rate in the Metropolis-Hastings step when sampling \(\alpha\) or \(e_0\), \(\alpha\) and \(e_0\). These values can be extracted for post-processing.

Eta <- result$Eta
Pi <- result$Pi
Nk <- result$Nk
S <- result$S
K <- result$K
Kplus <- result$Kplus   
nonnormpost_mode <- result$nonnormpost_mode
e0 <- result$e0
alpha <- result$alpha
acc <- result$acc

Convergence diagnostics of the run

We inspect the acceptance rate when sampling \(\alpha\) and the trace plot of the sampled alpha:

acc <- sum(acc)/M
acc

## [1] 0.406

plot(1:length(alpha), alpha, type = "l", 
     ylim = c(0, max(alpha)), 
     xlab = "iterations", ylab = expression(alpha))

We further assess the distribution of the sampled \(\alpha\) by inspecting a histogram as well determining the mean and some quantiles.

hist(alpha, freq = FALSE, breaks = 50)

mean(alpha)

## [1] 5.109542

quantile(alpha, probs = c(0.25, 0.5, 0.75))

##      25%      50%      75% 
## 1.109833 2.491861 4.706211

We also inspect the trace plot of the induced hyperparameter \(e_0\):

plot(1:length(e0), e0, type = "l", 
     ylim = c(0, max(e0)), 
     xlab = "iterations", ylab = expression(e["0"]))

In addition we plot the histogram of the induced \(e_0\) as well as their mean and some quantiles.

hist(e0, freq = FALSE, breaks = 50)

mean(e0)

## [1] 2.000098

quantile(e0, probs = c(0.25, 0.5, 0.75))

##       25%       50%       75% 
## 0.3382706 0.8069247 1.7778566

To further assess convergence, we also inspect the trace plots for the number of components \(K\) and the number of filled components \(K_+\).

plot(1:length(K), K, type = "l", ylim = c(0, 30), 
     xlab = "iteration", main = "", ylab = "number",
     lwd = 0.5, col = "grey")
points(1:length(K), Kplus, type = "l", col = "red3",
       lwd = 2, lty = 1)
legend("topright", legend = c("K", "K+"), col = c("grey", "red3"),
       lwd = 2)

Identification of the mixture model

Step 1: Estimate \(K_+\) and \(K\)

We determine the posterior distribution of the number of filled components \(K_+\), approximated using the telescoping sampler. We visualize the distribution of \(K_+\) using a barplot.

Kplus <- rowSums(Nk != 0, na.rm = TRUE)  
p_Kplus <- tabulate(Kplus, nbins = max(Kplus))/M 
barplot(p_Kplus/sum(p_Kplus), xlab = expression(K["+"]), names = 1:length(p_Kplus),
        col = "red3", ylab = expression("p(" ~ K["+"] ~ "|" ~ bold(y) ~ ")"))

The distribution of \(K_+\) is also characterized using the 1st and 3rd quartile as well as the median.

quantile(Kplus, probs = c(0.25, 0.5, 0.75))

## 25% 50% 75% 
## 2.0 2.5 3.0

We obtain a point estimate for \(K_+\) by taking the mode and determine the number of MCMC draws where exactly \(K_+\) components were filled.

Kplus_hat <- which.max(p_Kplus)
Kplus_hat

## [1] 2

M0 <- sum(Kplus == Kplus_hat)
M0

## [1] 250

We also determine the posterior distribution of the number of components \(K\) directly drawn using the telescoping sampler.

p_K <- tabulate(K, nbins = max(K))/M
quantile(K, probs = c(0.25, 0.5, 0.75))

## 25% 50% 75% 
##   2   3   4

barplot(p_K/sum(p_K), names = 1:length(p_K), xlab = "K", 
        ylab = expression("p(" ~ K ~ "|" ~ bold(y) ~ ")"))

which.max(tabulate(K, nbins = max(K)))

## [1] 2

Step 2: Extracting the draws with exactly \(\hat{K}_+\) non-empty components

First we select those draws where the number of filled groups was exactly \(\hat{K}_+\):

index <- Kplus == Kplus_hat
Nk[is.na(Nk)] <- 0
Nk_Kplus <- (Nk[Kplus == Kplus_hat, ] > 0)

In the following we extract the cluster success probabilities, data cluster sizes and cluster assignments for the draws where exactly \(\hat{K}_+\) components were filled.

Pi_inter <- Pi[index, , ]
Pi_Kplus <- array(0, dim = c(M0, sum(cat), Kplus_hat))
for (j in 1:sum(cat)) {
  Pi_Kplus[, j, ] <- Pi_inter[, , j][Nk_Kplus]
}

Eta_inter <- Eta[index, ]
Eta_Kplus <- matrix(Eta_inter[Nk_Kplus], ncol = Kplus_hat)
Eta_Kplus <- sweep(Eta_Kplus, 1, rowSums(Eta_Kplus), "/")

w <- which(index)
S_Kplus <- matrix(0, M0, N)
for (i in seq_along(w)) {
    m <- w[i]
    perm_S <- rep(0, Kmax)
    perm_S[Nk[m, ] != 0] <- 1:Kplus_hat
    S_Kplus[i, ] <- perm_S[S[m, ]]
}

Step 3: Clustering and relabeling of the MCMC draws

For model identification, we cluster the draws of the success probabilities where exactly \(\hat{K}_+\) components were filled in the point process representation using \(k\)-means clustering.

Func_init <- nonnormpost_mode[[Kplus_hat]]$pi
identified_Kplus <- identifyMixture(
    Pi_Kplus, Pi_Kplus, Eta_Kplus, S_Kplus, Func_init)

We inspect the non-permutation rate to assess how well separated the data clusters are and thus how easily one can obtain a suitable relabeling of the draws. Low values of the non-permutation rate, i.e., close to zero, indicate that the solution can be easily identified pointing to a good clustering solution being obtained.

identified_Kplus$non_perm_rate

## [1] 0

Step 4: Estimating data cluster specific parameters and determining the final partition

The relabeled draws are also returned which can be used to determine posterior mean values for data cluster specific parameters. The estimated success probabilities are compared to the estimates reported in Stern et al. (1994).

colMeans(identified_Kplus$Eta)

## [1] 0.5377879 0.4622121

Pi_identified <- colMeans(identified_Kplus$Mu)
round(t(Pi_identified), digits = 2)

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 0.62 0.28 0.10 0.67 0.11 0.22 0.22 0.57 0.13  0.08
## [2,] 0.07 0.31 0.63 0.27 0.31 0.42 0.14 0.16 0.41  0.28

pi_stern

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 0.74 0.26 0.00 0.71 0.08 0.21 0.22 0.60 0.12  0.06
## [2,] 0.00 0.32 0.68 0.28 0.31 0.41 0.14 0.19 0.40  0.27

The final partition is obtained by assigning each observation to the group where it was assigned most frequently during sampling.

z_sp <- apply(identified_Kplus$S, 2,
              function(x) which.max(tabulate(x, Kplus_hat)))
table(z_sp)

## z_sp
##  1  2 
## 51 42

References

Frühwirth-Schnatter, Sylvia, Gertraud Malsiner-Walli, and Bettina Grün. 2021. “Generalized Mixtures of Finite Mixtures and Telescoping Sampling.” Bayesian Analysis 16 (4): 1279–1307. https://doi.org/10.1214/21-BA1294.

Stern, H., D. Arcus, J. Kagan, D. B. Rubin, and N. Snidman. 1994. “Statistical Choices in Infant Temperament Research.” Behaviormetrika 21 (1): 1–17. https://doi.org/10.2333/bhmk.21.1.

Bayesian Latent Class Analysis Models with the Telescoping Sampler

Gertraud Malsiner-Walli, Sylvia Frühwirth-Schnatter, Bettina Grün