---
title: "Latent Class Discriminant Analysis"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Latent Class Discriminant Analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

## Overview

The `lcda` package provides latent class discriminant analysis methods for
categorical predictors. The main functions are:

- `lcda()` for class-specific latent class models.
- `cclcda()` for common-components latent class models, fitted in two stages.
- `cclcda2()` for common-components models whose class-conditional mixing weights
  are estimated jointly with the response probabilities.

All manifest variables and class labels must be coded as integers starting at 1
(1, 2, 3, and so on). Character labels and zero-based codes need to be recoded
before fitting.
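
As a minimal sketch of such a recoding in base R, the chunk below maps arbitrary
labels to integer codes that start at 1. The helper `to_codes()` is hypothetical
and not part of `lcda`; it relies only on `factor()`, which numbers the sorted
distinct values from 1.

```{r}
# Hypothetical helper (not part of lcda): map arbitrary values to codes 1..R,
# where R is the number of distinct values and codes follow the sorted order.
to_codes <- function(x) as.integer(factor(x))

raw <- c("low", "high", "mid", "high", "low")
codes <- to_codes(raw)
codes

# Zero-based numeric codes are shifted to start at 1 in the same way.
to_codes(c(0, 1, 2, 0))
```

Note that the resulting codes follow the sorted order of the labels, so an
ordinal variable may need explicit factor levels to keep its intended order.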

## Background

The methods in `lcda` implement local discrimination for discrete variables using
latent class analysis (LCA), as described by Bücker, Szepannek, and Weihs (2010).
The key idea is to model each class-conditional distribution as a finite mixture
of locally independent components rather than as a single distribution. This lets
each class capture heterogeneity while keeping the model tractable for categorical
data.

Let `K` be the number of classes, `M` the number of latent components (written
`M_k` when it may differ between classes), `D` the number of manifest variables,
and `R_d` the number of outcomes for variable `d`. The indicator `x_dr` equals 1
if variable `d` takes outcome `r` and 0 otherwise, so exactly one indicator per
variable equals 1.
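
The indicator notation can be illustrated with a small base R sketch. All values
below are made up for illustration. Raising a vector of outcome probabilities to
the power of the indicators and taking the product simply picks out the
probability of the observed outcome, which is how the indicators act in the model
formulas.

```{r}
R_d <- 4                                  # outcomes of one hypothetical variable
x_d <- 3                                  # observed outcome, coded 1..R_d
x_dr <- as.integer(seq_len(R_d) == x_d)   # indicator vector for this variable
x_dr

theta_d <- c(0.1, 0.2, 0.6, 0.1)          # illustrative outcome probabilities

# The product over r of theta_d^x_dr equals the probability of outcome x_d.
all.equal(prod(theta_d^x_dr), theta_d[x_d])
```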

## Models

### LCDA (class-specific mixtures)

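Each class $k$ has its own latent class model with $M_k$ components, mixing
weights $w_{mk}$ that sum to 1 over $m$, and response probabilities
$\theta_{mkdr}$ that sum to 1 over $r$: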

$$
f_k(x) = \sum_{m=1}^{M_k} w_{mk} \prod_{d=1}^D \prod_{r=1}^{R_d} \theta_{mkdr}^{x_{dr}}
$$

Classification follows the Bayes decision rule, where $\pi_k$ is the prior
probability of class $k$:

$$
\hat{k}(x) = \arg\max_k \pi_k f_k(x)
$$
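
The two formulas above can be sketched in base R as follows. This is an
illustration under stated assumptions, not the package's implementation: the
function `lc_density()`, the parameter layout (one matrix of response
probabilities per variable), and all numbers are hypothetical.

```{r}
# Density of one latent class mixture at observation x (integer codes 1..R_d).
# w:     mixing weights, one per component
# theta: list of D matrices; theta[[d]][m, r] = P(variable d = r | component m)
lc_density <- function(x, w, theta) {
  comp <- vapply(seq_along(w), function(m) {
    prod(vapply(seq_along(x), function(d) theta[[d]][m, x[d]], numeric(1)))
  }, numeric(1))
  sum(w * comp)
}

# Illustrative parameters: K = 2 classes, D = 2 binary variables.
# Class 1 uses two components, class 2 a single component.
class1 <- list(w = c(0.7, 0.3),
               theta = list(rbind(c(0.9, 0.1), c(0.2, 0.8)),
                            rbind(c(0.8, 0.2), c(0.3, 0.7))))
class2 <- list(w = 1,
               theta = list(rbind(c(0.1, 0.9)),
                            rbind(c(0.4, 0.6))))
prior <- c(0.5, 0.5)

x <- c(1, 1)
f <- c(lc_density(x, class1$w, class1$theta),
       lc_density(x, class2$w, class2$theta))

# Bayes rule: pick the class with the largest prior-weighted density.
k_hat <- which.max(prior * f)
k_hat
```

Because each class has its own components, the number of components may differ
between classes, as it does in this toy setup.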

### CCLCDA (common components)

Common-components models share the component distributions across classes, while
allowing class-specific mixing weights:

$$
f_k(x) = \sum_{m=1}^{M} w_{mk} \prod_{d=1}^D \prod_{r=1}^{R_d} \theta_{mdr}^{x_{dr}}
$$

`cclcda()` first estimates the shared LCA on the pooled data and then derives
class-conditional weights. `cclcda2()` estimates weights and response
probabilities jointly in each EM step.
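
To make the two-stage idea concrete, here is a rough base R sketch. It assumes
that the class-conditional weights are obtained by averaging the posterior
component probabilities of the pooled model within each class; the package's
internals may differ. The parameters `theta` and `w_pooled`, the data `X`, and
the labels `y` are all made up.

```{r}
# Shared component distributions: M = 2 components, D = 2 binary variables.
# theta[[d]][m, r] = P(variable d = r | component m)
theta <- list(rbind(c(0.9, 0.1), c(0.2, 0.8)),
              rbind(c(0.8, 0.2), c(0.3, 0.7)))
w_pooled <- c(0.5, 0.5)            # pooled mixing weights (illustrative)

X <- rbind(c(1, 1), c(1, 2), c(2, 2), c(2, 2))   # toy observations
y <- c(1, 1, 2, 2)                               # toy class labels

# N x M matrix of component likelihoods under local independence.
comp_lik <- t(apply(X, 1, function(x) {
  vapply(seq_along(w_pooled), function(m) {
    prod(vapply(seq_along(x), function(d) theta[[d]][m, x[d]], numeric(1)))
  }, numeric(1))
}))

# Posterior probability of each component given each observation.
joint <- sweep(comp_lik, 2, w_pooled, `*`)
post <- joint / rowSums(joint)

# Assumed second stage: average the posteriors within each class.
# Rows index components m, columns index classes k.
w_mk <- sapply(sort(unique(y)), function(k) {
  colMeans(post[y == k, , drop = FALSE])
})
w_mk
colSums(w_mk)
```

Each column of `w_mk` sums to 1, so it can be plugged into the common-components
density as the mixing weights of one class.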

## Estimation and model selection

Parameters are estimated with the EM algorithm. Because EM only converges to a
local maximum of the likelihood, the fit can be repeated from several random
starting values (see `nrep`). Model selection can be guided by AIC, BIC, the
likelihood ratio statistic (Gsq), and the Pearson chi-square statistic (Chisq).
For common-components models, additional quality measures are provided:

- Weighted entropy, measuring the purity of latent components.
- Weighted Gini, an alternative impurity measure.
- A chi-square test of independence between latent components and classes.

These are reported in the fitted model objects returned by `cclcda()` and
`cclcda2()`.
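
The following base R sketch shows one plausible way to compute such measures from
a joint distribution of latent components and classes. The definitions used here
(entropy with logarithm base `K`, Gini impurity without rescaling, both weighted
by the component probabilities) are assumptions for illustration; the exact
formulas and normalisation used by the package may differ. The table `p_mk` is
made up.

```{r}
# Hypothetical joint probabilities P(m, k): rows = components, columns = classes.
p_mk <- rbind(c(0.30, 0.02),
              c(0.03, 0.25),
              c(0.20, 0.20))
K <- ncol(p_mk)

p_m <- rowSums(p_mk)          # P(m)
p_k_given_m <- p_mk / p_m     # P(k | m); each row sums to 1

# Per-component impurity: 0 means the component contains a single class.
ent_m <- -rowSums(ifelse(p_k_given_m > 0,
                         p_k_given_m * log(p_k_given_m, base = K), 0))
gini_m <- 1 - rowSums(p_k_given_m^2)

# Weight each component's impurity by its probability.
weighted_entropy <- sum(p_m * ent_m)
weighted_gini <- sum(p_m * gini_m)
c(entropy = weighted_entropy, gini = weighted_gini)

# Independence test on a count table derived from the same toy distribution.
counts <- round(p_mk * 200)
indep_test <- chisq.test(counts)
indep_test$p.value
```

Low impurity together with a clear rejection of independence suggests that the
common components separate the classes well.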

## Example: CCLCDA2 on Iris

```{r}
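library(lcda)

data(iris)

# Discretise each measurement into four ordered bins coded 1 to 4, and code the
# species as integers 1 to 3, as required for manifest variables and classes.
iris_cat <- within(iris, {
  Sepal.Length <- as.integer(cut(Sepal.Length, breaks = c(-Inf, 5.1, 5.8, 6.4, Inf)))
  Sepal.Width <- as.integer(cut(Sepal.Width, breaks = c(-Inf, 2.8, 3.0, 3.3, Inf)))
  Petal.Length <- as.integer(cut(Petal.Length, breaks = c(-Inf, 1.6, 4.35, 5.1, Inf)))
  Petal.Width <- as.integer(cut(Petal.Width, breaks = c(-Inf, 0.3, 1.3, 1.8, Inf)))
  Species3 <- as.integer(Species)
})

# EM uses random starting values, so fix the seed for reproducible output.
set.seed(1)

# Use several common components: with m = 1 every class would share the same
# density and classification would depend on the class priors alone.
model <- cclcda2(
  Species3 ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
  data = iris_cat,
  m = 3
)

# BIC of the fitted model, for comparison across different values of m.
model$bic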
```

## References

Bücker, M., Szepannek, G., Weihs, C. (2010). Local Classification of Discrete
Variables by Latent Class Models. In: Locarek-Junge, H., Weihs, C. (eds)
Classification as a Tool for Research. Studies in Classification, Data Analysis,
and Knowledge Organization. Springer, Berlin, Heidelberg.
https://doi.org/10.1007/978-3-642-10745-0_13

Bücker, M. (2008). Lokale Diskrimination diskreter Daten. Diplomarbeit,
Fakultät Statistik, TU Dortmund.
