---
title: "Offline Changepoint Detection"
author: "José Mauricio Gómez Julián"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Offline Changepoint Detection}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4
)
```

## Introduction

Offline changepoint detection applies when the complete dataset is already
available and you want to identify, retrospectively, where regime changes
occurred. This is the "archaeological" approach to changepoint detection.

## When to Use Offline Detection

- Historical data analysis
- Research and scientific studies
- Batch processing of data
- When the complete dataset is available

## PELT: Pruned Exact Linear Time

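PELT (Pruned Exact Linear Time) finds the segmentation that exactly minimizes a
penalized cost. It prunes candidate changepoints that can never be optimal, so
the running time stays close to linear in many practical settings. This makes
it a strong default for offline multiple changepoint detection: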

```{r message=FALSE, warning=FALSE}
library(RegimeChange)

# Generate data with multiple changepoints
set.seed(42)
data <- c(
  rnorm(100, 0, 1),   # Regime 1
  rnorm(100, 3, 1),   # Regime 2
  rnorm(100, 1, 2),   # Regime 3
  rnorm(100, 4, 0.5)  # Regime 4
)
true_cps <- c(100, 200, 300)

# Detect with PELT
result_pelt <- detect_regimes(data, method = "pelt", penalty = "BIC")
print(result_pelt)
```
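
To make the idea concrete, the following is a minimal base-R sketch of the PELT recursion for changes in mean, using the residual sum of squares as the segment cost. It is an illustration only and is not claimed to be the implementation behind `detect_regimes()`; the function name `pelt_sketch`, its arguments, and the penalty value in the usage lines are assumptions made for this example.

```{r}
# Minimal PELT sketch for changes in mean (hypothetical helper, base R only).
# Segment cost = residual sum of squares around the segment mean.
pelt_sketch <- function(x, penalty) {
  n <- length(x)
  s1 <- c(0, cumsum(x))     # prefix sums: s1[i + 1] = sum(x[1:i])
  s2 <- c(0, cumsum(x^2))   # prefix sums of squares

  # Cost of the segment x[(a + 1):b]; vectorized over a
  seg_cost <- function(a, b) {
    (s2[b + 1] - s2[a + 1]) - (s1[b + 1] - s1[a + 1])^2 / (b - a)
  }

  best <- numeric(n + 1)    # best[t + 1] = optimal penalized cost of x[1:t]
  best[1] <- -penalty       # so the first segment carries no penalty
  last <- integer(n)        # last[t] = final changepoint before t (0 = none)
  candidates <- 0L          # candidate positions for the last changepoint

  for (t in seq_len(n)) {
    totals <- best[candidates + 1] + seg_cost(candidates, t) + penalty
    k <- which.min(totals)
    best[t + 1] <- totals[k]
    last[t] <- candidates[k]
    # Pruning step: keep s only if best(s) + cost(s, t) <= best(t)
    keep <- (totals - penalty) <= best[t + 1]
    candidates <- c(candidates[keep], t)
  }

  # Backtrack through `last` to recover the changepoint locations
  cps <- integer(0)
  t <- n
  while (last[t] > 0) {
    t <- last[t]
    cps <- c(t, cps)
  }
  cps
}

set.seed(1)
x_pelt_demo <- c(rnorm(100, 0, 1), rnorm(100, 3, 1))
pelt_sketch(x_pelt_demo, penalty = 3 * log(length(x_pelt_demo)))
```

The pruning line is what separates PELT from plain optimal partitioning: candidates that are already too costly at time `t` are dropped and never revisited.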

### Penalty Selection

The penalty controls the trade-off between fit and complexity:

```{r}
# Different penalties
result_bic <- detect_regimes(data, method = "pelt", penalty = "BIC")
result_aic <- detect_regimes(data, method = "pelt", penalty = "AIC")
result_mbic <- detect_regimes(data, method = "pelt", penalty = "MBIC")

cat("BIC:", result_bic$n_changepoints, "changepoints\n")
cat("AIC:", result_aic$n_changepoints, "changepoints\n")
cat("MBIC:", result_mbic$n_changepoints, "changepoints\n")
```

- **BIC**: Bayesian Information Criterion (balanced, default)
- **AIC**: Akaike Information Criterion (lighter penalty, tends to find more changepoints)
- **MBIC**: Modified BIC (heavier penalty, tends to find fewer changepoints)
- **Manual**: Pass a numeric value for a custom penalty
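
As a rough guide to the scale of these penalties, the sketch below computes AIC- and BIC-style values per added changepoint under one common textbook convention (`2 * k` and `k * log(n)`). How RegimeChange computes its penalties internally is not stated here, so the formulas and the choice `k_par = 2` are assumptions for illustration only.

```{r}
n_obs <- 400   # length of the simulated series used above
k_par <- 2     # assumed number of parameters added per changepoint

penalty_aic <- 2 * k_par            # AIC-style penalty
penalty_bic <- k_par * log(n_obs)   # BIC-style penalty, grows with n

c(AIC = penalty_aic, BIC = penalty_bic)
```

Because the BIC-style value grows with the series length while the AIC-style value does not, AIC admits more changepoints on long series.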

### Minimum Segment Length

Use `min_segment` to rule out very short segments:

```{r}
result <- detect_regimes(data, method = "pelt", min_segment = 30)
```

## Binary Segmentation

A fast greedy approach:

```{r}
result_binseg <- detect_regimes(data, method = "binseg", n_changepoints = 5)
print(result_binseg)
```

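Binary segmentation finds changepoints by recursively splitting the series at
the most prominent change. Because each split is chosen greedily, the result is
not guaranteed to be the globally optimal segmentation.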
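
The recursion can be sketched in a few lines of base R using the CUSUM statistic for a change in mean. This is a simplified illustration rather than the package's implementation; `cusum_sketch`, `binseg_sketch`, and the stopping `threshold` are hypothetical names and choices.

```{r}
# CUSUM statistic for a single change in mean at each split point b
cusum_sketch <- function(x) {
  n <- length(x)
  b <- seq_len(n - 1)
  sqrt(n / (b * (n - b))) * abs(cumsum(x)[b] - b / n * sum(x))
}

# Greedy recursion: split at the largest CUSUM value, then repeat on each half
binseg_sketch <- function(x, threshold, offset = 0L) {
  if (length(x) < 2) return(integer(0))
  stat <- cusum_sketch(x)
  b <- which.max(stat)
  if (stat[b] < threshold) return(integer(0))   # no convincing change left
  c(binseg_sketch(x[seq_len(b)], threshold, offset),
    offset + b,
    binseg_sketch(x[-seq_len(b)], threshold, offset + b))
}

set.seed(2)
x_bs_demo <- c(rnorm(100, 0, 1), rnorm(100, 3, 1), rnorm(100, 0, 1))
binseg_sketch(x_bs_demo, threshold = 5)
```

Each split is fixed once chosen, which is why an early poor split cannot be corrected later.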

## Wild Binary Segmentation

More robust than standard binary segmentation:

```{r}
result_wbs <- detect_regimes(data, method = "wbs", M = 100)
print(result_wbs)
```

WBS searches for changes within many randomly drawn intervals, which makes it
more robust to closely spaced changepoints than standard binary segmentation.
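
The random-interval idea can be illustrated with a sketch of the first WBS step: draw `M` random intervals, compute the CUSUM statistic inside each, and keep the split with the largest value overall. The helper `wbs_first_split` is hypothetical and only shows the first split; a full WBS would recurse on both sides and apply a stopping threshold.

```{r}
wbs_first_split <- function(x, M = 100) {
  n <- length(x)
  best <- list(stat = -Inf, cp = NA_integer_)
  for (m in seq_len(M)) {
    ends <- sort(sample.int(n, 2))   # random interval [s, e] with s < e
    s <- ends[1]
    e <- ends[2]
    seg <- x[s:e]
    len <- length(seg)
    b <- seq_len(len - 1)
    # CUSUM statistic for a change in mean, restricted to this interval
    stat <- sqrt(len / (b * (len - b))) *
      abs(cumsum(seg)[b] - b / len * sum(seg))
    k <- which.max(stat)
    if (stat[k] > best$stat) {
      best <- list(stat = stat[k], cp = s + k - 1L)   # map back to full index
    }
  }
  best
}

set.seed(3)
x_wbs_demo <- c(rnorm(100, 0, 1), rnorm(20, 3, 1), rnorm(100, 0, 1))
wbs_first_split(x_wbs_demo, M = 100)
```

A short interval that happens to straddle only one change isolates it from its neighbours, which a CUSUM over the whole series cannot do.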

## Detecting Different Types of Changes

### Mean Changes

```{r}
result_mean <- detect_regimes(data, type = "mean")
```

### Variance Changes

```{r}
# Data with variance change
set.seed(123)
var_data <- c(rnorm(100, 0, 1), rnorm(100, 0, 3))

result_var <- detect_regimes(var_data, type = "variance")
print(result_var)
```
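
For intuition, a single change in variance can be located by scanning all split points and minimizing a Gaussian likelihood cost of the form `k * log(v1) + (n - k) * log(v2)`. The sketch below does this in base R. It assumes a common mean and at most one change, and `var_change_scan` and `min_len` are hypothetical names, not part of RegimeChange.

```{r}
var_change_scan <- function(x, min_len = 5) {
  n <- length(x)
  x <- x - mean(x)              # centre once; assumes the mean does not change
  ss <- cumsum(x^2)
  ks <- min_len:(n - min_len)   # candidate split points
  cost <- vapply(ks, function(k) {
    v1 <- ss[k] / k                   # variance estimate left of the split
    v2 <- (ss[n] - ss[k]) / (n - k)   # variance estimate right of the split
    k * log(v1) + (n - k) * log(v2)
  }, numeric(1))
  ks[which.min(cost)]
}

set.seed(123)
var_demo <- c(rnorm(100, 0, 1), rnorm(100, 0, 3))
var_change_scan(var_demo)
```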

### Mean and Variance Changes

```{r}
result_both <- detect_regimes(data, type = "both")
```

## Visualization

```{r}
# Basic plot with changepoints
plot(result_pelt, type = "data")
```

```{r}
# Segment-colored plot
plot(result_pelt, type = "segments")
```

## Segment Analysis

Access segment information:

```{r}
# Get segment details
for (i in seq_along(result_pelt$segments)) {
  seg <- result_pelt$segments[[i]]
  cat(sprintf("Segment %d: [%d, %d] - Mean: %.2f, SD: %.2f\n", 
              i, seg$start, seg$end, seg$params$mean, seg$params$sd))
}
```

## Uncertainty Quantification

Obtain bootstrap confidence intervals for the changepoint locations:

```{r}
result_ci <- detect_regimes(data, method = "pelt", 
                            uncertainty = TRUE, bootstrap_reps = 100)

if (length(result_ci$confidence_intervals) > 0) {
  print(result_ci$confidence_intervals[[1]])
}
```
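
One way such intervals can be built is a residual bootstrap: fit the segment means, resample the residuals, re-estimate the changepoint on each resampled series, and take percentiles of the re-estimated locations. The sketch below does this for a single change in mean. The resampling scheme actually used by `detect_regimes()` is not described here, so this is an assumption, and `single_cp_sketch` and `boot_cp_ci` are hypothetical helpers.

```{r}
# Single change-in-mean estimate: location of the largest CUSUM value
single_cp_sketch <- function(x) {
  n <- length(x)
  b <- seq_len(n - 1)
  which.max(sqrt(n / (b * (n - b))) * abs(cumsum(x)[b] - b / n * sum(x)))
}

boot_cp_ci <- function(x, reps = 200, level = 0.95) {
  n <- length(x)
  cp_hat <- single_cp_sketch(x)
  # Piecewise-constant fit and its residuals
  fitted <- rep(c(mean(x[1:cp_hat]), mean(x[(cp_hat + 1):n])),
                times = c(cp_hat, n - cp_hat))
  res <- x - fitted
  # Re-estimate the changepoint on series rebuilt from resampled residuals
  boot_cps <- replicate(reps, single_cp_sketch(fitted + sample(res, replace = TRUE)))
  alpha <- 1 - level
  quantile(boot_cps, probs = c(alpha / 2, 1 - alpha / 2))
}

set.seed(4)
x_boot_demo <- c(rnorm(100, 0, 1), rnorm(100, 1.5, 1))
boot_cp_ci(x_boot_demo, reps = 200)
```

Smaller jumps or noisier data give wider intervals, since the re-estimated location varies more across resamples.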

## Evaluation Against Ground Truth

```{r}
eval_result <- evaluate(result_pelt, true_changepoints = true_cps)
print(eval_result)
```

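Key metrics:

- **Hausdorff distance**: Largest distance from any changepoint in one set (detected or true) to its nearest counterpart in the other set, i.e. the worst-case location error
- **F1 score**: Harmonic mean of precision and recall
- **Adjusted Rand Index**: Segmentation agreement corrected for chance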
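
The first two metrics are simple enough to compute by hand. The sketch below shows one plausible definition of each, where a detected changepoint counts as correct if it lies within `tol` observations of a true one. The definitions, the tolerance, and the helper names `hausdorff_cp` and `f1_cp` are assumptions for illustration; `evaluate()` may use different conventions. Both helpers expect non-empty inputs.

```{r}
hausdorff_cp <- function(est, truth) {
  d <- abs(outer(est, truth, "-"))   # pairwise location distances
  max(apply(d, 1, min),              # each estimate to its nearest truth
      apply(d, 2, min))              # each truth to its nearest estimate
}

f1_cp <- function(est, truth, tol = 5) {
  d <- abs(outer(est, truth, "-"))
  precision <- mean(apply(d, 1, min) <= tol)   # estimates near some truth
  recall <- mean(apply(d, 2, min) <= tol)      # truths near some estimate
  if (precision + recall == 0) return(0)
  2 * precision * recall / (precision + recall)
}

est_demo <- c(98, 205, 300, 350)     # hypothetical detections
truth_demo <- c(100, 200, 300)
hausdorff_cp(est_demo, truth_demo)
f1_cp(est_demo, truth_demo, tol = 5)
```

In this example the spurious detection at 350 drives the Hausdorff distance, while the F1 score only loses some precision.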

## Comparing Methods

```{r}
comparison <- compare_methods(
  data = data,
  methods = c("pelt", "binseg", "wbs"),
  true_changepoints = true_cps
)
print(comparison)
```

## Best Practices

1. **Start with PELT** using the BIC penalty
2. **Validate segment length**: use `min_segment` to avoid very short segments
3. **Compare multiple methods** when stakes are high
4. **Use bootstrap confidence intervals** for critical applications
5. **Visualize results** to sanity-check the detected changepoints

## Tips for Difficult Cases

### Closely Spaced Changepoints

Use WBS instead of PELT:

```{r eval=FALSE}
detect_regimes(data, method = "wbs", M = 200)
```

### Small Change Magnitudes

Lower the penalty:

```{r eval=FALSE}
detect_regimes(data, method = "pelt", penalty = "AIC")
```

### Many Changepoints

Use ensemble methods:

```{r eval=FALSE}
detect_regimes(data, method = "ensemble", 
               methods = c("pelt", "wbs", "binseg"))
```
