---
title: "Column charts"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Column charts}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Introduction

Column charts are essential tools in epidemiological surveillance for comparing counts across categories such as regions, time periods, or organism types. The `col_chart()` function provides flexible options for creating both static and interactive column charts with support for grouping, stacking, labeling, and advanced features like case boxes and threshold lines.

## Prerequisites

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, eval = TRUE, fig.width = 8, fig.height = 6, warning = FALSE, message = FALSE)
```

```{r load-libraries, eval=TRUE, echo=TRUE}
library(epiviz)
library(dplyr)
library(lubridate)
```

## Example 1: Basic single-series column chart

Simple column charts are ideal for comparing counts across categories. This example shows regional distribution of laboratory detections.

### Prepare the data

```{r prepare-basic-data, eval=TRUE, echo=TRUE}
# Aggregate 2023 detections into one row per region
regional_summary <- epiviz::lab_data %>%
  filter(
    specimen_date >= as.Date("2023-01-01"),
    specimen_date <= as.Date("2023-12-31")
  ) %>%
  group_by(region) %>%
  summarise(detections = n()) %>%
  ungroup()
```
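
The same aggregation can be written more compactly with `dplyr::count()`, which Example 6 uses. The sketch below compares the two forms on a small made-up line list; `toy_lab_data` and its values are hypothetical stand-ins for `epiviz::lab_data`, not real package data.

```r
library(dplyr)

# Hypothetical line list standing in for epiviz::lab_data
toy_lab_data <- tibble(
  region = c("London", "London", "North West", "South East", "London"),
  specimen_date = as.Date(c("2023-01-05", "2023-06-10", "2023-03-02",
                            "2022-12-30", "2023-11-21"))
)

# Keep only 2023 specimens, as in the chunk above
in_2023 <- toy_lab_data %>%
  filter(
    specimen_date >= as.Date("2023-01-01"),
    specimen_date <= as.Date("2023-12-31")
  )

# Long form: group, summarise, and drop the grouping in one step
long_form <- in_2023 %>%
  group_by(region) %>%
  summarise(detections = n(), .groups = "drop")

# Compact form: count() groups, tallies, and ungroups
short_form <- in_2023 %>%
  count(region, name = "detections")

# Both forms should give the same one-row-per-region table
isTRUE(all.equal(as.data.frame(long_form), as.data.frame(short_form)))
```

Either form produces the one-row-per-category table that `col_chart()` expects; the choice is a matter of style.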

### Create the basic column chart

```{r basic-col-chart, fig.width=8, fig.height=6, eval=TRUE, echo=TRUE}
col_chart(
  dynamic = FALSE,  # Create static ggplot chart
  params = list(
    df = regional_summary,
    x = "region",           # Categorical variable for x-axis
    y = "detections",       # Numeric variable for y-axis
    fill_colours = "#007C91",  # Single color for all bars
    chart_title = "Laboratory Detections by Region 2023",
    x_axis_title = "Region",
    y_axis_title = "Number of detections",
    x_axis_label_angle = -45  # Rotate labels for readability
  )
)
```

**Interpretation**: This chart shows the regional distribution of laboratory detections in 2023, with London having the highest number of detections. The data are not sorted by count before plotting, so compare bar heights rather than bar positions to rank the remaining regions.
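
To confirm a ranking read from the chart, the aggregated table can be sorted by count. This is a minimal sketch on made-up numbers; `toy_summary` is a hypothetical table with the same columns as `regional_summary`.

```r
library(dplyr)

# Hypothetical aggregated counts in the same shape as regional_summary
toy_summary <- tibble(
  region = c("East Midlands", "London", "North West"),
  detections = c(120L, 450L, 300L)
)

# Sort from highest to lowest count and add an explicit rank
ranked <- toy_summary %>%
  arrange(desc(detections)) %>%
  mutate(rank = row_number())

ranked
```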

## Example 2: Grouped stacked column chart

When you need to compare multiple categories within each group, stacked column charts are effective. This example shows detections by organism type within each region.

### Prepare the data

```{r prepare-grouped-data, eval=TRUE, echo=TRUE}
# Aggregate 2023 detections into one row per region and organism species
region_organism_summary <- epiviz::lab_data %>%
  filter(
    specimen_date >= as.Date("2023-01-01"),
    specimen_date <= as.Date("2023-12-31")
  ) %>%
  group_by(region, organism_species_name) %>%
  summarise(detections = n()) %>%
  ungroup()
```

### Create the grouped stacked chart

```{r chunk-1, fig.width=8, fig.height=6, eval=TRUE, echo=TRUE}
col_chart(
  dynamic = FALSE,  # Create static ggplot chart
  params = list(
    df = region_organism_summary,
    x = "region",                    # Primary grouping variable
    y = "detections",                # Value variable
    group_var = "organism_species_name",  # Secondary grouping variable
    group_var_barmode = "stack",     # Stack bars within each group
    fill_colours = c("KLEBSIELLA PNEUMONIAE" = "#007C91",
                     "STAPHYLOCOCCUS AUREUS" = "#8A1B61",
                     "PSEUDOMONAS AERUGINOSA" = "#FF7F32"),  # Named color mapping
    chart_title = "Laboratory Detections by Region \nand Species 2023",
    chart_footer = "This chart has been created using simulated data.",
    x_axis_title = "Region",
    y_axis_title = "Number of detections",
    legend_title = "Organism species",
    x_axis_label_angle = -45
  )
)
```

**Interpretation**: This stacked chart reveals both regional differences in total detections and the relative contribution of different organism types within each region.
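
Relative contributions can be hard to judge from stacked bars of different heights. One way to quantify them is to compute each organism's share of its region's total before or alongside plotting. The sketch below uses made-up counts; `toy_region_organism` is a hypothetical table shaped like `region_organism_summary`.

```r
library(dplyr)

# Hypothetical counts in the same shape as region_organism_summary
toy_region_organism <- tibble(
  region = rep(c("London", "North West"), each = 3),
  organism_species_name = rep(
    c("KLEBSIELLA PNEUMONIAE", "STAPHYLOCOCCUS AUREUS", "PSEUDOMONAS AERUGINOSA"),
    times = 2
  ),
  detections = c(50L, 30L, 20L, 10L, 25L, 15L)
)

# Share of each organism within its region, as a percentage
composition <- toy_region_organism %>%
  group_by(region) %>%
  mutate(
    region_total = sum(detections),
    share_pct = round(100 * detections / region_total, 1)
  ) %>%
  ungroup()

composition
```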

## Example 3: Column chart with bar labels

Bar labels show exact values on each bar, making it easier to read precise counts without estimating from the axis.

### Create the chart with bar labels

```{r chunk-2, fig.width=8, fig.height=6, eval=TRUE, echo=TRUE}
col_chart(
  dynamic = FALSE,  # Create static ggplot chart
  params = list(
    df = regional_summary,
    x = "region",
    y = "detections",
    fill_colours = "#007C91",
    chart_title = "Laboratory Detections by Region 2023",
    x_axis_title = "Region",
    y_axis_title = "Number of detections",
    x_axis_label_angle = -45,
    bar_labels = "detections",      # Show values on bars
    bar_labels_pos = "bar_base"     # Position labels at base of bars
  )
)
```

**Interpretation**: The bar labels make it easy to see exact detection counts for each region without having to estimate from the y-axis scale.

## Example 4: Interactive column chart with case boxes

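Case boxes draw an outline around each individual case within a column, so readers can count cases directly from the chart. They are most legible when counts are small, so this example restricts the data to a single week.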

### Prepare data for case boxes

```{r chunk-3, fig.width=8, fig.height=6, eval=TRUE, echo=TRUE}
# Use a shorter time period for case boxes demonstration
case_box_data <- epiviz::lab_data %>%
  filter(
    specimen_date >= as.Date("2023-01-01"),
    specimen_date <= as.Date("2023-01-07")  # One week for case boxes
  ) %>%
  group_by(region, organism_species_name) %>%
  summarise(detections = n()) %>%
  ungroup()
```

### Create the interactive chart with case boxes

```{r chunk-4, fig.width=8, fig.height=6, eval=TRUE, echo=TRUE}
col_chart(
  dynamic = TRUE,   # Create interactive plotly chart
  params = list(
    df = case_box_data,
    x = "region",
    y = "detections",
    group_var = "organism_species_name",
    group_var_barmode = "stack",
    fill_colours = c("KLEBSIELLA PNEUMONIAE" = "#007C91",
                     "STAPHYLOCOCCUS AUREUS" = "#8A1B61",
                     "PSEUDOMONAS AERUGINOSA" = "#FF7F32"),
    case_boxes = TRUE,              # Enable case boxes
    chart_title = "Laboratory Detections by Region \nand Species (Week 1, 2023)",
    chart_footer = "This chart has been created using simulated data.",
    x_axis_title = "Region",
    y_axis_title = "Number of detections",
    legend_title = "Organism species",
    x_axis_label_angle = -45
  )
)
```

**Interpretation**: The interactive chart lets users explore the data dynamically, while the case boxes show each detection as an individual unit within the stacked columns.

## Example 5: Column chart with threshold lines

Threshold lines help identify data points that exceed or fall below important cutoffs, such as outbreak levels or target values.

### Create the chart with threshold lines

```{r chunk-5, fig.width=8, fig.height=6, eval=TRUE, echo=TRUE}
col_chart(
  dynamic = FALSE,  # Create static ggplot chart
  params = list(
    df = region_organism_summary,
    x = "region",
    y = "detections",
    group_var = "organism_species_name",
    group_var_barmode = "stack",
    fill_colours = c("KLEBSIELLA PNEUMONIAE" = "#007C91",
                     "STAPHYLOCOCCUS AUREUS" = "#8A1B61",
                     "PSEUDOMONAS AERUGINOSA" = "#FF7F32"),
    # Threshold lines
    hline = c(1000, 2000),          # Multiple threshold lines
    hline_colour = c("orange", "red"),  # Colors for each line
    hline_label = c("Alert level", "Outbreak threshold"),  # Labels for lines
    hline_label_colour = c("orange", "red"),  # Label colors
    hline_type = c("dashed", "solid"),  # Line types
    hline_width = c(1, 2),          # Line widths
    chart_title = "Laboratory Detections by Region \nand Species 2023",
    chart_footer = "This chart has been created using simulated data.",
    x_axis_title = "Region",
    y_axis_title = "Number of detections",
    legend_title = "Organism species",
    x_axis_label_angle = -45
  )
)
```

**Interpretation**: The threshold lines help identify regions that exceed alert levels (orange dashed line) or outbreak thresholds (red solid line), guiding public health response priorities.
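
In a stacked chart, the height compared against each threshold line is the regional total across organisms. The sketch below tabulates which regions cross each line; the counts are made up, and `toy_region_organism`, `alert_level`, and `outbreak_threshold` are hypothetical names chosen to mirror the `hline` values used above.

```r
library(dplyr)

# Thresholds matching the hline values in the chart above
alert_level <- 1000
outbreak_threshold <- 2000

# Hypothetical counts in the same shape as region_organism_summary
toy_region_organism <- tibble(
  region = rep(c("London", "North West", "South West"), each = 2),
  organism_species_name = rep(
    c("KLEBSIELLA PNEUMONIAE", "STAPHYLOCOCCUS AUREUS"),
    times = 3
  ),
  detections = c(1500L, 800L, 700L, 600L, 300L, 200L)
)

# Total stacked height per region, classified against both thresholds
region_status <- toy_region_organism %>%
  group_by(region) %>%
  summarise(total_detections = sum(detections), .groups = "drop") %>%
  mutate(
    status = case_when(
      total_detections >= outbreak_threshold ~ "Above outbreak threshold",
      total_detections >= alert_level ~ "Above alert level",
      TRUE ~ "Below alert level"
    )
  )

region_status
```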

## Example 6: Time-series column chart

Time-series column charts are crucial for surveillance, showing temporal patterns in disease occurrence. This example demonstrates weekly aggregation.

### Prepare the time-series data

```{r chunk-6, fig.width=8, fig.height=6, eval=TRUE, echo=TRUE}
# Create weekly time series data
weekly_series <- epiviz::lab_data %>%
  filter(
    specimen_date >= as.Date("2023-01-01"),
    specimen_date <= as.Date("2023-03-31")
  ) %>%
  mutate(
    specimen_week = floor_date(specimen_date, "week", week_start = 1)  # Monday start
  ) %>%
  count(specimen_week, name = "detections")
```
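
It can help to check how `floor_date()` assigns dates to Monday-start weeks at the edges of the filter window. The sketch below applies the same call to a few hand-picked dates (the choice of dates is illustrative only) and shows the ISO week and ISO year that lubridate reports for each.

```r
library(lubridate)

# Dates at the start and end of the Q1 2023 window used above
example_dates <- as.Date(c("2023-01-01", "2023-01-02", "2023-01-08", "2023-03-31"))

week_check <- data.frame(
  specimen_date = example_dates,
  day_of_week   = wday(example_dates, week_start = 1),  # 1 = Monday, 7 = Sunday
  specimen_week = floor_date(example_dates, "week", week_start = 1),
  iso_week      = isoweek(example_dates),
  iso_year      = isoyear(example_dates)
)

week_check
```

Because 1 January 2023 is a Sunday, it falls in the week starting 26 December 2022 (ISO week 52 of 2022). The first and last columns of the chart therefore cover partial weeks within the filter window and may look lower than their neighbours.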

### Create the time-series column chart

```{r chunk-7, fig.width=8, fig.height=6, eval=TRUE, echo=TRUE, fig.cap="Weekly laboratory detections between January and March 2023.", fig.alt="Column chart showing the number of laboratory detections for each ISO week between January and March 2023."}
col_chart(
  dynamic = FALSE,  # Create static ggplot chart
  params = list(
    df = weekly_series,
    x = "specimen_week",        # Date variable for x-axis
    y = "detections",           # Count variable
    x_time_series = TRUE,       # Indicate this is time series data
    time_period = "iso_year_week",  # Aggregation period
    fill_colours = "#007C91",
    chart_title = "Weekly laboratory detections (Q1 2023)",
    x_axis_title = "Week",
    y_axis_title = "Number of detections",
    x_axis_label_angle = -45,
    # Custom styling for time series
    x_axis_date_breaks = "2 weeks",  # Show every 2 weeks
    x_axis_date_labels = "%b %d"     # Format: Jan 01
  )
)
```

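**Interpretation**: This time-series chart shows week-to-week changes in laboratory detections across the first quarter of 2023, helping identify short-term trends and unusual peaks that may signal an outbreak. Assessing seasonal effects would require a longer series.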

## Tips for column charts

1. **Data aggregation**: Always aggregate your data appropriately before passing it to `col_chart()`. The function expects pre-calculated counts or values.

2. **Color mapping**: Use named color vectors for grouped data to ensure consistent colors across charts:
   ```r
   fill_colours = c("KLEBSIELLA PNEUMONIAE" = "#007C91",
                    "STAPHYLOCOCCUS AUREUS" = "#8A1B61")
   ```

3. **Grouping options**:
   - `group_var_barmode = "stack"` for stacked bars (shows composition)
   - `group_var_barmode = "group"` for grouped bars (shows comparison)

4. **Bar labels**: Use `bar_labels` and `bar_labels_pos` to show exact values on bars:
   - `"bar_base"` - at the base of bars
   - `"bar_centre"` - at the center of bars
   - `"bar_top"` - at the top of bars

5. **Case boxes**: Set `case_boxes = TRUE` (shown with an interactive chart in Example 4) to outline individual cases within each column; this works best with small counts.

6. **Threshold lines**: Use `hline` parameters to add horizontal reference lines for alert levels or targets.

7. **Time series**: When working with dates, set `x_time_series = TRUE` and specify the `time_period` that matches the granularity of your data (for example, `"iso_year_week"` for weekly counts).

8. **Interactive features**: Set `dynamic = TRUE` to produce an interactive plotly chart with zooming, hover tooltips, and the ability to show or hide series from the legend.

9. **Chart footers**: Add `chart_footer` to provide context about data sources or limitations.

10. **Label rotation**: Use `x_axis_label_angle = -45` for long category labels to improve readability.
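
The named colour vector from tip 2 is repeated in several examples above. A minimal sketch of defining it once and checking it against the groups present in the data follows; `species_colours` and `groups_in_data` are hypothetical names, and the third group value is invented to show what a mismatch looks like.

```r
# Define the colour mapping once and reuse it across charts
species_colours <- c(
  "KLEBSIELLA PNEUMONIAE"  = "#007C91",
  "STAPHYLOCOCCUS AUREUS"  = "#8A1B61",
  "PSEUDOMONAS AERUGINOSA" = "#FF7F32"
)

# Hypothetical group values; in practice use
# unique(region_organism_summary$organism_species_name)
groups_in_data <- c("KLEBSIELLA PNEUMONIAE", "STAPHYLOCOCCUS AUREUS", "ESCHERICHIA COLI")

# Groups in the data that have no colour assigned
missing_colours <- setdiff(groups_in_data, names(species_colours))

# Colours defined for groups that do not appear in the data
unused_colours <- setdiff(names(species_colours), groups_in_data)

missing_colours
unused_colours
```

Running a check like this before plotting makes mismatches between the data and the colour mapping visible early, for example after a change in organism naming.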
