---
title: "Mining Causal Association Rules"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Mining Causal Association Rules}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```

# Introduction

Standard association rules (or implications in Formal Concept Analysis) identify correlations between attributes ($A \to B$). However, correlation does not imply causation. A rule $A \to B$ might be strong simply because both $A$ and $B$ are caused by a third confounding variable $C$.

The `fcaR` package now supports **Mining Causal Association Rules**, a method that identifies likely causal relationships by controlling for confounding variables. The test is based on the "Fair Odds Ratio", computed on a "Fair Data Set" of matched pairs.

```{r setup}
library(fcaR)
```

# The Approach

To check if $A \to B$ is causal, the algorithm:

1.  Identifies potential **confounders** (controlled variables): attributes that belong to neither the premise $A$ nor the conclusion $B$, excluding attributes that are irrelevant to $B$.
2.  Constructs a **Fair Data Set** by finding **matched pairs** of objects. Two objects $(u, v)$ form a matched pair if:
    *   They have the same values for all controlled variables.
    *   One object has the premise ($u$ has property $A$).
    *   The other object does not ($v$ does not have property $A$).
3.  Computes the **Fair Odds Ratio** on these matched pairs.
4.  Labels the rule as causal if the lower bound of the confidence interval for the Fair Odds Ratio is greater than 1.
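
The steps above can be sketched in base R. This is a minimal illustration, not the `fcaR` implementation: the helper `fair_odds_ratio()`, the greedy exact-matching strategy, and the toy data are all assumptions made for this example. It uses the standard matched-pairs odds ratio $n_{12} / n_{21}$, where $n_{12}$ counts pairs in which only the object with $A$ has $B$ and $n_{21}$ counts pairs in which only the object without $A$ has $B$, with the interval $\exp\left(\ln(n_{12}/n_{21}) \pm z \sqrt{1/n_{12} + 1/n_{21}}\right)$.

```{r}
# Hypothetical helper, not part of fcaR: greedy exact matching on the
# controlled variables, followed by the matched-pairs odds ratio.
fair_odds_ratio <- function(data, exposure, response, controls,
                            conf_level = 0.95) {
    # One string key per object describing its controlled-variable values
    key <- if (length(controls) > 0) {
        apply(data[, controls, drop = FALSE], 1, paste, collapse = "|")
    } else {
        rep("", nrow(data))
    }
    exposed <- which(data[, exposure] == 1)
    pool <- which(data[, exposure] == 0) # unexposed objects not yet matched

    n12 <- 0 # exposed object has the response, its match does not
    n21 <- 0 # match has the response, exposed object does not
    for (u in exposed) {
        candidates <- pool[key[pool] == key[u]]
        if (length(candidates) == 0) next # no match available for u
        v <- candidates[1]
        pool <- setdiff(pool, v) # each object is used in at most one pair
        yu <- data[u, response]
        yv <- data[v, response]
        if (yu == 1 && yv == 0) n12 <- n12 + 1
        if (yu == 0 && yv == 1) n21 <- n21 + 1
    }

    or <- n12 / n21
    z <- qnorm(1 - (1 - conf_level) / 2)
    half_width <- z * sqrt(1 / n12 + 1 / n21)
    c(
        n12 = n12, n21 = n21, odds_ratio = or,
        lower = exp(log(or) - half_width),
        upper = exp(log(or) + half_width)
    )
}

# Toy data (made up for this sketch): C is the controlled variable,
# A the premise, B the conclusion.
toy <- cbind(
    C = rep(c(1, 0), each = 20),
    A = rep(c(1, 0, 1, 0), each = 10),
    B = c(
        rep(1, 8), rep(0, 2), # C = 1, has A
        rep(0, 7), rep(1, 3), # C = 1, lacks A
        rep(1, 6), rep(0, 4), # C = 0, has A
        rep(0, 8), rep(1, 2)  # C = 0, lacks A
    )
)

res <- fair_odds_ratio(toy, exposure = "A", response = "B", controls = "C")
res
```

A `lower` value above 1 corresponds to the decision in step 4. If either discordant count is zero the ratio is undefined, so a real implementation has to handle that case separately.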

# Example 1: Direct Causality

Let's consider a simple case where **Treatment** causes **Recovery**.

```{r}
# 100 Patients
# 50 Treated, 50 Untreated
# Treated: 90% Recovery
# Untreated: 20% Recovery

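n <- 100
# Groups, in order: 45 treated and recovered, 5 treated and not recovered,
# 10 untreated and recovered, 40 untreated and not recovered
treated <- c(rep(1, 45), rep(1, 5), rep(0, 10), rep(0, 40))
recovered <- c(rep(1, 45), rep(0, 5), rep(1, 10), rep(0, 40))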

I <- matrix(c(treated, recovered), ncol = 2)
colnames(I) <- c("Treatment", "Recovery")

fc <- FormalContext$new(I)
```

We can mine for causal rules targeting "Recovery":

```{r}
rules <- fc$find_causal_rules(
    response_var = "Recovery",
    min_support = 0.1,
    confidence_level = 0.95
)

rules$print()
```

The algorithm correctly identifies "Treatment" as a cause for "Recovery".
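
As a sanity check that does not depend on `fcaR`, the same counts can be tabulated directly. The sketch below rebuilds the data under new, hypothetical names (`treated_chk`, `recovered_chk`) so that the objects above are left untouched, then computes the recovery rate per group and the crude (unmatched) odds ratio. This is not the Fair Odds Ratio; it only shows how strong the raw association is when there is no other attribute to control for.

```{r}
# Same counts as Example 1, under new names
treated_chk <- rep(c(1, 0), each = 50)
recovered_chk <- c(rep(1, 45), rep(0, 5), rep(1, 10), rep(0, 40))

tab <- table(Treatment = treated_chk, Recovery = recovered_chk)
tab

# Recovery rate within each treatment group (rows sum to 1)
prop.table(tab, margin = 1)

# Crude odds ratio: odds of recovery if treated / odds of recovery if untreated
crude_or <- (tab["1", "1"] / tab["1", "0"]) / (tab["0", "1"] / tab["0", "0"])
crude_or
```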

# Example 2: Simpson's Paradox (Spurious Correlation)

Standard association rules are easily misled when a confounding variable creates a spurious correlation between two attributes. Confounding is also the mechanism behind Simpson's Paradox.

Consider a dataset relating **Ice Cream** consumption and **Drowning**. They are highly correlated because both increase during hot weather (the **Heat** variable).

*   **Heat** causes **Ice Cream**.
*   **Heat** causes **Drowning**.
*   **Ice Cream** does *not* cause **Drowning**.

However, naive frequent itemset mining might still report the rule `Ice Cream -> Drowning`.

Let's simulate this:

```{r}
set.seed(123)
n <- 200
# Heat: 50% Hot, 50% Cold
heat <- c(rep(1, 100), rep(0, 100))

# Ice Cream: Strongly dependent on Heat (80% if Hot, 20% if Cold)
ic <- numeric(200)
ic[1:100] <- rbinom(100, 1, 0.8)
ic[101:200] <- rbinom(100, 1, 0.2)

# Drowning: Strongly dependent on Heat (80% if Hot, 20% if Cold)
drown <- numeric(200)
drown[1:100] <- rbinom(100, 1, 0.8)
drown[101:200] <- rbinom(100, 1, 0.2)

I <- matrix(c(heat, ic, drown), ncol = 3)
colnames(I) <- c("Heat", "IceCream", "Drowning")

fc_spurious <- FormalContext$new(I)
```

Looking only at correlations, `IceCream` and `Drowning` appear associated. `find_causal_rules`, however, controls for confounders.

When testing `IceCream -> Drowning`:

*   It controls for `Heat`.
*   It compares days with the same `Heat` value (hot vs. hot, cold vs. cold) but different ice cream consumption.
*   Within hot days, ice cream consumption is unrelated to the mechanism that causes drowning, so it does not raise the drowning risk any further. The same holds within cold days.
*   The odds ratio should therefore be close to 1.
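
The effect of stratifying by `Heat` can be seen with plain odds ratios. The sketch below is an assumption-laden illustration, not what `find_causal_rules` computes internally: it uses an idealised table (`ideal`) whose counts match the simulation's probabilities exactly, and a hypothetical helper `odds_ratio()`. The simulated data above will vary around these values.

```{r}
# Idealised counts: 100 hot and 100 cold days, with ice cream and drowning
# each occurring on 80% of hot days and 20% of cold days, independently
# of each other within a Heat level.
ideal <- data.frame(
    Heat = rep(c(1, 1, 0, 0), times = c(80, 20, 20, 80)),
    IceCream = rep(c(1, 0, 1, 0), times = c(80, 20, 20, 80)),
    Drowning = c(
        rep(c(1, 0), times = c(64, 16)), # hot, ice cream
        rep(c(1, 0), times = c(16, 4)),  # hot, no ice cream
        rep(c(1, 0), times = c(4, 16)),  # cold, ice cream
        rep(c(1, 0), times = c(16, 64))  # cold, no ice cream
    )
)

# Hypothetical helper: odds ratio of y given x from a 2x2 table
odds_ratio <- function(x, y) {
    tab <- table(factor(x, levels = c(0, 1)), factor(y, levels = c(0, 1)))
    (tab["1", "1"] * tab["0", "0"]) / (tab["1", "0"] * tab["0", "1"])
}

# Ignoring Heat: ice cream appears strongly associated with drowning
crude <- with(ideal, odds_ratio(IceCream, Drowning))
crude

# Within each Heat level: the association disappears
by_heat <- sapply(
    split(ideal, ideal$Heat),
    function(d) odds_ratio(d$IceCream, d$Drowning)
)
by_heat
```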

```{r}
causal_rules <- fc_spurious$find_causal_rules(
    response_var = "Drowning",
    min_support = 0.5
)

# Should contain "Heat" but NOT "IceCream"
print(causal_rules)
```

As expected, the algorithm identifies **Heat** as the true cause and rejects the spurious **Ice Cream** association.

# Conclusion

The `find_causal_rules` method goes beyond simple association by identifying rules that are robust to confounding, which is a step towards causal inference in Formal Concept Analysis. It returns a `RuleSet` object with quality metrics including support, confidence, and the Fair Odds Ratio with its confidence interval.
