---
title: "Data Manipulation with fcaR and dplyr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data Manipulation with fcaR and dplyr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

Formal Concept Analysis (FCA) typically involves a workflow of cleaning data, extracting concepts, and analyzing implications. The **fcaR** package integrates seamlessly with **dplyr**, allowing you to use the "grammar of data manipulation" directly on `FormalContext` and `ImplicationSet` objects.

This integration provides S3 methods for:

* **FormalContext**: `select`, `filter`, `mutate`, `arrange`, `rename`.
* **ImplicationSet**: `filter`, `arrange`, `slice`.

## Setup

First, load the necessary packages and the example dataset `planets`.

```{r setup, message=FALSE, warning=FALSE}
library(fcaR)
library(dplyr)

data("planets")
# Create the initial Formal Context
fc <- FormalContext$new(planets)
```

## Part 1: Context Manipulation

Real-world data is rarely ready for FCA out of the box. You might need to derive new attributes, remove noise, or rename variables.

### 1.1 Renaming and Feature Engineering

We can use `rename()` to standardize attribute names and `mutate()` to create new attributes based on logic applied to existing ones. This is particularly powerful for **conceptual scaling** or creating higher-level abstractions.

```{r mutate_rename}
# Let's clean up the context
fc_clean <- fc %>% 
  rename(
    has_moon = moon,
    no_moon  = no_moon,
    is_large = large,
    is_small = small
  ) %>%
  mutate(
    # Create a new binary attribute 'giant_loner'
    # (A planet that is large but has no moon)
    giant_loner = is_large == 1 & no_moon == 1,
    
    # Create 'extreme_size' (either small or large)
    extreme_size = is_small == 1 | is_large == 1
  )

# Check the new attributes
print(fc_clean$attributes)
```

### 1.2 Filtering and Selecting

We can filter the *objects* (rows) and select *attributes* (columns) to focus our analysis on a specific subset of the domain.

```{r filter_select}
# Focus only on 'extreme' sized planets and keep specific attributes
fc_focused <- fc_clean %>%
  filter(extreme_size == 1) %>% 
  select(has_moon, giant_loner, is_large)

fc_focused$print()
```

## Part 2: Mining and Filtering Implications

Once the context is clean, we extract the implications (association rules).

```{r mining}
# We use the original context for more results
fc$find_implications()
rules <- fc$implications

cat("Total rules found:", rules$cardinality(), "\n")
```

### 2.1 Filtering by Metrics

You can use standard `dplyr` verbs to filter rules based on their quality measures: `support`, `lhs_size` (number of attributes in the premise), `rhs_size`, and `size`.

```{r filter_metrics}
# Get strong rules (support > 0.2) that are not trivial (size > 2)
strong_rules <- rules %>% 
  filter(support > 0.2, size > 2) %>% 
  arrange(desc(support))

strong_rules$print()
```

### 2.2 Semantic Filtering

Often, you are looking for rules that involve specific attributes (e.g., "What implies having a moon?"). `fcaR` provides special helper functions available only inside `filter()`:

* `lhs("A")` / `lhs_has("A")`: The Left-Hand Side MUST contain "A".
* `rhs("B")` / `rhs_has("B")`: The Right-Hand Side MUST contain "B".
* `not_lhs("C")`: The LHS must NOT contain "C".
* `lhs_any("A", "B")`: The LHS must contain either "A" or "B".

```{r filter_semantic}
# Find rules that imply 'moon'
moon_rules <- rules %>% 
  filter(rhs("moon"))

cat("Rules implying 'moon':\n")
moon_rules$print()
```

### 2.3 Complex Pipelines

You can combine metrics, semantic logic, sorting, and slicing in a single pipeline. This allows for very specific queries like:

> *"Find me the top 3 most supported rules about large planets that do not concern distance."

```{r complex_query}
specific_rules <- rules %>% 
  filter(
    lhs("large"),     # Must be about large planets
    not_lhs("far"),    # Ignore far planets
    support >= 0.2    # Minimum support threshold
  ) %>% 
  arrange(desc(support)) %>% 
  slice(1:3)          # Take the top 3

specific_rules$print()
```

## Conclusion

The integration of `dplyr` into `fcaR` allows for a fluid, readable, and powerful workflow. You can clean your contexts and query your rule sets using the same tidy syntax you use for standard data frames.
