---
title: "setweaver"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{setweaver}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/vignette_",
  out.width = "100%"
)
```

*setweaver* is an R package designed to help users create sets of variables based on a mutual information approach and explore how they are related to a specific outcome. In this context, a set is a collection of distinct elements (e.g., variables) that can also be treated as a single entity. Mutual information, a concept from probability and information theory, quantifies the dependence between two variables by expressing how much information about one variable can be gained from observing the other.
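
For reference, the standard textbook definition of mutual information for two discrete variables $X$ and $Y$, with joint distribution $p(x, y)$ and marginals $p(x)$ and $p(y)$, is shown here. This is the general definition, not a description of the exact estimator used inside the package.

$$
I(X; Y) = \sum_{x} \sum_{y} p(x, y) \, \log \frac{p(x, y)}{p(x) \, p(y)}
$$

The quantity is zero when $X$ and $Y$ are independent and grows as knowing one variable tells you more about the other.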

## Authors
[Aaron Fisher](https://psychology.berkeley.edu/people/aaron-fisher)\
[Nicolas Leenaerts](https://nicolasleenaerts.github.io/)

## Installation

You can install the released version of *setweaver* from [CRAN](https://CRAN.R-project.org) with:

```r
install.packages("setweaver")
```

Or you can install the development version of *setweaver* from GitHub with the following code snippet:

```r
devtools::install_github('nicolasleenaerts/setweaver')
```

You can then attach the package as follows:

```{r setup}
library(setweaver)
```

## Pairing variables

You can create sets of variables using the *pairmi* function, which takes a dataframe of variables and combines them into sets of up to a specified maximum number of elements. For each set, the mutual information between the variables is computed, followed by the calculation of a G-statistic. This statistic is then tested for significance against a chi-squared distribution at a predefined alpha level. Alternatively, users can specify a mutual information threshold to determine which sets count as significant.

```{r example_1, results='hide',message=FALSE}
# Loading the package, which also makes the example data (misimdata) available
library(setweaver)

# Pairing variables into sets of up to 5 elements (significance level 0.05)
results = pairmi(misimdata[, 2:11], alpha = 0.05, n_elements = 5)
```

```{r table_1,echo=FALSE,results='asis'}
knitr::kable(results$expanded.data[c(1:5),],caption = 'Table 1. Expanded Data',align = c('c'))
```

```{r table_2,echo=FALSE,results='asis'}
knitr::kable(results$sets,caption = 'Table 2. Information on sets',align = c('c'))
```
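
To make the link between mutual information and the G-statistic concrete, here is a minimal base R sketch for two simulated binary variables. It assumes the textbook relation G = 2 · N · I(X; Y), with mutual information measured in nats, and a chi-squared reference distribution with (rows − 1) × (columns − 1) degrees of freedom. The helper `mutual_information()` and the simulated vectors `a` and `b` are hypothetical names used only for illustration; the internals of *pairmi* may differ, for example in how sets of more than two variables are handled.

```r
# Hypothetical helper (not part of setweaver): mutual information in nats
# between two discrete vectors, estimated from their contingency table.
mutual_information <- function(x, y) {
  joint <- table(x, y) / length(x)                    # joint probabilities p(x, y)
  expected <- outer(rowSums(joint), colSums(joint))   # p(x) * p(y) under independence
  nonzero <- joint > 0                                # treat 0 * log(0) as 0
  sum(joint[nonzero] * log(joint[nonzero] / expected[nonzero]))
}

# Simulated data (an assumption for this sketch, not the package's misimdata)
set.seed(1)
n <- 500
a <- rbinom(n, 1, 0.5)
b <- ifelse(runif(n) < 0.8, a, 1 - a)   # b agrees with a about 80% of the time

mi <- mutual_information(a, b)

# G-statistic and its chi-squared p-value
g_stat <- 2 * n * mi
df <- (length(unique(a)) - 1) * (length(unique(b)) - 1)
p_value <- pchisq(g_stat, df = df, lower.tail = FALSE)

c(mi = mi, g = g_stat, p = p_value)
```

A pair would be flagged as significant in this sketch when `p_value` falls below the chosen alpha level.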

## Evaluating sets

Once the sets are created with the *pairmi* function, you can assess their relationship with a specific outcome using the *probstat* function. This function employs k-fold cross-validation to compute parameters such as the conditional probability, conditional entropy, and odds ratio of the outcome given a particular set. Additionally, a Fisher's exact test or a generalized linear mixed model (i.e., for multilevel data) is performed to determine whether the outcome is significantly more likely to occur in the presence of a given set of variables.

```{r example_2, results='hide',message=FALSE}
# Evaluating the sets
evaluated_sets = probstat(misimdata$y,results$expanded.data[,results$sets$set],nfolds = 5)
```

```{r table_3,echo=FALSE,results='asis'}
knitr::kable(evaluated_sets[c(1:5),],caption = 'Table 3. Evaluated sets',align = c('c'))
```
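
The quantities named above can be illustrated on a single 2 × 2 table with base R. The sketch below is not the *probstat* implementation: it skips the cross-validation and the mixed-model option, and the variables `set_present` and `outcome` are simulated, hypothetical names. It only shows what a conditional probability, a sample odds ratio, and a one-sided Fisher's exact test look like for one set.

```r
# Simulated data (an assumption for this sketch): the outcome is more likely
# when the set is present (1) than when it is absent (0)
set.seed(2)
n <- 400
set_present <- rbinom(n, 1, 0.3)
outcome <- rbinom(n, 1, ifelse(set_present == 1, 0.6, 0.3))

tab <- table(set = set_present, outcome = outcome)

# Conditional probability of the outcome given that the set is present
p_given_set <- tab["1", "1"] / sum(tab["1", ])

# Sample odds ratio of the outcome for set present versus set absent
odds_ratio <- (tab["1", "1"] * tab["0", "0"]) / (tab["1", "0"] * tab["0", "1"])

# One-sided Fisher's exact test: is the outcome more likely when the set is present?
ft <- fisher.test(tab, alternative = "greater")

c(p_given_set = p_given_set, odds_ratio = odds_ratio, p_value = ft$p.value)
```

Note that `fisher.test()` reports a conditional maximum likelihood estimate of the odds ratio, which can differ slightly from the sample odds ratio computed by hand here.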

## Visualizing sets

You can visualize the sets created with the *pairmi* function using the *setmapmi* function. This function generates a setmap, which illustrates the composition of sets by showing which original variables are included in sets of a given size.

```{r example_3, fig.align = "center", fig.height = 6, fig.width =8, fig.cap="Plot 1. Setmap of sets that consist of 2 elements"}
# Visualizing the sets
setmapmi(results$original.variables,results$sets,n_elements = 2)
```

## Visualizing relations between sets and an outcome

You can also visualize how sets are related to an outcome with the *plot_prob* function. Here, the relationships can be displayed either as conditional probabilities or as effects estimated by logistic regression.

```{r example_4, fig.align = "center", fig.height = 6, fig.width = 6, fig.cap="Plot 2. Graph showing the relation between certain sets and an outcome y"}
# Creating a graph where sets are related to an outcome using logistic regression effects
plot_prob(cbind(y=misimdata[,1],results$expanded.data[,13:17]),
          'y',colnames(results$expanded.data[,13:17]),method='logistic')
```
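
As a rough illustration of what an effect estimated by logistic regression means, the following base R sketch fits a binomial `glm()` on simulated data. It is an assumption-laden example rather than the model *plot_prob* fits internally: the data frame `sim` and the columns `set_a` and `set_b` are hypothetical, and only `set_a` is given a true effect on the outcome.

```r
# Simulated data (an assumption for this sketch)
set.seed(3)
n <- 400
sim <- data.frame(set_a = rbinom(n, 1, 0.4), set_b = rbinom(n, 1, 0.4))
sim$y <- rbinom(n, 1, plogis(-1 + 1.2 * sim$set_a))   # set_b has no true effect

# Logistic regression of the outcome on the two set indicators
fit <- glm(y ~ set_a + set_b, data = sim, family = binomial)

# Log-odds effects with their p-values, and the corresponding odds ratios
effects <- coef(summary(fit))[c("set_a", "set_b"), c("Estimate", "Pr(>|z|)")]
odds_ratios <- exp(effects[, "Estimate"])

effects
odds_ratios
```

Each coefficient is the change in the log-odds of the outcome when the corresponding set is present, holding the other set constant.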

## Working directly with underlying functions

If you wish to explore the relationships between variables using a probabilistic or mutual information framework, you can directly call the lower-level functions that *pairmi* and *probstat* rely on. This allows for detailed and customized analyses. For example, the *entfuns* function calculates several descriptive measures that summarize the relationships between predictor variables and an outcome variable.

```{r example_5, results='hide',message=FALSE}
# Compute entropy and mutual information diagnostics for selected variables
descriptives = entfuns(misimdata$y,misimdata[,2:3])
```

```{r table_4,echo=FALSE,results='asis'}
knitr::kable(descriptives,caption = 'Table 4. Diagnostic statistics from entfuns()',align = c('c'))
```
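
For readers who want to see how entropy-based descriptives of this kind relate to one another, here is a small base R sketch of entropy, conditional entropy, and their difference. The helpers `entropy()` and `conditional_entropy()` are hypothetical and written only for this example; they are not the package's functions, and the exact measures returned by *entfuns* may be defined or scaled differently.

```r
# Hypothetical helpers (not part of setweaver), in natural-log units (nats)
entropy <- function(x) {
  p <- prop.table(table(x))
  -sum(p[p > 0] * log(p[p > 0]))
}

conditional_entropy <- function(y, x) {
  # H(Y | X) = sum over x of p(x) * H(Y | X = x)
  p_x <- prop.table(table(x))
  h_by_x <- tapply(y, x, entropy)
  sum(p_x * h_by_x[names(p_x)])
}

# Simulated data (an assumption for this sketch)
set.seed(4)
x <- rbinom(300, 1, 0.5)
y <- ifelse(runif(300) < 0.75, x, 1 - x)   # y agrees with x about 75% of the time

h_y <- entropy(y)
h_y_given_x <- conditional_entropy(y, x)
info_gain <- h_y - h_y_given_x   # equals the mutual information I(X; Y)

c(h_y = h_y, h_y_given_x = h_y_given_x, info_gain = info_gain)
```

The difference between the entropy of the outcome and its conditional entropy given a predictor is the mutual information between the two, which ties these descriptives back to the pairing step.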

Enjoy using the package, and reach out if you have any questions!