---
title: "Example : emhawkes package"
author: "Kyungsub Lee"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteEncoding{UTF-8}
  %\VignetteIndexEntry{Example : emhawkes package}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  markdown: 
    wrap: sentence
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

```

## Basic Hawkes model

### Univariate Hawkes process

```{r}
library(emhawkes)
```

This subsection outlines the steps for constructing, running simulations, and estimating a univariate Hawkes model.
To begin, create an `hspec` object, which defines the Hawkes model.
The S4 class `hspec` contains slots for the model parameters: `mu`, `alpha`, `beta`, `dimens`, `rmark`, and `impact`.

In a univariate model, the basic parameters of the model---`mu`, `alpha`, and `beta`---can be given as numeric values.
If numeric values are provided, they will be converted to matrices.
Below is an example of a univariate Hawkes model without a mark.

```{r}
mu1 <- 0.3; alpha1 <- 1.2; beta1 <- 1.5
hspec1 <- new("hspec", mu = mu1, alpha = alpha1, beta = beta1)
show(hspec1)
```

The function `hsim` implements simulation where the input arguments are `hspec`, `size`, and the initial values of the intensity component process, `lambda_component0`, and the initial values of the Hawkes processes, `N0`.
More precisely, the intensity process of the basic univariate Hawkes model is represented by

$$
\lambda(t) = \mu + \int_{-\infty}^t \alpha e^{-\beta (t-s)} d N(s) = \mu + \lambda_c(0) e^{-\beta t} + \int_0^t \alpha e^{-\beta (t-s)} d N(s)
$$

where the `lambda_component0` denotes

$$
\lambda_c(0) = \int_{-\infty}^0 \alpha e^{\beta s} d N(s).
$$

If `lambda_component0` is not provided, the internally determined initial values for the intensity process are used.
If `size` is sufficiently large, the exact value of `lambda_component0` may not be important.
The default initial value of the counting process, `N0`, is zero.

```{r, warning=FALSE}
set.seed(1107)
res1 <- hsim(hspec1, size = 1000)
summary(res1)
```

The results of `hsim` is an S3 class `hreal`, which consists of `hspec`, `inter_arrival`, `arrival`, `type`, `mark`, `N`, `Nc`, `lambda`, `lambda_component`, `rambda`, `rambda_component`.

-   `hspec` is the model specification.

-   `inter_arrival` is the inter-arrival time of every event.

-   `arrival` is the cumulative sum of `inter_arrival`.

-   `type` is the type of events, i.e., $i$ for $N_i$, and is used for a multivariate model.

-   `mark` is a numeric vector that represents additional information for the event.

-   `lambda` represents $\lambda$, which is the left continuous and right limit version.

-   The right continuous version of intensity is `rambda`.

-   `lambda_component` represents $\lambda_{ij}$, and `rambda_component` is the right continuous version.

`inter_arrival`, `type`, `mark`, `N`, and `Nc` start at zero.
Using the `summary()` function, one can print the first 20 elements of `arrival`, `N`, and `lambda`.
The `print()` function can also be used.

By definition, we have `lambda == mu + lambda_component`.

```{r}
# first and third columns are the same
cbind(res1$lambda[1:5], res1$lambda_component[1:5], mu1 + res1$lambda_component[1:5])
```

For all rows except the first, `rambda` equals `lambda + alpha` in this model.

```{r}
# second and third columns are the same
cbind(res1$lambda[1:5], res1$rambda[1:5], res1$lambda[1:5] + alpha1)
```

Additionally, verify that the exponential decay is accurately represented in the model.

```{r}
# By definition, the following two are equal:
res1$lambda[2:6]
mu1 + (res1$rambda[1:5] - mu1) * exp(-beta1 * res1$inter_arrival[2:6])
```

The log-likelihood function is calculated using the `logLik` method.
In this context, the inter-arrival times and `hspec` are provided as inputs to the function.

```{r, warning=FALSE}
logLik(hspec1, inter_arrival = res1$inter_arrival)
```

The likelihood estimation is performed using the `hfit` function.
The specification of the initial parameter values, `hspec0`, is required.
Note that only `inter_arrival` is needed for this univariate model.
For more accurate simulation, it is recommended to specify `lambda0`, the initial value of the lambda component.
If `lambda0` is not provided, the function uses internally determined initial values.
By default, the BFGS method is employed for numerical optimization.

```{r, warning=FALSE}
# initial value for numerical optimization
mu0 <- 0.5; alpha0 <- 1.0; beta0 <- 1.8
hspec0 <- new("hspec", mu = mu0, alpha = alpha0, beta = beta0)
# the intial values are provided through hspec
mle <- hfit(hspec0, inter_arrival = res1$inter_arrival)
summary(mle)
```

### Bivariate Hawkes model

The intensity process of a basic bivariate Hawkes model is defined by

$$
 \lambda_1(t) = \mu_1 + \int_{-\infty}^t \alpha_{11} e^{-\beta_{11}(t-s)} d N_1(s) + \int_{-\infty}^t \alpha_{12} e^{-\beta_{12}(t-s)} d N_2(s),
$$

$$
 \lambda_2(t) = \mu_2 + \int_{-\infty}^t \alpha_{21} e^{-\beta_{21}(t-s)} d N_1(s) + \int_{-\infty}^t \alpha_{22} e^{-\beta_{22}(t-s)} d N_2(s).
$$

In a bivariate model, the parameters within the slots of `hspec` are matrices.
Specifically, `mu` is a 2-by-1 matrix, while `alpha` and `beta` are 2-by-2 matrices.

$$
\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad
\alpha = \begin{bmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{bmatrix}, \quad
\beta = 
\begin{bmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{bmatrix}
$$

`rmark` is a random number generating function for marks and is not used in non-mark models.
`lambda_component0` is a 2-by-2 matrix that represents the initial values of `lambda_component`, which includes the set of values `lambda11`, `lambda12`, `lambda21`, and `lambda22`.
The intensity processes are represented by

$$ \lambda_1(t) = \mu_1 + \lambda_{11}(t) + \lambda_{12}(t), $$

$$ \lambda_2(t) = \mu_2 + \lambda_{21}(t) + \lambda_{22}(t). $$

The terms $\lambda_{ij}$ are referred to as lambda components, and `lambda0` represents \$\lambda\_{ij}(0)`.  The parameter`lambda_component0\` can be omitted in this model, in which case internally determined initial values will be used.

```{r}
mu2 <- matrix(c(0.2), nrow = 2)
alpha2 <- matrix(c(0.5, 0.9, 0.9, 0.5), nrow = 2, byrow = TRUE)
beta2 <- matrix(c(2.25, 2.25, 2.25, 2.25), nrow = 2, byrow = TRUE)
hspec2 <- new("hspec", mu=mu2, alpha=alpha2, beta=beta2)
print(hspec2)
```

To perform a simulation, use the `hsim` function.

```{r}
set.seed(1107)
res2 <- hsim(hspec2,  size=1000)
summary(res2)
```

In multivariate models, `type` is crucial as it represents the type of event.

```{r}
# Under bi-variate model, there are two types, 1 or 2.
res2$type[1:10]
```

In multivariate models, the column names of `N` are `N1`, `N2`, `N3`, and so on.

```{r}
res2$N[1:3, ]
```

Similarly, the column names of `lambda` are `lambda1`, `lambda2`, `lambda3`, and so on.

```{r}
res2$lambda[1:3, ]
```

The column names of `lambda_component` are `lambda_component11`, `lambda_component12`, `lambda_component13`, and so on.

```{r}
res2$lambda_component[1:3, ]
```

By definition, the following two expressions are equivalent:

```{r}
mu2[1] + rowSums(res2$lambda_component[1:5, c("lambda11", "lambda12")])
res2$lambda[1:5, "lambda1"]
```

From the results, we obtain vectors of realized `inter_arrival` and `type`.
A bivariate model requires both `inter_arrival` and `type` for estimation.

```{r}
inter_arrival2 <- res2$inter_arrival
type2 <- res2$type
```

The log-likelihood is computed using the `logLik` function.

```{r}
logLik(hspec2, inter_arrival = inter_arrival2, type = type2)
```

Maximum log-likelihood estimation is performed using the `hfit` function.
In this process, the parameter values in `hspec0`, such as `mu`, `alpha`, and `beta`, serve as starting points for the numerical optimization.
For illustration purposes, we set `hspec0 <- hspec2`.
Since the true parameter values are unknown in practical applications, these initial guesses are used.
The realized `inter_arrival` and `type` data are utilized for estimation.

```{r, warning=FALSE}
hspec0 <- hspec2
mle <- hfit(hspec0, inter_arrival = inter_arrival2, type = type2)
summary(mle)
```

```{r}
coef(mle)
```

```{r}
miscTools::stdEr(mle)
```


Also see the [extended vignette on GitHub](https://ksublee.github.io/emhawkes/articles/extended_example.html).
