---
title: "Calculating Similarity Factor $f_2$"
author: "Zhengguo XU"
date: "`r format(Sys.Date())`"
output: 
  rmarkdown::html_vignette:
    keep_md: true
    toc: false
    toc_depth: 3
    fig_width: 7
    fig_height: 4.5
  highlight: "tango"
bibliography: ref.bib
notes-after-punctuation: false
link-citations: yes
csl: ref.csl
vignette: >
  %\VignetteIndexEntry{Calculating Similarity Factor $f_2$}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup0, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#"
)
```

```{r setup, echo=FALSE}
library(bootf2)
```


## Introduction

To use traditional $f_2$ method, several conditions defined in the regulatory
guidelines have to be fulfilled. Different guidelines phrase the conditions
differently, which led to confusion some times. The details were explained in
the vignette *Introduction to bootf2*. 

This document gives some examples of the usage of the function `calcf2()`. 

## Usage

The complete list of arguments of the function is as follows:

```{r calcf2-code0, eval=FALSE}
calcf2(test, ref, path.in, file.in, path.out, file.out,
       regulation = c("EMA", "FDA", "WHO", "Canada", "ANVISA"),
       cv.rule = TRUE, message = FALSE, min.points = 3L,
       f2.type = c("est.f2", "exp.f2", "bc.f2", "vc.exp.f2",
                   "vc.bc.f2", "all"), both.TR.85 = FALSE,
       digits = 2L, time.unit = c("min", "h"),  plot = TRUE,
       plot.start.time = 0, plot.max.unit = 24L)
```

### Notes on function arguments

1. Data input: `test`, `ref`, `path.in`, `file.in`
    - In the typical interactive use, `test` and `ref` are *data frames* with
      the time as the first column, and individual units for the rest of 
      columns. In such cases, arguments `path.in` and `file.in` should not 
      be used.
    - Data can be read directly from an Excel file with extension `.xlsx` or
      `.xls`. In this case, data of test and reference should be stored in
      separate worksheets. The first column should be time, the rest columns 
      are individual units. *The first row* is the column head indicating the
      names, such as 'time', 'unit 01', unit 02', .... It doesn't matter what
      the names are as columns will be renamed internally by the function. The
      important point is that *the first row will not be read, so do not put
      dissolution data on the first row*. 
    - When `path.in` and `file.in` are provided, the argument `test` and
      `ref` should be the *worksheet names inside quotation mark*, e.g., 
      `"lot ABCD1234 pH 6.8"`. 
    - `path.in` can be an absolute path such as`"/home/myname/my.project/dat/"`,
      or a relative path such as `"../dat/"` if the working directory is in the
      folder `"/home/myname/my.project/analysis/"` and the data file is in the
      folder `"/home/myname/my.project/dat/"`. 
    - One more note for Windows user: As Windows use "\" instead of "/" to
      separate path, you have to either escape it by an additional "\", 
      e.g., `"C:\user\myname\my.project\dat\"` *cannot* be the `path.in`, 
      you have to changed it to `"C:\\user\\myname\\my.project\\dat\\"`, or
      to `"C:/user/myname/my.project/dat/"`, the same format as used in Linux
      system.
1. For such simple calculation, argument `path.out` and `file.out` are
   really overkill. By default the result will be printed to screen. But if 
   somehow you feel that you need it, the same principle for `path.in` and 
   `file.in` applies here.
1. The $f_2$ value should be reported without decimal.
1. When argument `cv.rule` is set to `TRUE`, and if argument `message` is also 
   `TRUE`, additional message will be printed to screen. One particular note is
   the *rounding of CV before checking compliance*. Since the guidelines does
   not specify the precision for CV, when checking if CV condition defined in
   the guidelines is fulfilled, the calculated CV values are rounded with
   `digits` decimal first before they were compared to the 20%/10% criterion.
   And refer to vignette *Introduction to bootf2* for details of regulation
   rules. 
1. In terms of how many time points to be used in the calculation (especially
   those points with dissolution more than 85%), the `"EMA"` and `"FDA"` 
   are the same. See vignette *Introduction to bootf2* for details. The reason
   both of them are included in `regulation` argument is because they have
   different rules for CV.
1. If the argument `both.TR.85 = TRUE`, *and* `regulation = FDA`, then the
   function will calculate $f_2$ using all time points until both test 
   and reference dissolve more than 85%. This is the **wrong interpretation**
   of the guidelines, as explained in the vignette *Introduction to bootf2*. 
   The purpose of this argument is for historical reason, in case users want
   to check the calculation published in the old articles where the wrong
   interpretation was used, or for validation purpose. It should be set to
   `FALSE` for the daily analysis.
1. For traditional $f_2$ calculation, do not specify the argument `f2.type`.
   This argument is only for bootstrap method. The default `est.f2` is the
   correct one to use. See vignette *Confidence Intervals of $f_2$ Using
   Bootstrap Method* for details.
1. The argument `plot.max.unit` control how individual profile will be 
   represented in the plot. When the actual number of units is greater than 
   the value of `plot.max.unit`, the individual profile will be represented 
   as boxplots at each time points. 
1. For delayed-release formulation, it is typical that the product will be 
   put in acidic media for 2 hours where almost no dissolution will occur, 
   then transferred to different medium (e.g., pH 6.8 medium). In such special
   case, setting `plot.start.time = 120` (or `plot.start.time = 2` if the time
   unit is `"h"`) for example, can make plots more readable.
1. Read the function manual by `help("calcf2")` for more details on
   each argument.

## Examples

## Simple case with low variability

First, let's simulate profiles with low variability for reference and test.
```{r calcf2-cvok01-dat}
# time points
tp <- c(5, 10, 15, 20, 30, 45, 60)

# model.par for reference with low variability
par.r1.lv <- list(fmax = 100, fmax.cv = 3, mdt = 15, mdt.cv = 13, 
                  tlag = 0, tlag.cv = 0, beta = 1.5, beta.cv = 7)

# simulate reference data
dr1.lv <- sim.dp(tp, model.par = par.r1.lv, seed = 100, plot = FALSE,
                 message = TRUE)

# model.par for test 
par.t1.lv <- list(fmax = 100, fmax.cv = 3, mdt = 12.29, mdt.cv = 10,
                  tlag = 0, tlag.cv = 0, beta = 1.727, beta.cv = 8)

# simulate test data with low variability
dt1.lv <- sim.dp(tp, model.par = par.t1.lv, seed = 100, plot = FALSE,
                 message = TRUE)
```

Calculate f2 with default setting (following EMA's guideline)
```{r calcf2-cvok01-a, error=TRUE}
t_vs_r_ema <- calcf2(dt1.lv$sim.disso, dr1.lv$sim.disso,
                     message = TRUE)
```

calculate f2 following WHO guideline
```{r calcf2-cvok01-b, error=TRUE}
t_vs_r_who <- calcf2(dt1.lv$sim.disso, dr1.lv$sim.disso, 
                     regulation = "WHO", message = TRUE)
```


Calculate f2 following FDA guidance, to confirm "old calculation" with
*wrong interpretation*. See vignette *Introduction to bootf2* for details.
```{r calcf2-cvok01-c}
t_vs_r <- calcf2(dt1.lv$sim.disso, dr1.lv$sim.disso, regulation = "FDA",
                 both.TR.85 = TRUE, message = TRUE, plot = FALSE)
```

# Cases with high variability
Simulate profiles with the same population parameters but with higher
variability.

```{r calcf2-cvko-dat}
# model.par for reference with high variability
par.r1.hv <- list(fmax = 100, fmax.cv = 3, mdt = 15, mdt.cv = 20, 
                  tlag = 0, tlag.cv = 0, beta = 1.5, beta.cv = 10)

# simulate reference data
dr1.hv <- sim.dp(tp, model.par = par.r1.hv, seed = 100, plot = FALSE)

# model.par for test 
par.t1.hv <- list(fmax = 100, fmax.cv = 3, mdt = 12.29, mdt.cv = 15,
                  tlag = 0, tlag.cv = 0, beta = 1.727, beta.cv = 12)

# simulate test data with low variability
dt1.hv <- sim.dp(tp, model.par = par.t1.hv, seed = 100, plot = FALSE)
```

When CV conditions are not fulfilled, the function will stop running. 
```{r calcf2-cvko-a, error=TRUE}
calcf2(dt1.hv$sim.disso, dr1.hv$sim.disso, message = TRUE)
```

There might be a time that you just want to check the $f_2$ regardless of the
variability, option `cv.rule` can be set to `FALSE` in such case.
```{r calcf2-cvko-b, error=TRUE}
dt1.hv_vs_dr1.hv <- calcf2(dt1.hv$sim.disso, dr1.hv$sim.disso,
                           cv.rule = FALSE, message = TRUE)
```

