---
title: SVD wrapper, PCA and the bi_projector
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{SVD wrapper, PCA and the bi_projector}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
params:
  family: red
css: albers.css
resource_files:
- albers.css
- albers.js
includes:
  in_header: |-
    <script src="albers.js"></script>
    <script>document.addEventListener('DOMContentLoaded',()=>document.body.classList.add('palette-red'));</script>

---

```{r setup, include=FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 4)
library(multivarious)  # Ensure pca(), svd_wrapper(), bi_projector, etc. are available
library(ggplot2)       # for plots below
```

# 1. Why wrap SVD at all?

There are six popular SVD engines in R (base::svd, corpcor, RSpectra,
irlba, rsvd, svd (PROPACK)) – each with its own argument list,
naming conventions and edge-cases (some refuse to return the full rank,
others crash on tall-skinny matrices).

`svd_wrapper()` smooths that out:

*   identical call-signature no matter the backend,
*   automatic pre-processing (centre / standardise) via the same pipeline
    interface shown in the previous vignette,
*   returns a `bi_projector` – an S3 class that stores loadings `v`,
    scores `s`, singular values `sdev` plus the fitted pre-processor.

That means immediate access to verbs such as `project()`,
`reconstruct()`, `truncate()`, `partial_project()`.

```{r svd_wrapper_example}
set.seed(1)
X <- matrix(rnorm(35*10), 35, 10)   # 35 obs × 10 vars

sv_fast <- svd_wrapper(X, ncomp = 5, preproc = center(), method = "fast")

# irlba backend (if installed) gives identical results
sv_irlba <- if (requireNamespace("irlba", quietly = TRUE)) {
  suppressWarnings(svd_wrapper(X, ncomp = 5, preproc = center(), method = "irlba"))
}

# Same downstream code works for both objects:
head(scores(sv_fast)) # 35 × 5

if (!is.null(sv_irlba)) {
  all.equal(scores(sv_fast), scores(sv_irlba))
}
```

# 2. A one-liner `pca()`

Most people really want PCA, so `pca()` is a thin wrapper that

1.  calls `svd_wrapper()` with sane defaults,
2.  adds the S3 class "pca" (printing, screeplot, biplot, permutation test, …).

```{r pca_example}
data(iris)
X_iris <- as.matrix(iris[, 1:4])

pca_fit <- pca(X_iris, ncomp = 4)    # defaults to method = "fast", preproc=center()
print(pca_fit)
```

## 2.1 Scree-plot and cumulative variance

```{r pca_screeplot}
screeplot(pca_fit, type = "lines", main = "Iris PCA – scree plot")
```

## 2.2 Quick biplot

```{r pca_biplot, warning=FALSE, message=FALSE}
# Requires ggrepel for repulsion, but works without it
biplot(pca_fit, repel_points = TRUE, repel_vars = TRUE)
```

(If you do not have ggrepel installed the text is placed without repulsion.)

# 3. What is a `bi_projector`?

Think bidirectional mapping:

```
data space  (p variables)  ↔  component space  (d ≤ p)
        new samples:  project()        ← scores
       new variables: project_vars()   ← loadings
                     reconstruction ↔  (scores %*% t(loadings))
```

A `bi_projector` therefore carries

| slot      | shape | description                                        |
|-----------|-------|----------------------------------------------------|
| `v`       | p × d | component loadings (columns)                       |
| `s`       | n × d | score matrix (rows = observations)                 |
| `sdev`    | d     | singular values (or SDs related to components) |
| `preproc` | –     | fitted transformer so you never leak training stats |

Because `pca()` returns a `bi_projector`, you get other methods for free:

```{r biprojector_methods}
# rank-2 reconstruction of the iris data
Xhat2 <- reconstruct(pca_fit, comp = 1:2)
print(paste("MSE (rank 2):", round(mean((X_iris - Xhat2)^2), 4))) # MSE ~ 0.076

# drop to 2 PCs everywhere
pca2 <- truncate(pca_fit, 2)
shape(pca2)            # 4 vars × 2 comps
```

# 4. Fast code-coverage cameo

The next chunk quietly touches a few more branches used in the unit tests
(`std_scores()`, `perm_test()`, `rotate()`), but keeps printing to a minimum:

```{r code_coverage}
# std scores
head(std_scores(svd_wrapper(X, ncomp = 3))) # Use the earlier X data

# tiny permutation test (10 perms; obviously too few for science)
# This requires perm_test.pca method
# Make sure X_iris is centered if perm_test needs centered data
perm_res <- perm_test(pca_fit, X_iris, nperm = 10, comps = 2)
print(perm_res$component_results)

# quick varimax rotation
if (requireNamespace("GPArotation", quietly = TRUE)) {
  pca_rotated <- rotate(pca_fit, ncomp = 3, type = "varimax")
  print(pca_rotated)
} else {
  cat("GPArotation not installed, skipping rotation example.\n")
}
```

(Running these once in the vignette means they are also executed by `R CMD check`, bumping test-coverage without extra scaffolding.)

# 5. Take-aways

*   `svd_wrapper()` gives you a unified front end to half-a-dozen SVD engines.
*   `pca()` piggy-backs on that, returning a fully featured `bi_projector`.
*   The `bi_projector` contract means the same verbs & plotting utilities
    work for any decomposition you wrap into the framework later.

---

# Session info

```{r session_info_svd}
sessionInfo()
``` 
