---
title: "Getting started with agriReg"
author: "agriReg Authors"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    fig_width: 7
    fig_height: 4.5
vignette: >
  %\VignetteIndexEntry{Getting started with agriReg}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse  = TRUE,
  comment   = "#>",
  fig.align = "centre",
  warning   = FALSE,
  message   = FALSE
)
library(agriReg)
```

## Overview

`agriReg` provides a streamlined workflow for the most common regression
tasks in agricultural research:

| Task | Function |
|------|----------|
| Clean field data | `clean_agri_data()` |
| Normalise yield | `yield_normalize()` |
| Flag outliers | `outlier_flag()` |
| Linear / mixed model | `fit_linear()` |
| Polynomial selection | `fit_polynomial()` |
| Nonlinear growth curves | `fit_nonlinear()` |
| Optimum dose / rate | `optimum_dose()` |
| Dose-response (LL.4/5) | `fit_dose_response()` |
| Effective dose (ED50…) | `ed_estimates()` |
| Compare models | `compare_models()` |
| Goodness-of-fit stats | `gof_stats()` |
| Residual diagnostics | `residual_check()` |

---

## 1 · Data preparation

Load one of the bundled example datasets and inspect it.

```{r load-data}
wheat <- load_example_data("wheat_trial")
head(wheat)
```

### Cleaning and outlier detection

```{r clean}
wheat_clean <- clean_agri_data(wheat,
                                yield_col     = "yield",
                                flag_outliers = TRUE)
summary(wheat_clean)
```

### Normalisation

Normalise yield within each block using z-scoring:

```{r normalise}
wheat_norm <- yield_normalize(wheat_clean,
                               yield_col = "yield",
                               method    = "zscore",
                               group_by  = "block")
head(wheat_norm[, c("block", "yield", "yield_norm")])
```

### Advanced outlier flagging

The standalone `outlier_flag()` supports three methods:

```{r outlier}
wheat_clean <- outlier_flag(wheat_clean, col = "yield", method = "modz")
table(wheat_clean$yield_outlier)
```

---

## 2 · Linear models

### Simple OLS

```{r lm-simple}
m_lm <- fit_linear(wheat_clean, "yield ~ nitrogen + phosphorus")
print(m_lm)
```

### Diagnostic plots

The `plot()` method returns `ggplot2` objects — pipe them, save them, or
patch them together with `patchwork`.

```{r lm-plots, fig.height=4}
plot(m_lm, type = "fit")
plot(m_lm, type = "residuals")
plot(m_lm, type = "qq")
```

### Full residual panel

```{r residual-panel, fig.height=6}
residual_check(m_lm)
```

### Mixed-effects model

Add a random intercept per block to account for field heterogeneity:

```{r lmer}
m_lmer <- fit_linear(wheat_clean,
                     formula = "yield ~ nitrogen + phosphorus + variety",
                     random  = "(1|block)")
summary(m_lmer)
```

### Polynomial selection

Automatically test degrees 1–3 and return the best-fitting model:

```{r poly}
m_poly <- fit_polynomial(wheat_clean,
                          x_col      = "nitrogen",
                          y_col      = "yield",
                          max_degree = 3)
coef(m_poly)
```

---

## 3 · Nonlinear models

Load the maize growth dataset:

```{r maize-load}
maize <- load_example_data("maize_growth")
# Use well-watered plants only
maize_ww <- maize[maize$treatment == "WW", ]
head(maize_ww)
```

### Logistic growth

The 3-parameter logistic is the most common growth curve in agronomy.
Parameter interpretation:

- **Asym**: upper asymptote (maximum biomass)
- **xmid**: inflection point (day of maximum growth rate)
- **scal**: scale parameter (related to the steepness)

```{r logistic}
m_log <- fit_nonlinear(maize_ww,
                        x_col = "days",
                        y_col = "biomass_g",
                        model = "logistic")
summary(m_log)
```

```{r logistic-plot, fig.height=4}
plot(m_log)
```

### Gompertz curve

The Gompertz is often a better fit for biological growth that is
asymmetric around the inflection point:

```{r gompertz, eval = FALSE}
m_gomp <- fit_nonlinear(maize_ww, "days", "biomass_g", "gompertz")
coef(m_gomp)
```

### Asymptotic (Mitscherlich) model

Classic model for nutrient-response data where yield approaches a plateau:

```{r asymptotic}
m_asym <- fit_nonlinear(wheat_clean,
                         x_col = "nitrogen",
                         y_col = "yield",
                         model = "asymptotic")
plot(m_asym)
```

### Quadratic and optimum dose

The quadratic polynomial is widely used to find the economic optimum
nitrogen rate:

```{r quadratic}
m_quad <- fit_nonlinear(wheat_clean,
                         x_col = "nitrogen",
                         y_col = "yield",
                         model = "quadratic")

opt <- optimum_dose(m_quad)
cat(sprintf("Optimum nitrogen rate : %.1f kg/ha\n", opt["x_opt"]))
cat(sprintf("Predicted max yield   : %.2f t/ha\n",  opt["y_max"]))
```

### Linear-plateau model

Suitable when yield increases linearly up to a critical input level and
then plateaus (common for phosphorus response):

```{r lp}
m_lp <- fit_nonlinear(wheat_clean,
                       x_col = "nitrogen",
                       y_col = "yield",
                       model = "linear_plateau")
cat("Critical point (plateau onset):", optimum_dose(m_lp)["x_opt"], "kg/ha\n")
```

---

## 4 · Dose-response models

Load the herbicide dataset:

```{r herb-load}
herb <- load_example_data("herbicide_trial")
# Subset to one species
amaranth <- herb[herb$species == "Amaranth", ]
```

### Fit a 4-parameter log-logistic curve

The LL.4 model is the industry standard for herbicide dose-response:

$$y = c + \frac{d - c}{1 + \exp(b(\log(x) - \log(e)))}$$

where *b* = slope, *c* = lower asymptote, *d* = upper asymptote,
*e* = ED50.

```{r drc}
m_dr <- fit_dose_response(amaranth,
                           dose_col = "dose_g_ha",
                           resp_col = "weed_control_pct",
                           fct      = "LL.4")
summary(m_dr)
```

```{r drc-plot, fig.height=4}
plot(m_dr, log_dose = TRUE)
```

### Effective dose estimates

```{r ed}
ed_estimates(m_dr, respLev = c(10, 50, 90))
```

---

## 5 · Model comparison

Compare all models fitted to the wheat data:

```{r compare}
cmp <- compare_models(
  ols        = m_lm,
  mixed      = m_lmer,
  asymptotic = m_asym,
  quadratic  = m_quad,
  lp         = m_lp
)
print(cmp)
```

The table is ranked by AIC. `delta_AIC` shows the difference from the best
model — values below 2 are considered equivalent; above 10 is strong
evidence against the weaker model.

---

## 6 · Extracting results for reports

All `agriReg` objects are compatible with `broom` for tidy output:

```{r broom}
library(broom)
tidy(m_lm$fit)
glance(m_lm$fit)
```

### Saving plots

```{r save-plot, eval=FALSE}
p <- plot(m_log)
ggplot2::ggsave("logistic_growth.png", p, width = 7, height = 4.5, dpi = 300)
```

---

## Session information

```{r session}
sessionInfo()
```