---
title: "API and Database Reference"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{API and Database Reference}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Overview

metasurvey provides a REST API, built with [plumber](https://www.rplumber.io/)
and backed by MongoDB, for sharing recipes, workflows, and variable metadata
with the community. The API can be self-hosted (see `vignette("self-hosting")`)
and is used by both the R client functions (`api_*`) and the Shiny exploration
application.

After deployment, the Swagger UI at `<your-api-url>/__docs__/` provides an
interactive endpoint explorer that plumber generates automatically.
For detailed request/response schemas and MongoDB collection documentation,
see the sections below.

## Configuration

```{r configure}
library(metasurvey)

# Point to your self-hosted API
configure_api("https://your-api-host.example.com")

# Or use an environment variable
Sys.setenv(METASURVEY_API_URL = "https://your-api-host.example.com")
```

The R client uses the URL set by `configure_api()` if there is one, and
otherwise falls back to the `METASURVEY_API_URL` environment variable.
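
The lookup order can be pictured with a small sketch. `resolve_api_url()` is a
hypothetical helper written for this vignette, not a metasurvey function, and
the real client may store the configured URL differently.

```{r resolve-url-sketch}
# Hypothetical helper (not part of metasurvey): an explicitly configured
# URL wins; otherwise the environment variable is used.
resolve_api_url <- function(configured = NULL) {
  if (!is.null(configured) && nzchar(configured)) {
    return(configured)
  }
  env_url <- Sys.getenv("METASURVEY_API_URL", unset = "")
  if (nzchar(env_url)) {
    return(env_url)
  }
  stop("No API URL configured", call. = FALSE)
}

# An explicit value takes precedence over whatever the environment holds
resolve_api_url("https://your-api-host.example.com")
```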

## Authentication

The API uses JWT (JSON Web Token) authentication with HMAC-SHA256 signing.
Tokens expire after 24 hours; long-lived tokens (90 days) can be generated
for automated scripts.

### Registration

```{r register}
# Individual account (auto-approved)
api_register("Ana Garcia", "ana@example.com", "password123")

# Institutional member (requires admin review)
api_register(
  "Carlos Rodriguez",
  "carlos@ine.gub.uy",
  "password123",
  user_type = "institutional_member",
  institution = "INE Uruguay"
)
```

Account types:

| Type | Description | Approval |
|------|-------------|----------|
| `individual` | Independent researcher | Automatic |
| `institutional_member` | Member of a recognized institution | Requires admin review |
| `institution` | Institutional account | Requires admin review |

### Login

```{r login}
api_login("ana@example.com", "password123")
```

The token is stored in the session and sent automatically with subsequent
API calls. The client renews a token automatically once it is within
5 minutes of expiring.
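
The renewal rule amounts to a simple time comparison. The sketch below is
illustrative only: `needs_refresh()` is a hypothetical helper, and it assumes
the token carries the standard JWT `exp` claim (seconds since the Unix epoch),
which this vignette does not confirm.

```{r refresh-window-sketch}
# Hypothetical helper: TRUE when the token is inside the renewal window.
# `exp` is assumed to be the JWT expiry time in seconds since the epoch.
needs_refresh <- function(exp, now = Sys.time(), window_secs = 5 * 60) {
  remaining <- exp - as.numeric(now)
  remaining <= window_secs
}

issued <- as.POSIXct("2026-02-15 12:00:00", tz = "UTC")
exp <- as.numeric(issued) + 24 * 60 * 60 # standard 24-hour lifetime

needs_refresh(exp, now = issued) # just issued
#> [1] FALSE
needs_refresh(exp, now = issued + 24 * 60 * 60 - 120) # 2 minutes left
#> [1] TRUE
```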

### Session Management

```{r session}
# View current user profile
api_me()

# Refresh token
api_refresh_token()

# Logout
api_logout()
```

### Long-lived Tokens

For automated scripts and CI/CD, generate a 90-day token from the
Shiny application (Profile tab) and supply it through the
`METASURVEY_TOKEN` environment variable:

```{r token}
Sys.setenv(METASURVEY_TOKEN = "your-long-lived-token")

# API calls work without interactive login
recipes <- api_list_recipes(survey_type = "ech")
```

## API Endpoints

### Recipes

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| `GET` | `/recipes` | No | List and search recipes |
| `GET` | `/recipes/:id` | No | Get an individual recipe |
| `POST` | `/recipes` | Yes | Publish a new recipe |
| `POST` | `/recipes/:id/download` | No | Increment download counter |

#### List Recipes

```{r list-recipes}
# All recipes
all <- api_list_recipes()

# Filter by survey type
ech <- api_list_recipes(survey_type = "ech")

# Search by text
labor <- api_list_recipes(search = "empleo")

# Filter by topic
income <- api_list_recipes(topic = "income")

# Filter by certification level
official <- api_list_recipes(certification = "official")

# Pagination
page2 <- api_list_recipes(limit = 10, offset = 10)
```

Query parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `search` | string | Regex search on recipe name |
| `survey_type` | string | `ech`, `eaii`, `eph`, `eai` |
| `topic` | string | `labor_market`, `income`, `education`, `health`, `demographics`, `housing` |
| `certification` | string | `community`, `reviewed`, `official` |
| `user` | string | Filter by author email |
| `limit` | integer | Maximum results (default 50) |
| `offset` | integer | Skip N results (default 0) |
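
To retrieve more results than a single request returns, `limit` and `offset`
can be combined in a loop. The following is a minimal sketch:
`fetch_all_pages()` is a hypothetical helper, and it assumes each call returns
a list with one element per result.

```{r paginate-sketch}
# Hypothetical helper: collect every page from a paginated listing.
# `fetch_page` is any function that takes `limit` and `offset` and
# returns a list of results for that page.
fetch_all_pages <- function(fetch_page, limit = 50, max_pages = 100) {
  results <- list()
  offset <- 0
  for (i in seq_len(max_pages)) {
    page <- fetch_page(limit = limit, offset = offset)
    results <- c(results, page)
    if (length(page) < limit) break # a short page means nothing is left
    offset <- offset + limit
  }
  results
}

# Usage against the API (assumes api_list_recipes() returns a list):
# all_ech <- fetch_all_pages(function(limit, offset) {
#   api_list_recipes(survey_type = "ech", limit = limit, offset = offset)
# })
```

The `max_pages` cap is a safety net so that a misbehaving endpoint cannot keep
the loop running indefinitely.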

#### Get Recipe

```{r get-recipe}
recipe <- api_get_recipe("ech_employment_001")
```

#### Publish Recipe

```{r publish-recipe}
api_login("ana@example.com", "password123")
api_publish_recipe(my_recipe)
```

The server automatically sets the `user` field from the JWT,
initializes `downloads = 0`, generates an `id` if not provided, and
assigns the `community` certification by default.
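
As a rough illustration of these defaults, the sketch below applies them to a
plain list. `apply_publish_defaults()` and its ID scheme are hypothetical; the
actual server code is not shown in this vignette and may differ.

```{r publish-defaults-sketch}
# Hypothetical server-side helper, for illustration only.
apply_publish_defaults <- function(recipe, jwt_email) {
  recipe$user <- jwt_email # the author always comes from the token
  recipe$downloads <- 0
  if (is.null(recipe$id) || !nzchar(recipe$id)) {
    # Placeholder ID scheme; the real generator is not documented here
    recipe$id <- paste0("recipe_", format(Sys.time(), "%Y%m%d%H%M%S"))
  }
  if (is.null(recipe$certification)) {
    recipe$certification <- list(level = "community")
  }
  recipe
}

submitted <- list(
  name = "Employment rate",
  survey_type = "ech",
  user = "someone-else@example.com" # overwritten by the JWT identity
)
stored <- apply_publish_defaults(submitted, "ana@example.com")
stored$user
#> [1] "ana@example.com"
```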

### Workflows

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| `GET` | `/workflows` | No | List and search workflows |
| `GET` | `/workflows/:id` | No | Get an individual workflow |
| `POST` | `/workflows` | Yes | Publish a new workflow |
| `POST` | `/workflows/:id/download` | No | Increment download counter |

```{r workflow-api}
# List workflows for ECH
wf <- api_list_workflows(survey_type = "ech")

# Find workflows that use a specific recipe
wf <- api_list_workflows(recipe_id = "ech_employment_001")

# Get specific workflow
w <- api_get_workflow("wf_labor_market_001")

# Publish
api_publish_workflow(my_workflow)
```

### ANDA Variable Metadata

> **Note:** The ANDA integration is an *unofficial* implementation that
> parses DDI XML metadata from INE Uruguay's public ANDA catalog. It is not
> endorsed by INE and may contain errors or become outdated if INE
> changes the catalog structure. Always verify critical variable definitions
> against the official codebook.

The `/anda/variables` endpoint provides variable metadata obtained from
INE Uruguay's ANDA catalog (DDI XML format). This includes variable labels,
value categories, and type information.

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| `GET` | `/anda/variables` | No | Get variable metadata |

```{r anda}
# Get all ECH variables
vars <- api_get_anda_variables(survey_type = "ech")

# Get specific variables
vars <- api_get_anda_variables(
  survey_type = "ech",
  var_names = c("pobpcoac", "e27", "ht11")
)
```

Query parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `survey_type` | string | Survey type (default `"ech"`) |
| `names` | string | Comma-separated variable names (all if empty) |
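
For callers that bypass the R client, the request URL can be assembled by
hand. This sketch assumes that a vector of variable names maps onto the
comma-separated `names` parameter; `anda_variables_url()` is a hypothetical
helper, not a metasurvey function.

```{r anda-url-sketch}
# Hypothetical helper: build the raw request URL for GET /anda/variables.
anda_variables_url <- function(base_url, survey_type = "ech",
                               var_names = character()) {
  url <- paste0(
    sub("/+$", "", base_url), # drop any trailing slash
    "/anda/variables?survey_type=",
    utils::URLencode(survey_type, reserved = TRUE)
  )
  if (length(var_names) > 0) {
    # The endpoint expects a single comma-separated string in `names`
    url <- paste0(url, "&names=", paste(var_names, collapse = ","))
  }
  url
}

anda_variables_url(
  "https://your-api-host.example.com",
  var_names = c("pobpcoac", "e27", "ht11")
)
#> [1] "https://your-api-host.example.com/anda/variables?survey_type=ech&names=pobpcoac,e27,ht11"
```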

Each variable document contains:

| Field | Description |
|-------|-------------|
| `name` | Variable name (lowercase) |
| `label` | Human-readable label |
| `type` | `discrete`, `continuous`, or `unknown` |
| `value_labels` | List of code-label mappings |
| `description` | Extended description |
| `source_edition` | Survey edition (e.g., `"2024"`) |
| `source_catalog_id` | ANDA catalog ID (e.g., `767`) |

### Administration

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| `GET` | `/admin/pending-users` | Admin | List institutional accounts pending review |
| `POST` | `/admin/approve/:email` | Admin | Approve an institutional account |
| `POST` | `/admin/reject/:email` | Admin | Reject an institutional account |

Admin access is controlled via the `METASURVEY_ADMIN_EMAIL` environment
variable on the server.

### Health Check

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| `GET` | `/health` | No | API and MongoDB status |

```json
{
  "status": "ok",
  "service": "metasurvey-api",
  "version": "2.0.0",
  "database": "metasurvey",
  "mongodb": "connected",
  "timestamp": "2026-02-15T12:00:00Z"
}
```
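
A monitoring script might reduce this response to a single pass/fail value.
The sketch below assumes the response has already been parsed into an R list
(for example with `jsonlite::fromJSON()`); `is_healthy()` is a hypothetical
helper, and the values reported when MongoDB is unavailable are not documented
here.

```{r health-check-sketch}
# Hypothetical helper: `health` is the parsed /health response as a list.
is_healthy <- function(health) {
  identical(health$status, "ok") && identical(health$mongodb, "connected")
}

health <- list(
  status = "ok",
  service = "metasurvey-api",
  version = "2.0.0",
  database = "metasurvey",
  mongodb = "connected",
  timestamp = "2026-02-15T12:00:00Z"
)
is_healthy(health)
#> [1] TRUE
```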

## MongoDB Schema

The database has four collections, each with JSON Schema validation
and the indexes listed under each collection below.

### Entity-Relationship Diagram

The following diagram shows the MongoDB collections and their relationships:

```text
  ┌──────────────────┐       ┌──────────────────────┐
  │     users        │       │      recipes         │
  ├──────────────────┤       ├──────────────────────┤
  │ email (PK)       │──┐    │ id (PK)              │
  │ name             │  │    │ name                 │
  │ password_hash    │  ├───>│ user (FK)            │
  │ user_type        │  │    │ survey_type          │
  │ institution      │  │    │ edition              │
  └──────────────────┘  │    │ steps[]              │
                        │    │ certification{}      │
                        │    │ categories[]         │
                        │    └──────────┬───────────┘
                        │               │
                        │    ┌──────────┴───────────┐
                        │    │     workflows        │
                        │    ├──────────────────────┤
                        │    │ id (PK)              │
                        └───>│ user (FK)            │
                             │ survey_type          │
                             │ recipe_ids[] (FK)    │
                             │ calls[]              │
                             └──────────────────────┘

  ┌──────────────────────┐
  │   anda_variables     │
  ├──────────────────────┤
  │ survey_type (PK)     │
  │ name (PK)            │
  │ label                │
  │ type                 │
  │ value_labels{}       │
  └──────────────────────┘

  Relationships:
    users    ──1:N──>  recipes     (publishes)
    users    ──1:N──>  workflows   (publishes)
    recipes  ──N:M──>  workflows   (referenced by)
```

### Collections

#### `users`

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | string | Yes | Display name |
| `email` | string | Yes | Email (unique, validated) |
| `password_hash` | string | Yes | SHA-256 hash (64 characters) |
| `user_type` | enum | Yes | `individual`, `institutional_member`, `institution` |
| `institution` | string | No | Institution name |
| `verified` | boolean | No | Whether identity is verified |
| `review_status` | enum | No | `approved`, `pending`, `rejected` |
| `reviewed_by` | string | No | Reviewing admin's email |
| `reviewed_at` | string | No | ISO timestamp |
| `created_at` | string | Yes | ISO timestamp |

**Indexes:** unique on `email`.
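
The same constraints can be checked on the R side before a document is
inserted. This is a minimal sketch, not the JSON Schema the setup script
installs: `validate_user()` is hypothetical, and it assumes the 64-character
hash is hex-encoded.

```{r validate-user-sketch}
# Hypothetical helper: returns a character vector of problems (empty if none).
validate_user <- function(user) {
  problems <- character()
  required <- c("name", "email", "password_hash", "user_type", "created_at")
  absent <- setdiff(required, names(user))
  if (length(absent) > 0) {
    problems <- c(problems, paste("missing field:", absent))
  }
  allowed <- c("individual", "institutional_member", "institution")
  if (!is.null(user$user_type) && !user$user_type %in% allowed) {
    problems <- c(problems, "user_type is not an allowed value")
  }
  # Assumption: the SHA-256 hash is stored as 64 hexadecimal characters
  if (!is.null(user$password_hash) &&
    !grepl("^[0-9a-fA-F]{64}$", user$password_hash)) {
    problems <- c(problems, "password_hash is not 64 hex characters")
  }
  problems
}

validate_user(list(name = "Ana Garcia", user_type = "guest"))
```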

#### `recipes`

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `id` | string | No | Unique identifier (auto-generated) |
| `name` | string | Yes | Recipe name |
| `user` | string | Yes | Author email |
| `survey_type` | enum | Yes | `ech`, `eaii`, `eph`, `eai` |
| `edition` | string/array | No | Survey edition(s) |
| `description` | string | No | Description |
| `topic` | enum | No | `labor_market`, `income`, `education`, `health`, `demographics`, `housing` |
| `version` | string | No | Semantic version (default `"1.0.0"`) |
| `downloads` | number | No | Download counter (default `0`) |
| `steps` | array | No | Step expressions as strings |
| `depends_on` | array | No | Required input variable names |
| `depends_on_recipes` | array | No | IDs of recipes this recipe depends on |
| `categories` | array | No | Category objects |
| `certification` | object | No | `{level, certified_at, certified_by, notes}` |
| `user_info` | object | No | `{name, user_type, email, url, verified}` |
| `doc` | object | No | `{input_variables, output_variables, pipeline}` |
| `data_source` | object | No | `{s3_bucket, s3_prefix, file_pattern, provider}` |

**Indexes:** unique on `id`; on `user`, `survey_type`, `topic`,
`downloads` (desc), `certification.level`; compound on
`(survey_type, edition)`; text search on `(name, description, topic)`.

#### `workflows`

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `id` | string | No | Unique identifier (auto-generated) |
| `name` | string | Yes | Workflow name |
| `user` | string | Yes | Author email |
| `survey_type` | enum | Yes | `ech`, `eaii`, `eph`, `eai` |
| `edition` | string/array | No | Survey edition(s) |
| `description` | string | No | Description |
| `version` | string | No | Semantic version |
| `downloads` | number | No | Download counter |
| `estimation_type` | string/array | No | `annual`, `quarterly`, `monthly` |
| `recipe_ids` | array | No | Referenced recipe IDs |
| `calls` | array | No | Estimation calls as strings |
| `call_metadata` | array | No | Call descriptions |
| `categories` | array | No | Category objects |
| `certification` | object | No | Same as recipes |
| `user_info` | object | No | Same as recipes |

**Indexes:** unique on `id`; on `user`, `survey_type`, `recipe_ids`,
`downloads` (desc); compound on `(survey_type, edition)`; text search
on `(name, description)`.

#### `anda_variables`

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `survey_type` | string | Yes | Survey type |
| `name` | string | Yes | Variable name (lowercase) |
| `label` | string | Yes | Human-readable label |
| `type` | enum | No | `discrete`, `continuous`, `unknown` |
| `value_labels` | object | No | Code-label mappings |
| `description` | string | No | Extended description |
| `source_edition` | string | No | Edition (e.g., `"2024"`) |
| `source_catalog_id` | number | No | ANDA catalog ID |

**Indexes:** compound unique on `(survey_type, name)`; on
`survey_type`.

## Database Setup

To set up the database on a new deployment:

```bash
# 1. Create collections with JSON Schema validation and indexes
mongosh "$METASURVEY_MONGO_URI" inst/scripts/setup_mongodb.js

# 2. Seed recipes, workflows, and users
METASURVEY_MONGO_URI="..." Rscript inst/scripts/seed_ech_recipes.R

# 3. Seed ANDA variable metadata from INE catalog
METASURVEY_MONGO_URI="..." Rscript inst/scripts/seed_anda_metadata.R
```

The setup script creates the four collections and builds the indexes.
It is idempotent: existing collections are skipped.

## Server Deployment

### Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `METASURVEY_MONGO_URI` | Yes | --- | MongoDB connection string |
| `METASURVEY_DB` | No | `metasurvey` | Database name |
| `METASURVEY_JWT_SECRET` | No | `metasurvey-dev-secret-...` | JWT signing secret (override in production) |
| `METASURVEY_ADMIN_EMAIL` | No | --- | Admin email for institutional review |
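
A startup routine could read these variables along the following lines.
`read_server_config()` is a hypothetical helper; the shipped `plumber.R` may
handle configuration differently. The sketch warns instead of reproducing the
development JWT secret.

```{r server-config-sketch}
# Hypothetical startup helper: read the documented environment variables.
read_server_config <- function() {
  mongo_uri <- Sys.getenv("METASURVEY_MONGO_URI", unset = "")
  if (!nzchar(mongo_uri)) {
    stop("METASURVEY_MONGO_URI is required", call. = FALSE)
  }
  jwt_secret <- Sys.getenv("METASURVEY_JWT_SECRET", unset = "")
  if (!nzchar(jwt_secret)) {
    warning(
      "METASURVEY_JWT_SECRET is unset; set it explicitly in production",
      call. = FALSE
    )
  }
  list(
    mongo_uri = mongo_uri,
    db = Sys.getenv("METASURVEY_DB", unset = "metasurvey"),
    jwt_secret = jwt_secret,
    admin_email = Sys.getenv("METASURVEY_ADMIN_EMAIL", unset = "")
  )
}
```

Failing fast on a missing connection string surfaces misconfiguration at
startup, before the first request reaches the database.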

### Running Locally

```bash
METASURVEY_MONGO_URI="mongodb+srv://user:pass@cluster.mongodb.net" \
  Rscript -e 'plumber::plumb("inst/api/plumber.R")$run(port = 8787)'
```

The Swagger UI will be available at `http://localhost:8787/__docs__/`.

### Docker

```bash
docker build -t metasurvey-api inst/api/
docker run -p 8787:8787 \
  -e METASURVEY_MONGO_URI="mongodb+srv://..." \
  -e METASURVEY_JWT_SECRET="your-production-secret" \
  -e METASURVEY_ADMIN_EMAIL="admin@example.com" \
  metasurvey-api
```

### Railway

The API is configured for Railway deployment via the `render.yaml` file
in `inst/api/`. Push the repository and configure the environment
variables in the Railway dashboard.

## CORS

The API allows cross-origin requests from any origin:

- **Allowed methods:** GET, POST, OPTIONS
- **Allowed headers:** Content-Type, Authorization
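
In plumber, a policy like this is usually implemented as a filter. The sketch
below follows the CORS filter pattern from the plumber documentation with the
methods and headers listed above filled in. It is an approximation for
orientation, not a copy of the filter shipped in `inst/api/`.

```{r cors-filter-sketch}
#* @filter cors
cors_filter <- function(req, res) {
  # Allow requests from any origin
  res$setHeader("Access-Control-Allow-Origin", "*")
  if (identical(req$REQUEST_METHOD, "OPTIONS")) {
    # Answer the preflight request directly, without running a handler
    res$setHeader("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
    res$setHeader("Access-Control-Allow-Headers", "Content-Type, Authorization")
    res$status <- 200
    return(list())
  }
  # Any other request continues to the next filter or endpoint
  plumber::forward()
}
```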

## Next Steps

- **[Interactive recipe explorer](https://metasurveyr.github.io/metasurvey/articles/shiny-explorer.html)** -- Browse
  recipes and workflows through the Shiny web application
- **[Creating and publishing recipes](recipes.html)** -- Build recipes
  programmatically and publish them to the API
- **[Estimation workflows](workflows-and-estimation.html)** -- Compute
  weighted survey estimates with `workflow()`
