| Type: | Package |
| Title: | Generator of Synthetic Patient Data for the OMOP Common Data Model |
| Version: | 0.1.4 |
| Description: | Tools to generate synthetic patient-level test datasets in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). Includes a chat-driven generator backed by large language models and an interactive 'shiny' designer for editing CDM test sets. |
| URL: | https://github.com/mi-erasmusmc/PatientGenerator, https://mi-erasmusmc.github.io/PatientGenerator/ |
| BugReports: | https://github.com/mi-erasmusmc/PatientGenerator/issues |
| License: | Apache License (≥ 2) |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Depends: | R (≥ 4.1.0) |
| Imports: | bslib, checkmate, data.table, DBI, dplyr, DT, duckdb, ellmer, glue, httr2, jsonlite, r2d3, R6, shiny, stringr, testthat |
| Suggests: | CDMConnector, CohortCharacteristics, CohortConstructor, covr, ggplot2, knitr, rmarkdown, TestGenerator |
| VignetteBuilder: | knitr |
| Config/Needs/website: | pkgdown |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2026-04-27 09:48:57 UTC; cbarboza |
| Author: | Cesar Barboza |
| Maintainer: | Cesar Barboza <c.barboza@mi-erasmusmc.nl> |
| Repository: | CRAN |
| Date/Publication: | 2026-05-04 19:10:02 UTC |
availableModels() returns the models available from the LLM provider when a valid API key is set.
Description
OpenAI is currently the only supported provider.
Usage
availableModels()
Value
A list of strings with the IDs of the available models.
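availableModels() only works when an API key is available, so it can help to check for the key before calling it. A minimal sketch, assuming the key is stored in the OPEN_AI_KEY environment variable named in the patientChat description; hasOpenAiKey() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (not exported by PatientGenerator): TRUE when the
# environment variable holding the OpenAI key is set to a non-empty value.
hasOpenAiKey <- function(var = "OPEN_AI_KEY") {
  nzchar(Sys.getenv(var, unset = ""))
}

# Only query the provider when both the package and a key are available.
if (requireNamespace("PatientGenerator", quietly = TRUE) && hasOpenAiKey()) {
  models <- PatientGenerator::availableModels()
  print(models)
}
```

R reads ~/.Renviron at startup, so after adding the key either restart R or call readRenviron("~/.Renviron").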
Concept search Shiny module
Description
Reusable UI and server for searching the vocabulary (Hecate) and selecting a concept. Shows a button that opens a modal with a text input, a search button, and the results in a DT table. An optional callback is called when a concept is selected.
Usage
conceptSearchUI(id, buttonLabel = "Concept search")
conceptSearchServer(id, onConceptSelected = NULL, placeholderText = "")
Arguments
id |
Module namespace id. |
buttonLabel |
Label for the trigger button (default "Concept search"). |
onConceptSelected |
Optional function of one argument |
placeholderText |
Character string used as placeholder text in the search input field. |
Functions
- conceptSearchUI(): UI for the concept search: a single button that opens the search modal.
- conceptSearchServer(): Server for the concept search modal (text input, search, DT table, close).
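The two functions are meant to be used together as a Shiny module sharing one id. A minimal sketch of an app wiring them up, assuming the PatientGenerator and shiny packages are installed and Hecate is reachable; conceptSearchDemo() is a hypothetical name, and because the structure of the value passed to onConceptSelected is not spelled out here, the callback only prints it.

```r
# Hypothetical demo app combining conceptSearchUI() and conceptSearchServer().
conceptSearchDemo <- function() {
  ui <- shiny::fluidPage(
    PatientGenerator::conceptSearchUI("concepts", buttonLabel = "Find a concept")
  )
  server <- function(input, output, session) {
    PatientGenerator::conceptSearchServer(
      "concepts",
      # Called with the selected concept; this sketch just prints it.
      onConceptSelected = function(concept) print(concept),
      placeholderText = "e.g. diabetes"
    )
  }
  shiny::shinyApp(ui, server)
}

# Launch only in an interactive session.
if (interactive()) shiny::runApp(conceptSearchDemo())
```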
Useful to list available datasets in the testCases folder
Description
Lists the available test sets (JSON files) in the testCases folder.
Usage
getTestSets(path = NULL)
Arguments
path |
Optional directory containing JSON test sets. If NULL, the package resolves a default path with testthat integration. |
Value
A list.
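Listing test sets amounts to scanning a directory for JSON files. The sketch below shows one way to do that with base R only; listJsonTestSets() is hypothetical and may differ from getTestSets() in details such as whether it returns file names or full paths.

```r
# Hypothetical sketch: list the JSON test sets in a directory as a named list
# (names are file names without extension, values are full paths).
listJsonTestSets <- function(path) {
  if (!dir.exists(path)) {
    return(list())
  }
  files <- list.files(path, pattern = "\\.json$", full.names = TRUE)
  stats::setNames(as.list(files), tools::file_path_sans_ext(basename(files)))
}
```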
Create a Hecate API client
Description
Create a Hecate API client
Usage
hecateClient(baseUrl = NULL, timeoutMs = NULL, apiKey = NULL)
Arguments
baseUrl |
Base URL of the Hecate API (default from config). |
timeoutMs |
Timeout in milliseconds (default from config). |
apiKey |
Optional API key for authorization (default from config). |
Value
A client object with class hecate_client.
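Each argument falls back to a configured default when left as NULL, and the result is tagged with the hecate_client class. The sketch below illustrates that pattern with a plain S3 list; makeHecateClient() is hypothetical, the `config` list stands in for the package's real configuration, and "https://hecate.example" is a placeholder URL, not the actual Hecate endpoint.

```r
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical sketch of a config-backed client constructor. The `config`
# keys and values are placeholders for the package's own configuration.
makeHecateClient <- function(baseUrl = NULL, timeoutMs = NULL, apiKey = NULL,
                             config = list(baseUrl = "https://hecate.example",
                                           timeoutMs = 30000, apiKey = NULL)) {
  client <- list(
    baseUrl = baseUrl %||% config$baseUrl,
    timeoutMs = timeoutMs %||% config$timeoutMs,
    apiKey = apiKey %||% config$apiKey
  )
  stopifnot(is.character(client$baseUrl), length(client$baseUrl) == 1)
  structure(client, class = "hecate_client")
}
```

Note that httr2::req_timeout() takes seconds, so a millisecond setting like timeoutMs would need dividing by 1000 before being passed to it.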
Search Hecate concepts and return results as a data frame
Description
Search Hecate concepts and return results as a data frame
Usage
hecateSearch(
query,
vocabularyId = NULL,
standardConcept = NULL,
domainId = NULL,
conceptClassId = NULL,
limit = 20,
client = hecateClient()
)
Arguments
query |
Character(1); search query (required). |
vocabularyId |
Character(1) or NULL; optional vocabulary filter (comma-separated). |
standardConcept |
Character(1) or NULL; e.g. |
domainId |
Character(1) or NULL; optional domain filter (comma-separated). |
conceptClassId |
Character(1) or NULL; optional concept class filter. |
limit |
Integer(1); max results (default 20, max 150). |
client |
Hecate client (default hecateClient()). |
Value
Data frame of search results, or NULL if an error occurred (API error or bad response shape).
Examples
## Not run:
# Simple search
df <- hecateSearch("diabetes")
# Search with filters
df <- hecateSearch("hypertension", domainId = "Condition", limit = 10)
## End(Not run)
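Before a request is sent, the documented arguments need checking: the query is required, unused filters are NULL, and limit may not exceed 150. A minimal sketch of that step, where buildHecateParams() is hypothetical and the R argument names are reused as placeholder parameter names, since the names the Hecate API itself expects are not documented here.

```r
# Hypothetical helper: validate hecateSearch()-style arguments and collect the
# non-NULL ones into a named list of query parameters.
buildHecateParams <- function(query, vocabularyId = NULL, standardConcept = NULL,
                              domainId = NULL, conceptClassId = NULL, limit = 20) {
  if (!is.character(query) || length(query) != 1 || !nzchar(query)) {
    stop("`query` must be a single non-empty string.")
  }
  if (!is.numeric(limit) || length(limit) != 1 || limit < 1 || limit > 150) {
    stop("`limit` must be a single number between 1 and 150.")
  }
  params <- list(
    query = query,
    vocabularyId = vocabularyId,
    standardConcept = standardConcept,
    domainId = domainId,
    conceptClassId = conceptClassId,
    limit = as.integer(limit)
  )
  # Drop the filters that were left as NULL.
  Filter(Negate(is.null), params)
}

buildHecateParams("hypertension", domainId = "Condition", limit = 10)
```

This sketch rejects an out-of-range limit rather than silently clamping it, so a caller learns immediately that fewer results than requested would come back.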
Null coalescing operator
Description
Null coalescing operator
Usage
x %||% y
Arguments
x |
First value (any type). |
y |
Fallback value when x is NULL. |
Value
x if not NULL, otherwise y.
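An operator with these semantics is a one-line function. Base R has provided the same `%||%` since R 4.4.0, but the package supports R ≥ 4.1.0, where a definition like the one below is needed. This is an illustration, not necessarily the package's own source.

```r
# Return `x` unless it is NULL, in which case return `y`.
`%||%` <- function(x, y) if (is.null(x)) y else x

NULL %||% "default"    # "default"
"value" %||% "default" # "value"
NA %||% "default"      # NA: only NULL triggers the fallback
```

Because R evaluates arguments lazily, `y` is only evaluated when `x` is NULL, so an expensive or failing fallback is never run unnecessarily.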
patientChat generates synthetic patients in the OMOP-CDM using an LLM API.
Description
Requires an OPEN_AI_KEY entry in ~/.Renviron. Once it is set, send a prompt and call save() to store the results. The saved JSON file can be used as an OMOP-CDM patient test set.
Details
Accepts a prompt as input and produces a test set that follows a structured JSON schema. Uses tools such as CodelistGenerator or Hecate to look up concept IDs. Accepts follow-up prompts that modify the existing test set, which the LLM uses as context.
This class supports testing patient sets created by the LLM, prompt engineering, integrating search tools, and creating a set of patients to test analytical packages.
Value
A JSON response that includes the natural-language answer from the LLM and a JSON test set of patients conforming to the provided schema.
Public fields
chat: An ellmer chat instance
json_schema_path: JSON schema used to structure the output
response: Output from the LLM
codelist: A codelist with details used to search for concept IDs
Methods
Public methods
Method new()
Create a new chat to create JSON test sets for OMOP-CDM.
Usage
patientChat$new(
system_prompt = NULL,
model = "gpt-5.4",
jsonSchemaPath = NULL,
echo = c("none", "output", "all"),
codelist_data = NULL
)
Arguments
system_prompt: Initial system prompt that sets the LLM's behaviour
model: Model ID, such as "gpt-5.3". For a complete list, call patientChat$availableModels()
jsonSchemaPath: The JSON schema used to structure the LLM output
echo: How the output is displayed in the console
codelist_data: A codelist with details used to search for concept IDs
Returns
A new patientChat object.
Method prompt()
Sends a prompt to the LLM API to request data.
Usage
patientChat$prompt(prompt)
Arguments
prompt: A query, as a character string.
Method json_response()
Returns the output in JSON format.
Usage
patientChat$json_response()
Method output()
Returns the chat object
Usage
patientChat$output()
Method retrieveCodelist()
Retrieves and filters data from codelist_data
Usage
patientChat$retrieveCodelist(concept_label = "Stage 1", domain = "Measurement")
Arguments
concept_label: Filters the concept_name in the codelist with details.
domain: Filters the domain in the codelist with details.
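The two arguments act as row filters on codelist_data. A minimal sketch of that filtering with base R, assuming a case-insensitive substring match on concept_name and an exact match on a column assumed to be called domain_id; filterCodelist() is hypothetical, and the concept IDs in the toy data are made up.

```r
# Hypothetical sketch: keep codelist rows whose concept_name contains
# `concept_label` (case-insensitive) and whose domain equals `domain`.
# The column names `concept_name` and `domain_id` are assumptions.
filterCodelist <- function(codelist, concept_label = "Stage 1", domain = "Measurement") {
  nameMatch <- grepl(tolower(concept_label), tolower(codelist$concept_name), fixed = TRUE)
  domainMatch <- codelist$domain_id %in% domain
  codelist[nameMatch & domainMatch, , drop = FALSE]
}

# Toy codelist with made-up concept IDs.
codelist <- data.frame(
  concept_id = c(1L, 2L, 3L),
  concept_name = c("Stage 1 hypertension", "Stage 2 hypertension", "Tumor stage 1"),
  domain_id = c("Measurement", "Measurement", "Condition")
)
filterCodelist(codelist)
```

Using %in% for the domain test means rows with a missing domain are dropped instead of producing NA rows in the result.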
Method save()
Saves the JSON test set to disk.
Usage
patientChat$save(name = "patient-chat-test", path = NULL)
Arguments
name: Name of the file
path: Directory in which to save the file. If NULL, the package first tries testthat::test_path("testCases"), then checks options(PatientGenerator.testSetDir = "..."), and finally falls back to the package user data directory.
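The fallback order for path = NULL can be sketched as a small resolver. This is a hypothetical illustration: resolveTestSetDir() is not a package function, "tries" is interpreted as "use the directory if it exists", and tools::R_user_dir() is assumed to be what "the package user data directory" refers to.

```r
# Hypothetical sketch of the documented fallback order for save(path = NULL).
resolveTestSetDir <- function() {
  # 1. testthat::test_path("testCases"), used only if the directory exists.
  #    tryCatch guards against test_path() failing outside a package's tests.
  if (requireNamespace("testthat", quietly = TRUE)) {
    candidate <- tryCatch(testthat::test_path("testCases"), error = function(e) NULL)
    if (!is.null(candidate) && dir.exists(candidate)) {
      return(candidate)
    }
  }
  # 2. A directory set via options(PatientGenerator.testSetDir = ...).
  optionDir <- getOption("PatientGenerator.testSetDir")
  if (!is.null(optionDir)) {
    return(optionDir)
  }
  # 3. The per-user data directory for the package.
  tools::R_user_dir("PatientGenerator", which = "data")
}
```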
Method availableModels()
Retrieves available models from the LLM API.
Usage
patientChat$availableModels()
Method clone()
The objects of this class are cloneable with this method.
Usage
patientChat$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
Examples
## Not run:
generator <- patientChat$new()
generator$prompt("Give me 5 patients")
generator$save("my_test")
## End(Not run)
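Because follow-up prompts use the current test set as context, a session typically alternates prompt() calls before saving. A sketch of such a session, assuming PatientGenerator is installed and OPEN_AI_KEY is set; runPatientChatDemo() and the prompt texts are illustrative only, and only methods documented above are called.

```r
# Hypothetical multi-turn session using the documented patientChat methods.
runPatientChatDemo <- function(path = tempdir()) {
  generator <- PatientGenerator::patientChat$new(echo = "output")
  generator$prompt("Give me 5 patients with type 2 diabetes")
  # The follow-up prompt refines the test set generated above.
  generator$prompt("Add an HbA1c measurement to each patient")
  testSet <- generator$json_response()
  generator$save("diabetes_test", path = path)
  invisible(testSet)
}

# Run only interactively and when a key is available.
if (interactive() && nzchar(Sys.getenv("OPEN_AI_KEY"))) runPatientChatDemo()
```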
patientChatNaive() is a wrapper around the ellmer package that sends a prompt, or a previously generated test set, to an LLM and returns the resulting test set. Requires a valid API key.
Description
Priorities:
- Accepts a prompt as input.
- Produces a test set conforming to the provided JSON schema.
- Uses tools such as CodelistGenerator or Hecate to look up concept IDs.
- Accepts a follow-up prompt with a test set that the LLM uses as context.
Having one function for these tasks allows us to:
- Test the test sets created by the LLM.
- Test prompt engineering.
- Test the integration of tool functionality.
- Quickly create a small set of patients to test analytical packages.
Usage
patientChatNaive(
prompt = "### Give me a sample of five patients",
model = "gpt-5.2",
jsonSchemaPath = NULL
)
Arguments
prompt |
A prompt to the LLM, either as a character string or as a previous JSON response. |
model |
The model used by the LLM. Currently only OpenAI models are accepted. |
jsonSchemaPath |
Path to a JSON schema used to structure the response. |
Value
A JSON response that includes the natural-language answer from the LLM and a JSON test set of patients conforming to the provided schema.
patientDesigner() is a visual interface based on D3 to construct test datasets for the OMOP-CDM
Description
patientDesigner() is a visual interface based on D3 to construct test datasets for the OMOP-CDM
Usage
patientDesigner(path = NULL)
Arguments
path |
Optional folder containing JSON test sets. If NULL, the default path is resolved with testthat integration. |
Value
A Shiny app