PatientGenerator facilitates the creation of synthetic
test datasets for the OMOP Common Data Model (CDM) using two
complementary approaches:
- patientChat: Generates structured patient JSON files using Large Language Models (LLMs).
- patientDesigner: Provides a D3-based Shiny interface for reviewing and editing CDM test sets.

The package also includes support for Hecate-powered concept lookups to ensure valid OMOP concept codes.
# install.packages("remotes")
remotes::install_github("mi-erasmusmc/PatientGenerator")

- Generate structured patient JSON files with patientChat.
- Review and edit CDM test sets with patientDesigner().
- Search for OMOP concepts (hecateSearch) during table editing.

patientChat

Set an OPENAI_API_KEY environment variable (e.g., via
usethis::edit_r_environ()) to enable LLM access.
Available models can be listed using
PatientGenerator::availableModels().
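
Before constructing a patientChat object, it can help to confirm that the key is visible to the current R session. The following is a minimal base-R sketch; `has_openai_key()` is a hypothetical helper introduced here for illustration and is not part of PatientGenerator.

```r
# Hypothetical helper (not part of PatientGenerator): returns TRUE when the
# OPENAI_API_KEY environment variable is set to a non-empty value.
has_openai_key <- function() {
  nzchar(Sys.getenv("OPENAI_API_KEY", unset = ""))
}

if (!has_openai_key()) {
  message("OPENAI_API_KEY is not set; add it to .Renviron and restart R.")
}
```

Changes made through usethis::edit_r_environ() only take effect after R is restarted, so a check like this is most useful at the top of a script.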
library(PatientGenerator)
patientGenerator <- patientChat$new(
  model = "gpt-5.4",
  echo = "none"
)

Provide detailed prompts, including specific concept sets, for optimal results.
patientGenerator$prompt(
"Population (person table):
- 10 adult patients
- 5 female
- 5 male
Observation Period:
- Start date between date of birth and 2025-12-31
Condition Occurrence:
- All patients must have Diabetes (condition_concept_id: 201826)
- Start date between 2015-01-01 and 2020-12-31
Drug Exposure:
- All patients must have Semaglutide (drug_concept_id: 19079450)
- Exposure within 30 days post-index date
Measurement:
- All patients must have Fasting glucose (measurement_concept_id: 3018251)
Procedure Occurrence:
- 50% of patients must have Amputation of toe (procedure_concept_id: 4159766)
Output Requirements:
- Populate only the tables specified in this prompt"
)

testthat

Save the generated dataset as a JSON file, then use
TestGenerator::patientsCDM() to instantiate a CDM reference.
patientGenerator$save(name = "diabetes-patients")
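cdm <- TestGenerator::patientsCDM(
  testName = "diabetes-patients",
  cdmVersion = "5.4"
)
# collect() comes from dplyr; it is namespaced here so the snippet runs
# without attaching dplyr first.
cdm$person |>
  dplyr::collect() |>
  print()
#> cdm$person |> collect() |> head(5)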
#> person_id gender_concept_id year_of_birth person_source_value
#> <int> <int> <int> <char>
#> 1: 1 8532 1965 SYN001
#> 2: 2 8532 1972 SYN002
#> 3: 3 8532 1958 SYN003
#> 4: 4 8532 1981 SYN004
#> 5: 5 8532 1949 SYN005
The LLM can be instructed to modify the current test set within the
same patientChat instance.
patientGenerator$prompt("Remove all male patients")
#> cdm$person |> collect() |> head(5)
#> person_id gender_concept_id year_of_birth person_source_value
#> <int> <int> <int> <char>
#> 1: 1 8532 1965 SYN001
#> 2: 2 8532 1972 SYN002
#> 3: 3 8532 1958 SYN003
#> 4: 4 8532 1981 SYN004
#> 5: 5 8532 1949 SYN005
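
When a generated set feeds automated tests, the person table can be checked against what the prompt asked for. The sketch below uses base R only. `gender_counts()` is a hypothetical helper, and the `person` data frame is a hand-written stand-in for the result of `cdm$person |> dplyr::collect()`, not real package output. It relies on the standard OMOP gender concept IDs 8507 (male) and 8532 (female).

```r
# Hypothetical helper: count rows per standard OMOP gender concept.
gender_counts <- function(person) {
  ids <- c(male = 8507L, female = 8532L)
  vapply(ids, function(id) sum(person$gender_concept_id == id), integer(1))
}

# Stand-in for a collected person table (illustrative values only).
person <- data.frame(
  person_id = 1:4,
  gender_concept_id = c(8532L, 8532L, 8507L, 8507L)
)

counts <- gender_counts(person)
stopifnot(counts[["female"]] == 2L, counts[["male"]] == 2L)

# After a prompt such as "Remove all male patients", no 8507 rows should remain.
females_only <- person[person$gender_concept_id == 8532L, ]
stopifnot(gender_counts(females_only)[["male"]] == 0L)
```

The same assertions could be wrapped in a test framework; base R `stopifnot()` is used here only to keep the sketch free of dependencies.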
patientDesigner()

Launch the interactive editor to review and refine datasets:

PatientGenerator::patientDesigner()

The interface supports:
patientDesigner integrates a concept search module
powered by hecateSearch(). This allows users to search for
and insert valid OMOP concept IDs directly into the CDM tables.
Configure Hecate globally via environment variables:
Sys.setenv(
  HECATE_BASE_URL = "https://your-hecate-server/api",
  HECATE_API_KEY = "your-api-key"
)

Or via package options:

options(PatientGenerator.hecate = list(
  base_url = "https://your-hecate-server/api",
  timeout_ms = 15000,
  api_key = "your-api-key"
))

vignette("shiny-integration", package = "PatientGenerator")
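
To make the two Hecate configuration routes above concrete, the following base-R sketch shows one way a caller might resolve the settings. `resolve_hecate_config()` is a hypothetical helper, and the precedence it applies (package option first, then environment variable) is an assumption made for illustration; the package's actual lookup order may differ.

```r
# Hypothetical helper (not part of PatientGenerator). ASSUMPTION: values in
# the PatientGenerator.hecate option take precedence over environment variables.
resolve_hecate_config <- function() {
  opt <- getOption("PatientGenerator.hecate", default = list())
  pick <- function(field, env_var) {
    value <- opt[[field]]
    if (is.null(value) || !nzchar(value)) {
      Sys.getenv(env_var, unset = "")  # fall back to the environment
    } else {
      value
    }
  }
  list(
    base_url = pick("base_url", "HECATE_BASE_URL"),
    api_key  = pick("api_key", "HECATE_API_KEY")
  )
}

cfg <- resolve_hecate_config()
if (!nzchar(cfg$base_url)) {
  message("No Hecate base URL configured.")
}
```

Inspecting the resolved values this way can help when a concept search returns nothing and it is unclear which configuration source is in effect.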