Database Management CLI
The OMOP CDM instantiation tool provides a streamlined way to bootstrap a local OHDSI Common Data Model (CDM) database using Athena vocabulary files and synthetic test data.
omop-cdm
Bootstrap the OMOP CDM and load reference data from Athena into a local database.
If you want PostgreSQL full-text sidecars for concept and concept_synonym, pass
--fulltext. The command will install and populate the sidecars after the vocabulary
load finishes.
Warning
This command will wipe the existing database in the target container before loading new data.
Prerequisites
Before running the command, ensure your environment is configured with a .env file or exported variables:
- OMOP_DATABASE_URL: SQLAlchemy connection string (e.g., postgresql://user:pass@localhost:5432/omop).
- SOURCE_PATH: Local directory path containing the Athena CSV files (e.g., CONCEPT.csv, VOCABULARY.csv).
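A pre-flight check along the following lines can catch a missing variable or file before the database is wiped. This is a minimal sketch, not part of omop-graph: the function name check_environment is hypothetical, the script only reads exported variables (it does not parse a .env file), and it only looks for the two example files named above.

```python
import os
from pathlib import Path

# Assumption: only the two example files named in the prerequisites are checked;
# a real Athena download contains more CSV files than these.
EXPECTED_FILES = ("CONCEPT.csv", "VOCABULARY.csv")


def check_environment(env=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []

    if not env.get("OMOP_DATABASE_URL"):
        problems.append("OMOP_DATABASE_URL is not set")

    source_path = env.get("SOURCE_PATH")
    if not source_path:
        problems.append("SOURCE_PATH is not set")
    else:
        source_dir = Path(source_path)
        if not source_dir.is_dir():
            problems.append(f"SOURCE_PATH is not a directory: {source_dir}")
        else:
            for name in EXPECTED_FILES:
                if not (source_dir / name).is_file():
                    problems.append(f"missing {name} in {source_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(f"ERROR: {problem}")
```

Running the script with both variables exported and the files in place prints nothing; otherwise each problem is listed on its own line.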
Usage
If installed as a package:
omop-graph omop-cdm [--add-test-data] [--fulltext] [--fulltext-regconfig=<regconfig>] [--chunk-size=<chunk_size>]
Example Usage:
# Instantiate with test data and a custom chunk size of 10,000
omop-graph omop-cdm --add-test-data --chunk-size=10000
# Display the help
omop-graph omop-cdm --help
Command Arguments
| Argument | Type | Default | Description |
|---|---|---|---|
| --add-test-data | Boolean | False | Whether to add synthetic test data after loading Athena data. |
| --chunk-size, -c | Integer | 5000 | Number of rows to process in each chunk. Adjust based on your system's memory capacity to avoid OOM errors. |
| --fulltext | Boolean | False | Install and populate PostgreSQL full-text sidecars for concept and concept_synonym after the vocabulary load. |
| --fulltext-regconfig | String | english | PostgreSQL text search configuration used when populating the full-text sidecars. |
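To illustrate why the chunk size affects memory use, the sketch below reads rows in fixed-size batches so that at most one batch is held in memory at a time. It is not the tool's actual loader: the helper iter_chunks is hypothetical, and the tab-delimited sample is a small in-memory stand-in for a vocabulary file.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size=5000):
    """Yield lists of at most chunk_size rows from any iterable."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Tiny in-memory stand-in for a tab-delimited vocabulary file (illustrative data).
sample = io.StringIO("concept_id\tconcept_name\n1\tA\n2\tB\n3\tC\n")
reader = csv.DictReader(sample, delimiter="\t")

# Three data rows with a chunk size of 2 give one full chunk and one partial chunk.
sizes = [len(chunk) for chunk in iter_chunks(reader, chunk_size=2)]
print(sizes)  # [2, 1]
```

A smaller chunk size lowers peak memory at the cost of more round trips; a larger one does the opposite.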
relationship-classification
This command ingests pre-defined relationship classifications and mappings into the database. It groups standard OMOP relationships into semantic categories (e.g., Hierarchical, Lateral, Mapping) so that graph reasoning can treat each kind of relationship differently.
Rationale
The standard OMOP relationship table provides basic metadata, but lacks unified semantic "kinds" out of the box. This tool maps those relationships to a specific ClassIDEnum (like EQUIVALENT, HIERARCHICAL, or IDENTITY) and provides detailed inference descriptions used by the KnowledgeGraph facade.
Prerequisites
Before running the command, ensure the following are in place:
1. A populated OMOP CDM database (e.g., created with the omop-cdm command).
2. predicate_classification.csv in the target directory: Defines the semantic classes and subclasses (descriptions, semantics, and inference rules).
3. predicate_mapping.csv in the same directory: Maps specific OMOP relationship_ids to the classes defined in the classification file.
4. The following environment variables, set in a .env file or exported:
- OMOP_DATABASE_URL: SQLAlchemy connection string (e.g., postgresql://user:pass@localhost:5432/omop).
- SOURCE_PATH: Local directory path containing the Athena CSV files (e.g., CONCEPT.csv, VOCABULARY.csv). This is required as the new connections/tables are stored there after creation.
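The two CSV prerequisites above can be verified with a short check before invoking the command. This is a sketch under stated assumptions: the helper missing_classification_files is hypothetical and only tests that the files exist, not that their contents are valid.

```python
from pathlib import Path

# File names taken from the prerequisites list above.
REQUIRED_FILES = ("predicate_classification.csv", "predicate_mapping.csv")


def missing_classification_files(pred_class_dir="./docs"):
    """Return the names of required CSV files that are absent from pred_class_dir."""
    directory = Path(pred_class_dir)
    return [name for name in REQUIRED_FILES if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_classification_files()
    if missing:
        print("Missing files:", ", ".join(missing))
```

An empty list means both files were found in the given directory.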
Usage
omop-graph relationship-classification [--pred-class-dir <PATH_TO_CSV_DIR>] [-v]
Command Options
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| --pred-class-dir | | String | ./docs | Path to the directory containing the classification CSVs. |
| --verbose | -v | Count | 0 | Increase logging verbosity (use -v or -vv). |
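A counted verbosity flag is commonly translated into a logging level as sketched below. The mapping (0 to WARNING, 1 to INFO, 2 or more to DEBUG) is an assumption based on common convention, not confirmed behaviour of omop-graph, and argparse is used here purely for illustration.

```python
import argparse
import logging

# Assumed mapping; the actual levels used by omop-graph may differ.
LEVELS = [logging.WARNING, logging.INFO, logging.DEBUG]


def level_for(verbosity):
    """Clamp the -v count into the range of known levels."""
    return LEVELS[min(max(verbosity, 0), len(LEVELS) - 1)]


parser = argparse.ArgumentParser()
# action="count" increments the value once for every -v on the command line.
parser.add_argument("-v", "--verbose", action="count", default=0)

args = parser.parse_args(["-vv"])
print(logging.getLevelName(level_for(args.verbose)))  # DEBUG
```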