Maintenance CLI¶
The omop-alchemy maintenance CLI handles everything you need to operate an OMOP CDM
database: creating tables, loading Athena vocabularies, managing indexes and foreign key
enforcement, running health checks, and taking backups. It talks directly to a SQLAlchemy
engine, so all connection details are controlled by the same engine URL configuration you
use for the ORM.
Alpha status Treat this CLI as alpha operational tooling. Interfaces and behavior may still change.
Connection setup¶
Database connection and CDM schema come from oa_configurator — no per-command flags needed. Run omop-config configure omop_alchemy once to create ~/.config/omop/config.toml, then every omop-alchemy command picks it up automatically.
Backend support¶
Some commands depend on PostgreSQL-specific features and will return a clear error if you run them against SQLite.
| Command group | Requires PostgreSQL | Why |
|---|---|---|
load-vocab-source |
No (PostgreSQL + SQLite) | Uses ORM CSV loader; --bulk-mode and --db-schema are PostgreSQL-only |
indexes |
No (cluster apply is PostgreSQL-only) | Index DDL is standard SQL; CLUSTER is PostgreSQL |
create-missing-tables, reconcile-schema, data-summary, info, doctor |
No | Pure SQLAlchemy metadata operations |
reset-sequences |
Yes | PostgreSQL sequences (SETVAL) |
truncate-tables |
Yes | PostgreSQL TRUNCATE with RESTART IDENTITY and CASCADE |
foreign-keys |
Yes | PostgreSQL internal RI trigger ALTER TABLE ... DISABLE/ENABLE TRIGGER ALL |
analyze-tables |
No (--vacuum is PostgreSQL-only) |
ANALYZE is standard; VACUUM ANALYZE is PostgreSQL |
fulltext |
Yes | PostgreSQL tsvector, tsquery, and GIN indexes |
backup-database, restore-database |
Yes | pg_dump / pg_restore / psql |
Workflow guides¶
Fresh database setup¶
Use this when you are starting with an empty database and want to get an OMOP schema populated from scratch.
# 1. Create any OMOP tables that don't exist yet (safe to run on an existing DB)
omop-alchemy create-missing-tables --dry-run # preview first
omop-alchemy create-missing-tables
# 2. Load Athena vocabulary files
omop-alchemy load-vocab-source --athena-source ./athena_files
# 3. Reset sequences so new clinical inserts start above vocabulary IDs (PostgreSQL)
omop-alchemy reset-sequences
The create-missing-tables command compares ORM metadata against the live schema and
creates only what is missing. It is idempotent — running it again on a populated database
does nothing.
load-vocab-source automatically creates any missing vocabulary tables before loading,
so you can run it immediately after step 1 or even skip step 1 for vocabulary-only setups.
Full vocabulary reload¶
Run this when you download a new Athena export and want to replace the existing vocabulary.
# Suspend FK enforcement and drop indexes so loading is fast
omop-alchemy foreign-keys disable
omop-alchemy indexes disable --vocab
# Clear existing vocabulary data
omop-alchemy truncate-tables --scope vocabulary --restart-identities --yes
# Load new vocabulary (bulk-mode is on by default: does not re-toggle indexes per table)
omop-alchemy load-vocab-source --athena-source ./athena_files --merge-strategy replace
# Rebuild indexes and re-enable FK enforcement
omop-alchemy indexes enable --vocab
omop-alchemy foreign-keys enable --strict
# Refresh full-text sidecar vectors (if installed)
omop-alchemy fulltext populate
About --bulk-mode (default on PostgreSQL):
load-vocab-source disables FK triggers and drops vocabulary indexes once before the
load loop, then rebuilds them once at the end. This is much faster than the alternative
of toggling per table — for a full Athena export the difference can be 10–20×. SQLite
ignores this flag. Pass --no-bulk-mode if you need per-table rollback safety.
About --merge-strategy replace:
replace truncates each target table and reloads from the CSV. Use upsert for
incremental vocabulary patches where you do not want to lose custom extensions.
Use insert_if_empty as the fastest path when the target tables are guaranteed empty.
About --strict on foreign-keys enable:
--strict validates all FK relationships before re-enabling RI triggers. If violations
are found, no triggers are re-enabled and you get a report of the problematic rows.
Omit --strict to re-enable unconditionally.
ETL bulk load cycle¶
Use this before and after a large clinical data load to avoid the overhead of FK and index maintenance during insertion.
# Before your ETL runs: suspend enforcement and remove indexes
omop-alchemy foreign-keys disable
omop-alchemy indexes disable
# --- your ETL process runs here ---
# After ETL: restore state
omop-alchemy reset-sequences
omop-alchemy indexes enable
omop-alchemy foreign-keys enable --strict
omop-alchemy analyze-tables --scope clinical
analyze-tables refreshes planner statistics after a large load so query plans don't
degrade. --scope clinical targets only clinical tables; omit --scope to analyze
everything.
reset-sequences ensures that any auto-increment columns are positioned above the
maximum key value present in the table. This matters when your ETL inserts explicit IDs
(common in OMOP) — without a reset, the next ORM insert would try to reuse an ID that
already exists.
Health checks¶
Quick read-only check:
omop-alchemy doctor
Runs a fast, non-destructive pass over connection readiness, schema drift, and FK trigger status. The output tells you what is wrong and what to do about it.
Deep FK validation (PostgreSQL):
omop-alchemy doctor --deep
Adds a full FK constraint scan — it actually queries the data to find rows that violate declared FK relationships. On large databases this can be slow; use it when you suspect data integrity issues after an ETL or vocabulary patch.
Full environment introspection:
omop-alchemy info
Shows the active engine URL, installed backend driver, OMOP Alchemy version, optional dependency state (orm-loader, psycopg2/psycopg, etc.), and which maintenance commands are available given the current backend. Run this first when diagnosing "why doesn't this command work".
When doctor reports a problem:
| Doctor output | What it means | Fix |
|---|---|---|
| Schema drift: missing tables | ORM has tables not in DB | create-missing-tables |
| Schema drift: extra tables | DB has tables ORM doesn't know | Review manually; may be custom extensions |
| FK triggers disabled | RI enforcement was suspended | foreign-keys enable or foreign-keys enable --strict |
| FK violations found | Data fails FK constraints | Investigate data, then foreign-keys enable --strict |
Schema drift¶
The reconcile-schema command compares your ORM metadata against the live database and
reports what it finds:
omop-alchemy reconcile-schema
omop-alchemy reconcile-schema --dry-run # same output, no changes
Output categories:
- missing — table is in ORM metadata but not in the database. Fix:
create-missing-tables. - extra — table is in the database but not in ORM metadata. This can mean custom tables, staging tables, or leftover artifacts. The CLI does not touch these.
- matched — table exists in both and metadata is consistent.
- drifted — table exists in both but column definitions differ (types, nullability, defaults). The CLI does not auto-migrate; you need to handle schema migrations manually.
For safe deployment: run reconcile-schema first, then create-missing-tables --dry-run,
then create-missing-tables.
Full-text search sidecars¶
Full-text search support adds tsvector sidecar columns (and GIN indexes) to the
concept and concept_synonym tables, enabling fast text search over vocabulary.
# Install the sidecar columns and indexes (once, after vocabulary tables exist)
omop-alchemy fulltext install
# Populate sidecar vectors from current vocabulary data
omop-alchemy fulltext populate
You must rerun fulltext populate after every vocabulary reload. Sidecar vectors
do not auto-refresh when the underlying concept data changes.
To remove the sidecars:
omop-alchemy fulltext drop
The --regconfig option controls the PostgreSQL text search configuration
(default english). For multilingual vocabularies, use a suitable config such as
simple.
For query-side usage and optional ORM metadata registration, see PostgreSQL Full-Text Search.
Backup and restore¶
These commands wrap pg_dump and pg_restore / psql. PostgreSQL client tools must
be installed and on PATH.
# Create a backup (custom format is recommended — smaller and restorable in parallel)
omop-alchemy backup-database \
--output-path ./cdm-backup.dump \
--format custom
# Restore into a target database (the DB must already exist and be empty)
omop-alchemy restore-database ./cdm-backup.dump \
--format custom
Format comparison:
| Format | File extension | Restore tool | Advantages |
|---|---|---|---|
custom (default) |
.dump |
pg_restore |
Compressed; supports parallel restore (-j) and selective restore |
plain |
.sql |
psql |
Human-readable SQL; editable but much larger |
Restore caveats:
- The target database must already exist. The CLI does not create or drop databases.
- For plain format, the schema is embedded in the SQL dump; no selective schema restore is possible.
- For custom format, pg_restore can be invoked manually with -n <schema> for selective schema restore.
Use --dry-run on backup-database to see the pg_dump command that would be run
without executing it.
Recovery: when things go wrong¶
Bulk load or vocabulary reload fails mid-way¶
If load-vocab-source (with --bulk-mode) or your ETL process fails after FK triggers
and indexes have been disabled, they stay disabled. The database continues to accept
writes but does not enforce FK constraints, and queries may use slow sequential scans.
To recover:
omop-alchemy indexes enable --vocab # or without --vocab if you disabled all indexes
omop-alchemy foreign-keys enable
If you used --strict originally and now have data violations:
omop-alchemy foreign-keys validate # see what's broken
# fix the data
omop-alchemy foreign-keys enable --strict
FK validation fails after enable --strict¶
omop-alchemy foreign-keys validate
This reports exactly which tables have violations, which constraints are affected, and
how many rows fail. Fix the data, then retry foreign-keys enable --strict.
If you need to re-enable FK triggers despite the violations (for example, to allow the
application to run while you investigate), use foreign-keys enable without --strict.
Sequences are out of sync after a bulk insert¶
After any load that inserts explicit primary key values:
omop-alchemy reset-sequences # all managed tables
omop-alchemy reset-sequences --vocab # vocabulary tables only
reset-sequences sets each owned sequence to MAX(pk) + 1. It reports every table
it touches and the old/new sequence positions.
Command reference¶
| Command | Purpose | Key options | Backend |
|---|---|---|---|
info |
Inspect CLI readiness, backend, and dependency state | --vocab |
All |
doctor |
Read-only health check: connection, schema, FK state | --deep, --vocab |
All (--deep PostgreSQL-focused) |
data-summary |
Show managed tables and row counts | --vocab, --include-missing |
All |
reconcile-schema |
Compare ORM metadata vs live schema | --vocab, --dry-run |
All |
create-missing-tables |
Create OMOP tables absent from DB | --dry-run, --no-vocab |
All |
load-vocab-source |
Load Athena vocabulary CSVs | --athena-source, --merge-strategy, --bulk-mode/--no-bulk-mode, --dry-run |
PostgreSQL, SQLite |
truncate-tables |
Truncate selected tables | --scope, --table, --yes, --cascade, --restart-identities |
PostgreSQL |
reset-sequences |
Reset owned PK sequences to MAX(pk) + 1 |
--dry-run, --vocab |
PostgreSQL |
foreign-keys disable |
Suspend FK RI trigger enforcement | --vocab, --dry-run |
PostgreSQL |
foreign-keys enable |
Re-enable FK RI trigger enforcement | --strict, --vocab, --dry-run |
PostgreSQL |
foreign-keys validate |
Report FK constraint violations | --vocab |
PostgreSQL |
foreign-keys status |
Show current trigger enable/disable state | --vocab |
PostgreSQL |
analyze-tables |
Refresh planner statistics | --scope, --table, --vacuum |
PostgreSQL, SQLite (--vacuum PostgreSQL-only) |
indexes disable |
Drop ORM-defined secondary indexes | --vocab, --dry-run |
All |
indexes enable |
Recreate ORM-defined secondary indexes | --vocab, --dry-run |
All (cluster on PostgreSQL) |
fulltext install |
Add tsvector sidecar columns to vocabulary tables | --regconfig, --no-create-indexes |
PostgreSQL |
fulltext populate |
Populate sidecar tsvector vectors | --regconfig |
PostgreSQL |
fulltext drop |
Remove tsvector sidecar columns and indexes | PostgreSQL | |
backup-database |
Create a pg_dump backup artifact |
--output-path, --format, --dry-run |
PostgreSQL |
restore-database |
Restore a backup artifact into the target DB | --format (required), --dry-run |
PostgreSQL |