Skip to content

Maintenance CLI

The omop-alchemy maintenance CLI handles everything you need to operate an OMOP CDM database: creating tables, loading Athena vocabularies, managing indexes and foreign key enforcement, running health checks, and taking backups. It talks directly to a SQLAlchemy engine, so all connection details are controlled by the same engine URL configuration you use for the ORM.

Alpha status Treat this CLI as alpha operational tooling. Interfaces and behavior may still change.


Connection setup

Database connection and CDM schema come from oa_configurator — no per-command flags needed. Run omop-config configure omop_alchemy once to create ~/.config/omop/config.toml, then every omop-alchemy command picks it up automatically.


Backend support

Some commands depend on PostgreSQL-specific features and will return a clear error if you run them against SQLite.

Command group Requires PostgreSQL Why
load-vocab-source No (PostgreSQL + SQLite) Uses ORM CSV loader; --bulk-mode and --db-schema are PostgreSQL-only
indexes No (cluster apply is PostgreSQL-only) Index DDL is standard SQL; CLUSTER is PostgreSQL
create-missing-tables, reconcile-schema, data-summary, info, doctor No Pure SQLAlchemy metadata operations
reset-sequences Yes PostgreSQL sequences (SETVAL)
truncate-tables Yes PostgreSQL TRUNCATE with RESTART IDENTITY and CASCADE
foreign-keys Yes PostgreSQL internal RI trigger ALTER TABLE ... DISABLE/ENABLE TRIGGER ALL
analyze-tables No (--vacuum is PostgreSQL-only) ANALYZE is standard; VACUUM ANALYZE is PostgreSQL
fulltext Yes PostgreSQL tsvector, tsquery, and GIN indexes
backup-database, restore-database Yes pg_dump / pg_restore / psql

Workflow guides

Fresh database setup

Use this when you are starting with an empty database and want to get an OMOP schema populated from scratch.

# 1. Create any OMOP tables that don't exist yet (safe to run on an existing DB)
omop-alchemy create-missing-tables --dry-run  # preview first
omop-alchemy create-missing-tables

# 2. Load Athena vocabulary files
omop-alchemy load-vocab-source --athena-source ./athena_files

# 3. Reset sequences so new clinical inserts start above vocabulary IDs (PostgreSQL)
omop-alchemy reset-sequences

The create-missing-tables command compares ORM metadata against the live schema and creates only what is missing. It is idempotent — running it again on a populated database does nothing.

load-vocab-source automatically creates any missing vocabulary tables before loading, so you can run it immediately after step 1 or even skip step 1 for vocabulary-only setups.


Full vocabulary reload

Run this when you download a new Athena export and want to replace the existing vocabulary.

# Suspend FK enforcement and drop indexes so loading is fast
omop-alchemy foreign-keys disable
omop-alchemy indexes disable --vocab

# Clear existing vocabulary data
omop-alchemy truncate-tables --scope vocabulary --restart-identities --yes

# Load new vocabulary (bulk-mode is on by default: does not re-toggle indexes per table)
omop-alchemy load-vocab-source --athena-source ./athena_files --merge-strategy replace

# Rebuild indexes and re-enable FK enforcement
omop-alchemy indexes enable --vocab
omop-alchemy foreign-keys enable --strict

# Refresh full-text sidecar vectors (if installed)
omop-alchemy fulltext populate

About --bulk-mode (default on PostgreSQL): load-vocab-source disables FK triggers and drops vocabulary indexes once before the load loop, then rebuilds them once at the end. This is much faster than the alternative of toggling per table — for a full Athena export the difference can be 10–20×. SQLite ignores this flag. Pass --no-bulk-mode if you need per-table rollback safety.

About --merge-strategy replace: replace truncates each target table and reloads from the CSV. Use upsert for incremental vocabulary patches where you do not want to lose custom extensions. Use insert_if_empty as the fastest path when the target tables are guaranteed empty.

About --strict on foreign-keys enable: --strict validates all FK relationships before re-enabling RI triggers. If violations are found, no triggers are re-enabled and you get a report of the problematic rows. Omit --strict to re-enable unconditionally.


ETL bulk load cycle

Use this before and after a large clinical data load to avoid the overhead of FK and index maintenance during insertion.

# Before your ETL runs: suspend enforcement and remove indexes
omop-alchemy foreign-keys disable
omop-alchemy indexes disable

# --- your ETL process runs here ---

# After ETL: restore state
omop-alchemy reset-sequences
omop-alchemy indexes enable
omop-alchemy foreign-keys enable --strict
omop-alchemy analyze-tables --scope clinical

analyze-tables refreshes planner statistics after a large load so query plans don't degrade. --scope clinical targets only clinical tables; omit --scope to analyze everything.

reset-sequences ensures that any auto-increment columns are positioned above the maximum key value present in the table. This matters when your ETL inserts explicit IDs (common in OMOP) — without a reset, the next ORM insert would try to reuse an ID that already exists.


Health checks

Quick read-only check:

omop-alchemy doctor

Runs a fast, non-destructive pass over connection readiness, schema drift, and FK trigger status. The output tells you what is wrong and what to do about it.

Deep FK validation (PostgreSQL):

omop-alchemy doctor --deep

Adds a full FK constraint scan — it actually queries the data to find rows that violate declared FK relationships. On large databases this can be slow; use it when you suspect data integrity issues after an ETL or vocabulary patch.

Full environment introspection:

omop-alchemy info

Shows the active engine URL, installed backend driver, OMOP Alchemy version, optional dependency state (orm-loader, psycopg2/psycopg, etc.), and which maintenance commands are available given the current backend. Run this first when diagnosing "why doesn't this command work".

When doctor reports a problem:

Doctor output What it means Fix
Schema drift: missing tables ORM has tables not in DB create-missing-tables
Schema drift: extra tables DB has tables ORM doesn't know Review manually; may be custom extensions
FK triggers disabled RI enforcement was suspended foreign-keys enable or foreign-keys enable --strict
FK violations found Data fails FK constraints Investigate data, then foreign-keys enable --strict

Schema drift

The reconcile-schema command compares your ORM metadata against the live database and reports what it finds:

omop-alchemy reconcile-schema
omop-alchemy reconcile-schema --dry-run  # same output, no changes

Output categories:

  • missing — table is in ORM metadata but not in the database. Fix: create-missing-tables.
  • extra — table is in the database but not in ORM metadata. This can mean custom tables, staging tables, or leftover artifacts. The CLI does not touch these.
  • matched — table exists in both and metadata is consistent.
  • drifted — table exists in both but column definitions differ (types, nullability, defaults). The CLI does not auto-migrate; you need to handle schema migrations manually.

For safe deployment: run reconcile-schema first, then create-missing-tables --dry-run, then create-missing-tables.


Full-text search sidecars

Full-text search support adds tsvector sidecar columns (and GIN indexes) to the concept and concept_synonym tables, enabling fast text search over vocabulary.

# Install the sidecar columns and indexes (once, after vocabulary tables exist)
omop-alchemy fulltext install

# Populate sidecar vectors from current vocabulary data
omop-alchemy fulltext populate

You must rerun fulltext populate after every vocabulary reload. Sidecar vectors do not auto-refresh when the underlying concept data changes.

To remove the sidecars:

omop-alchemy fulltext drop

The --regconfig option controls the PostgreSQL text search configuration (default english). For multilingual vocabularies, use a suitable config such as simple.

For query-side usage and optional ORM metadata registration, see PostgreSQL Full-Text Search.


Backup and restore

These commands wrap pg_dump and pg_restore / psql. PostgreSQL client tools must be installed and on PATH.

# Create a backup (custom format is recommended — smaller and restorable in parallel)
omop-alchemy backup-database \
  --output-path ./cdm-backup.dump \
  --format custom

# Restore into a target database (the DB must already exist and be empty)
omop-alchemy restore-database ./cdm-backup.dump \
  --format custom

Format comparison:

Format File extension Restore tool Advantages
custom (default) .dump pg_restore Compressed; supports parallel restore (-j) and selective restore
plain .sql psql Human-readable SQL; editable but much larger

Restore caveats: - The target database must already exist. The CLI does not create or drop databases. - For plain format, the schema is embedded in the SQL dump; no selective schema restore is possible. - For custom format, pg_restore can be invoked manually with -n <schema> for selective schema restore.

Use --dry-run on backup-database to see the pg_dump command that would be run without executing it.


Recovery: when things go wrong

Bulk load or vocabulary reload fails mid-way

If load-vocab-source (with --bulk-mode) or your ETL process fails after FK triggers and indexes have been disabled, they stay disabled. The database continues to accept writes but does not enforce FK constraints, and queries may use slow sequential scans.

To recover:

omop-alchemy indexes enable --vocab   # or without --vocab if you disabled all indexes
omop-alchemy foreign-keys enable

If you used --strict originally and now have data violations:

omop-alchemy foreign-keys validate   # see what's broken
# fix the data
omop-alchemy foreign-keys enable --strict

FK validation fails after enable --strict

omop-alchemy foreign-keys validate

This reports exactly which tables have violations, which constraints are affected, and how many rows fail. Fix the data, then retry foreign-keys enable --strict.

If you need to re-enable FK triggers despite the violations (for example, to allow the application to run while you investigate), use foreign-keys enable without --strict.

Sequences are out of sync after a bulk insert

After any load that inserts explicit primary key values:

omop-alchemy reset-sequences          # all managed tables
omop-alchemy reset-sequences --vocab  # vocabulary tables only

reset-sequences sets each owned sequence to MAX(pk) + 1. It reports every table it touches and the old/new sequence positions.


Command reference

Command Purpose Key options Backend
info Inspect CLI readiness, backend, and dependency state --vocab All
doctor Read-only health check: connection, schema, FK state --deep, --vocab All (--deep PostgreSQL-focused)
data-summary Show managed tables and row counts --vocab, --include-missing All
reconcile-schema Compare ORM metadata vs live schema --vocab, --dry-run All
create-missing-tables Create OMOP tables absent from DB --dry-run, --no-vocab All
load-vocab-source Load Athena vocabulary CSVs --athena-source, --merge-strategy, --bulk-mode/--no-bulk-mode, --dry-run PostgreSQL, SQLite
truncate-tables Truncate selected tables --scope, --table, --yes, --cascade, --restart-identities PostgreSQL
reset-sequences Reset owned PK sequences to MAX(pk) + 1 --dry-run, --vocab PostgreSQL
foreign-keys disable Suspend FK RI trigger enforcement --vocab, --dry-run PostgreSQL
foreign-keys enable Re-enable FK RI trigger enforcement --strict, --vocab, --dry-run PostgreSQL
foreign-keys validate Report FK constraint violations --vocab PostgreSQL
foreign-keys status Show current trigger enable/disable state --vocab PostgreSQL
analyze-tables Refresh planner statistics --scope, --table, --vacuum PostgreSQL, SQLite (--vacuum PostgreSQL-only)
indexes disable Drop ORM-defined secondary indexes --vocab, --dry-run All
indexes enable Recreate ORM-defined secondary indexes --vocab, --dry-run All (cluster on PostgreSQL)
fulltext install Add tsvector sidecar columns to vocabulary tables --regconfig, --no-create-indexes PostgreSQL
fulltext populate Populate sidecar tsvector vectors --regconfig PostgreSQL
fulltext drop Remove tsvector sidecar columns and indexes PostgreSQL
backup-database Create a pg_dump backup artifact --output-path, --format, --dry-run PostgreSQL
restore-database Restore a backup artifact into the target DB --format (required), --dry-run PostgreSQL