OMOP Embeddings
omop-emb is an optional package to super-charge omop-graph and provide additional graph reasoning tools for information retrieval and RAG-based knowledge extraction.
The package currently supports:
- dynamic embedding model registration
  - model metadata is stored locally in SQLite (metadata.db)
  - multiple embedding models can be tracked per backend and index type
- embedding and lookup for OMOP concepts
- various storage backends
  - pgvector: storage in the original OMOP database
  - FAISS: on-disk vector storage and index files
- extension to omop-alchemy to support new tables
- CLI scripts to add embeddings to an already existing OMOP CDM
Installation
Install the backend you actually want to use:
pip install "omop-emb[pgvector]"
pip install "omop-emb[faiss]"
pip install "omop-emb[all]"
A plain pip install omop-emb installs only the shared core package.
At runtime, backend choice should also be explicit. The intended direction is:
- install-time choice via extras
- runtime choice via config such as OMOP_EMB_BACKEND=pgvector or OMOP_EMB_BACKEND=faiss, or by passing it as an argument to the respective interface (e.g. see the CLI reference)
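The runtime choice described above could be resolved roughly as follows. This is a minimal sketch, not the omop-emb API: the helper `resolve_backend` is hypothetical, and the rule that an explicit argument takes precedence over `OMOP_EMB_BACKEND` is an assumption.

```python
import os
from typing import Mapping, Optional

# Backend names as listed in this document.
SUPPORTED_BACKENDS = ("pgvector", "faiss")


def resolve_backend(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: pick the backend from an argument or the environment.

    Assumption: an explicitly passed value wins over OMOP_EMB_BACKEND.
    """
    env = os.environ if env is None else env
    choice = explicit or env.get("OMOP_EMB_BACKEND", "")
    if choice not in SUPPORTED_BACKENDS:
        # Fail loudly instead of silently falling back to a default backend.
        raise ValueError(
            f"Backend must be one of {SUPPORTED_BACKENDS}, got {choice!r}"
        )
    return choice
```

Raising on a missing or unknown value keeps the backend choice explicit, in line with the intent stated above.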
Environment Variables
| Variable | Description | Details |
|---|---|---|
| OMOP_EMB_BACKEND | Backend to use: pgvector or faiss | Embedding Storage |
| OMOP_EMB_BASE_STORAGE_DIR | Base directory for metadata.db and FAISS artifacts | Installation |
| OMOP_DATABASE_URL | SQLAlchemy URL for the OMOP CDM database | Installation |
| OMOP_EMB_DOCUMENT_EMBEDDING_PREFIX | Task prefix prepended to concept texts at index time | Asymmetric Embeddings |
| OMOP_EMB_QUERY_EMBEDDING_PREFIX | Task prefix prepended to search queries at query time | Asymmetric Embeddings |
The prefix variables are optional and default to "". They are only needed for asymmetric embedding models (e.g. nomic-embed-text, E5, BGE) that require different task prefixes for indexing versus searching.
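To illustrate how the two prefix variables might be applied, here is a minimal sketch. The helpers `load_prefixes` and `with_prefix` are hypothetical, and plain string concatenation with no separator is an assumption about how the prefix is prepended. The example values `search_document: ` and `search_query: ` are the task prefixes used by nomic-embed-text.

```python
import os
from typing import Mapping, Optional, Tuple


def load_prefixes(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (document_prefix, query_prefix); both default to ""."""
    env = os.environ if env is None else env
    return (
        env.get("OMOP_EMB_DOCUMENT_EMBEDDING_PREFIX", ""),
        env.get("OMOP_EMB_QUERY_EMBEDDING_PREFIX", ""),
    )


def with_prefix(prefix: str, text: str) -> str:
    """Prepend the task prefix to a text before it is embedded.

    Assumption: the prefix is concatenated as-is, so any trailing
    space must be part of the configured value.
    """
    return f"{prefix}{text}"


# Example configuration for an asymmetric model.
example_env = {
    "OMOP_EMB_DOCUMENT_EMBEDDING_PREFIX": "search_document: ",
    "OMOP_EMB_QUERY_EMBEDDING_PREFIX": "search_query: ",
}
doc_prefix, query_prefix = load_prefixes(example_env)
indexed_text = with_prefix(doc_prefix, "Type 2 diabetes mellitus")
query_text = with_prefix(query_prefix, "adult onset diabetes")
```

With both variables unset, the prefixes are empty and the texts pass through unchanged, which is the behaviour a symmetric model needs.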
Important caveats
- omop-emb depends on OMOP CDM database access for concept metadata and filtering.
- Current operational and test coverage is PostgreSQL-focused. Extension to other databases is planned.
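Given the PostgreSQL focus, it can be useful to check the configured database URL before running anything. The sketch below is not part of omop-emb; `is_postgresql_url` is a hypothetical helper that inspects only the dialect portion of a SQLAlchemy-style URL.

```python
from urllib.parse import urlsplit


def is_postgresql_url(database_url: str) -> bool:
    """Hypothetical check: does the URL use the postgresql dialect?

    SQLAlchemy URLs have the form dialect+driver://user:pass@host/dbname,
    so the dialect is the part of the scheme before any "+".
    """
    scheme = urlsplit(database_url).scheme
    dialect = scheme.split("+", 1)[0]
    return dialect == "postgresql"
```

A caller could pass the value of `OMOP_DATABASE_URL` to this function and warn when it returns `False`.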
Documentation overview
- Installation
- EmbeddingInterface Guide — The primary API for embedding operations with model name validation
- Embedding storage backends
- CLI Reference