Backend Selection
omop-emb now has a backend abstraction layer so embedding storage and
retrieval can be selected explicitly instead of being inferred from
whatever happens to be installed.
Supported backend names
The current backend factory recognizes:
- pgvector: the pgvector extension for a standard PostgreSQL database, which stores embeddings directly in the database
- faiss: the FAISS storage solution for on-disk storage
The default backend name is currently postgres.
Runtime selection
The intended pattern is:
- choose the backend at install time with package extras
- select the same backend explicitly again at runtime
Examples:
export OMOP_EMB_BACKEND=postgres
export OMOP_EMB_BACKEND=faiss
You can also pass the backend name directly in Python.
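The two selection paths can be combined in calling code. The following is a minimal sketch, not part of omop-emb: the helper name `resolve_backend_name` and the precedence order (explicit argument first, then `OMOP_EMB_BACKEND`, then the default) are assumptions made for illustration. Only the environment variable name and the `postgres` default come from this page.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "OMOP_EMB_BACKEND"   # environment variable named on this page
DEFAULT_BACKEND = "postgres"   # default backend name stated on this page


def resolve_backend_name(
    explicit: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: pick a backend name without guessing.

    Assumed precedence: explicit argument, then the environment
    variable, then the documented default.
    """
    env = os.environ if environ is None else environ
    if explicit is not None and explicit.strip():
        return explicit.strip()
    from_env = env.get(ENV_VAR, "").strip()
    if from_env:
        return from_env
    return DEFAULT_BACKEND
```

The resolved name could then be handed to the backend factory, so the choice stays visible at the call site rather than depending on what happens to be installed.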
Python factory
The backend factory lives in omop_emb.backends:
from omop_emb.backends import get_embedding_backend
backend = get_embedding_backend("postgres")
backend = get_embedding_backend("faiss")
The factory currently exposes:
- get_embedding_backend(...)
- normalize_backend_name(...)
Why explicit selection is necessary
Explicit backend selection improves clarity in a multi-backend world:
- users can see which backend they intended to use
- missing optional dependencies fail clearly
- the system avoids silent fallback between incompatible storage implementations
This is especially important when embeddings affect retrieval behavior, because silent fallback can make users think semantic retrieval is active when it is not.
Dependency errors
If a backend is requested but its optional dependencies are missing, the factory raises an explicit backend dependency error rather than falling back to another backend.
This is the intended behavior.
Examples of the error classes exposed by the backend layer:
- EmbeddingBackendDependencyError
- UnknownEmbeddingBackendError
- EmbeddingBackendConfigurationError
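The fail-instead-of-fall-back behavior can be illustrated with a small standalone sketch. It does not reproduce the omop-emb implementation: the two error classes below are local stand-ins that only reuse the names listed above, their base classes are assumptions, and `check_backend_dependencies` and the `REQUIRED_MODULES` mapping are hypothetical.

```python
import importlib.util
from typing import Mapping


class UnknownEmbeddingBackendError(ValueError):
    """Local stand-in; the base class is an assumption."""


class EmbeddingBackendDependencyError(ImportError):
    """Local stand-in; the base class is an assumption."""


# Hypothetical mapping from backend name to the top-level module it needs.
REQUIRED_MODULES = {"postgres": "pgvector", "faiss": "faiss"}


def check_backend_dependencies(
    name: str,
    required: Mapping[str, str] = REQUIRED_MODULES,
) -> str:
    """Return the name if its module is importable, otherwise raise.

    There is deliberately no fallback to another backend.
    """
    if name not in required:
        raise UnknownEmbeddingBackendError(f"unknown backend: {name!r}")
    module = required[name]
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module) is None:
        raise EmbeddingBackendDependencyError(
            f"backend {name!r} requires the optional module {module!r}"
        )
    return name
```

Because the check raises rather than substituting a different backend, a caller that asked for one storage implementation never silently ends up running against another.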
Current scope
At the moment:
- the backend abstraction and backend factory exist
- PostgreSQL and FAISS backend classes exist
- the production CLI path still targets the PostgreSQL embedding workflow
- PostgreSQL-specific embedding dependencies are optional, but a database backend is still required for OMOP access and model registration
- model registration is intended to remain shared and database-backed even when FAISS is used for vector storage and retrieval
- database backends other than PostgreSQL have not yet been tested
So this page documents the selection model and Python interface shape now, even before every runtime path has been migrated to delegate through the backend factory.