Knowledge Graph Facade
The KnowledgeGraph class is the primary interface for interacting with the OMOP Common Data Model (CDM) as a graph. It implements a Virtual Knowledge Graph (VKG) layer, providing a high-level, object-oriented facade over relational database tables.
Rationale
While the OMOP CDM is stored in a Relational Database Management System (RDBMS), its vocabulary structure (concepts, relationships, and hierarchies) is inherently graph-based. However, querying these structures using standard SQL often requires complex joins and recursive logic that can be difficult to maintain and interpret.
omop-graph bridges this gap by:
- Virtualization: Operating directly on existing RDBMS tables without requiring a separate graph database (like Neo4j), ensuring compatibility with standard OHDSI deployments.
- Information Retrieval: Enabling sophisticated graph traversal (parents, children, ancestors) and semantic search, both of which are critical for concept grounding and medical entity linking.
- Abstraction: Providing a deterministic framework for validating medical logic through a Pythonic API, hiding the underlying SQL complexity.
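The following is a minimal sketch of the "complex joins and recursive logic" mentioned above. It uses Python's built-in sqlite3 module and a toy, trimmed-down concept_relationship table with made-up concept IDs. The sketch shows a one-hop parent lookup and a recursive CTE for ancestors, which is the kind of SQL a graph facade hides behind methods such as parents. It is an illustration only, not omop-graph's implementation. OMOP also ships a precomputed concept_ancestor table for ancestry queries; the recursive CTE is here only to show the shape of the problem.

```python
import sqlite3
from typing import List

# In-memory stand-in for OMOP's concept_relationship table (columns trimmed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT
);
-- 'concept_id_1 Is a concept_id_2' means concept_id_2 is the parent.
-- Toy IDs: 1 = cardiovascular disorder, 2 = arrhythmia, 3 = atrial fibrillation.
INSERT INTO concept_relationship VALUES (2, 1, 'Is a'), (3, 2, 'Is a');
""")

def parents(concept_id: int) -> List[int]:
    # One hop up the hierarchy: a single filtered lookup.
    rows = conn.execute(
        "SELECT concept_id_2 FROM concept_relationship "
        "WHERE concept_id_1 = ? AND relationship_id = 'Is a' ORDER BY concept_id_2",
        (concept_id,),
    )
    return [r[0] for r in rows]

def ancestors(concept_id: int) -> List[int]:
    # Transitive closure via a recursive CTE: the SQL a graph facade hides.
    rows = conn.execute(
        """
        WITH RECURSIVE up(cid) AS (
            SELECT concept_id_2 FROM concept_relationship
            WHERE concept_id_1 = ? AND relationship_id = 'Is a'
            UNION
            SELECT cr.concept_id_2 FROM concept_relationship AS cr
            JOIN up ON cr.concept_id_1 = up.cid
            WHERE cr.relationship_id = 'Is a'
        )
        SELECT cid FROM up ORDER BY cid
        """,
        (concept_id,),
    )
    return [r[0] for r in rows]

print(parents(3))    # [2]
print(ancestors(3))  # [1, 2]
```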
Key Features
- SQLAlchemy Integration: Manages database sessions and runs queries against the CDM tables through SQLAlchemy.
- LRU Caching: Caches frequent lookups, such as concept IDs, labels, and predicates, in least-recently-used (LRU) caches to minimize database round-trips.
- Semantic Predicates: Resolves standard OMOP relationship IDs into rich `Predicate` objects that understand hierarchy and directionality. See the predicates documentation for more information.
- Flexible Search: Supports exact matches, fuzzy `ILIKE` searches, and full-text (bag-of-words) search across concept names and synonyms. See the search documentation for more information.
- Graph Traversal: Simple methods to retrieve `edges`, `parents`, `children`, `roots`, and `leaves`.
- Extensibility: Includes a dedicated namespace for embedding-based operations (requires `omop-emb`; see the Installation instructions for more information).
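The Semantic Predicates feature can be pictured with a small, hypothetical model. For each relationship_id, OMOP's relationship table records a reverse_relationship_id and an is_hierarchical flag. The sketch below wraps that kind of metadata in an immutable dataclass and adds a direction flag, so traversal code can tell whether following an edge moves up or down the hierarchy. PredicateSketch, resolve, and inverse are illustrative names, not omop-graph's Predicate API. The flag values are illustrative too, not copied from a vocabulary release.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PredicateSketch:
    """Hypothetical model of an OMOP relationship plus direction metadata."""
    relationship_id: str
    reverse_relationship_id: str
    is_hierarchical: bool
    points_upward: bool  # True when concept_id_2 is the parent of concept_id_1

# A few standard OMOP relationship IDs paired with their reverses.
_REGISTRY = {
    "Is a": PredicateSketch("Is a", "Subsumes", True, True),
    "Subsumes": PredicateSketch("Subsumes", "Is a", True, False),
    "Maps to": PredicateSketch("Maps to", "Mapped from", False, False),
    "Mapped from": PredicateSketch("Mapped from", "Maps to", False, False),
}

def resolve(relationship_id: str) -> Optional[PredicateSketch]:
    # Unknown relationship IDs resolve to None rather than raising.
    return _REGISTRY.get(relationship_id)

def inverse(predicate: PredicateSketch) -> PredicateSketch:
    # Following an edge backwards swaps the predicate for its reverse.
    return _REGISTRY[predicate.reverse_relationship_id]

is_a = resolve("Is a")
print(is_a.points_upward, inverse(is_a).relationship_id)  # True Subsumes
```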
Basic Usage
The KnowledgeGraph can be used standalone once a SQLAlchemy session factory for the OMOP CDM database is available.
```python
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from omop_graph.graph.kg import KnowledgeGraph

# Set up your SQLAlchemy session
engine = create_engine("postgresql://user:pass@localhost/omop")
SessionLocal = sessionmaker(bind=engine)

# Initialize the Virtual Knowledge Graph
kg = KnowledgeGraph(SessionLocal)

# Look up a concept by its label
match_group = kg.label_lookup("Atrial Fibrillation", fuzzy=False)
concept = match_group.best_match
print(f"ID: {concept.concept_id}, Name: {concept.matched_label}")

# Traverse the hierarchy
parents = kg.parents(concept.concept_id)
print(f"Parent IDs: {parents}")
```
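Besides parents, the traversal API lists roots and leaves. Within a set of hierarchical edges, one plausible definition treats a root as a node that is never a child and a leaf as a node that is never a parent. The sketch below assumes that definition and uses a toy edge list with made-up IDs. roots_and_leaves is a hypothetical helper, and KnowledgeGraph may define or compute these differently.

```python
from typing import Iterable, Set, Tuple

# An 'Is a' edge stored as (child_id, parent_id), mirroring
# concept_id_1 'Is a' concept_id_2 in OMOP's concept_relationship table.
Edge = Tuple[int, int]

def roots_and_leaves(edges: Iterable[Edge]) -> Tuple[Set[int], Set[int]]:
    """Return (roots, leaves) of the subgraph spanned by the given edges."""
    edge_list = list(edges)  # materialise so the iterable can be scanned twice
    children = {child for child, _ in edge_list}
    parents = {parent for _, parent in edge_list}
    nodes = children | parents
    # A root never appears as a child; a leaf never appears as a parent.
    return nodes - children, nodes - parents

# Toy hierarchy: 1 is the top concept, and 4 has two parents (2 and 3).
toy_edges = [(2, 1), (3, 1), (4, 2), (4, 3), (5, 3)]
roots, leaves = roots_and_leaves(toy_edges)
print(sorted(roots), sorted(leaves))  # [1] [4, 5]
```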
Embedding Configuration
To enable semantic similarity and RAG-based retrieval, pass a KnowledgeGraphEmbeddingConfiguration when initialising the graph.
This requires the optional omop-emb package (see the installation guide).
Read-only (pre-computed embeddings already in the DB)
Use this when embeddings have already been indexed and you only need retrieval:
```python
from omop_emb import BackendType, ProviderType
from omop_graph.graph.kg import KnowledgeGraph, KnowledgeGraphEmbeddingConfiguration

emb_config = KnowledgeGraphEmbeddingConfiguration(
    backend_type=BackendType.FAISS,
    provider_type=ProviderType.OLLAMA,
    canonical_model_name="text-embedding-3-small:0.6b",  # model the stored embeddings were built with
    base_storage_dir="/data/embeddings",
)
kg = KnowledgeGraph(SessionLocal, emb_config=emb_config)
```
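Retrieval over pre-computed embeddings comes down to ranking stored concept vectors by their similarity to a query vector. The brute-force cosine-similarity sketch below uses pure Python, toy 2-D vectors, and made-up concept IDs to show the idea. A FAISS backend performs the same kind of ranking with optimised indexes; cosine similarity equals the inner product of L2-normalised vectors. cosine and top_k are illustrative helpers, not omop-emb's scoring code.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(
    query: Sequence[float], index: Dict[int, Sequence[float]], k: int = 2
) -> List[Tuple[int, float]]:
    """Score every stored concept vector and keep the k most similar."""
    scored = [(cid, cosine(query, vec)) for cid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy index: concept_id -> embedding (real embeddings have hundreds of dimensions).
index = {101: [1.0, 0.0], 102: [0.6, 0.8], 103: [0.0, 1.0]}
print([cid for cid, _ in top_k([1.0, 0.1], index)])  # [101, 102]
```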
Write-capable (generate and store embeddings at runtime)
Provide an EmbeddingClient to enable both reading and writing embeddings:
```python
from omop_emb import BackendType, EmbeddingClient
from omop_graph.graph.kg import KnowledgeGraph, KnowledgeGraphEmbeddingConfiguration

client = EmbeddingClient(...)  # configured for your provider

emb_config = KnowledgeGraphEmbeddingConfiguration(
    backend_type=BackendType.FAISS,
    base_storage_dir="/data/embeddings",
    client=client,
)
kg = KnowledgeGraph(SessionLocal, emb_config=emb_config)
```
The provider_type does not need to be set here; it is determined automatically from the client.
Fallback embedding calculation
When some concepts in the OMOP DB have not been pre-indexed, similarity scoring skips them by default (the skip is logged at INFO level).
Setting compute_missing_embeddings=True instructs the graph to compute and persist embeddings
for any missing concepts on the fly during a similarity call.
Warning
This flag has no effect unless a write-capable interface is configured (i.e. a client is provided).
Without a client, the graph holds a read-only interface and cannot write back to the embedding store.
```python
from omop_emb import BackendType

emb_config = KnowledgeGraphEmbeddingConfiguration(
    backend_type=BackendType.FAISS,
    base_storage_dir="/data/embeddings",
    client=client,  # a write-capable EmbeddingClient, as in the previous example
    compute_missing_embeddings=True,  # compute embeddings for concepts not yet in the store
)
kg = KnowledgeGraph(SessionLocal, emb_config=emb_config)
```
| compute_missing_embeddings | client present | Behaviour when concepts are missing |
|---|---|---|
| False (default) | any | Log at INFO and skip missing concepts in scoring |
| True | no | Log a warning that computation is not possible; skip missing concepts |
| True | yes | Compute embeddings, persist to DB, then score |
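The three rows of the table can be read as a small decision function. In the hypothetical sketch below, store (a dict) stands in for the embedding store and embed (an optional callable) stands in for the write-capable client. The sketch only mirrors the documented behaviour; it is not omop-graph's code.

```python
import logging
from typing import Callable, Dict, Iterable, List, Optional

logger = logging.getLogger("embedding_fallback_sketch")
Vector = List[float]

def vectors_for_scoring(
    concept_ids: Iterable[int],
    store: Dict[int, Vector],
    compute_missing: bool,
    embed: Optional[Callable[[int], Vector]],
) -> Dict[int, Vector]:
    """Return the vectors available for scoring, following the table above."""
    ids = list(concept_ids)
    missing = [cid for cid in ids if cid not in store]
    if missing:
        if not compute_missing:
            # Row 1: default behaviour, log at INFO and skip.
            logger.info("Skipping %d concepts without embeddings", len(missing))
        elif embed is None:
            # Row 2: flag set but no client, so nothing can be written.
            logger.warning("No client configured; cannot compute %d embeddings", len(missing))
        else:
            # Row 3: compute, persist to the store, then score.
            for cid in missing:
                store[cid] = embed(cid)
    return {cid: store[cid] for cid in ids if cid in store}

def fake_embed(concept_id: int) -> Vector:
    # Hypothetical stand-in for a provider call made through the client.
    return [float(concept_id), 0.0]

store = {1: [0.1, 0.2]}
print(sorted(vectors_for_scoring([1, 2], store, compute_missing=True, embed=fake_embed)))  # [1, 2]
```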