Relationships & Reference Contexts
OMOP Alchemy takes a deliberately conservative approach to ORM relationships.
Rather than eagerly wiring every foreign key into a bidirectional relationship, it distinguishes between:
- structural foreign keys (always present in tables)
- reference lookups (read-only joins for navigation)
- analytical relationships (used in views, not ETL)
This separation keeps core tables simple, predictable, and fast to load, while still enabling rich, expressive navigation when you need it.
Why relationships are handled carefully
The OMOP CDM has several characteristics that make naïve ORM relationships risky:
- very long, mixed-use tables
- many optional foreign keys
- polymorphism
- frequent joins to large vocabulary tables
- mixed use cases (ETL vs analytics vs inspection)
In particular:
- ETL workflows should not accidentally trigger joins
- reference data should be navigable but not mutable
- analytical helpers should not pollute base table definitions
OMOP Alchemy addresses this by introducing Reference Contexts.
ReferenceContext
A helper base class for defining read-only reference relationships.
This class is purely structural: it resolves foreign keys into reference tables (Domain, Vocabulary, ConceptClass, etc.) with explicit join conditions.
These relationships are:
- viewonly=True
- explicitly joined
- loaded using lazy="selectin" (a batched follow-up query rather than a join)
- defined outside the core table
They are intended for:
- inspection
- analytics
- debugging
- view-level navigation
They are not intended for ETL or mutation.
The core idea
Instead of defining relationships directly on a table class, OMOP Alchemy encourages a three-layer pattern:
- Table – structural definition only
- Context – reference relationships
- View – analytical behavior and validation
This keeps concerns cleanly separated.
Worked example: Person
1. The base table
The core Person table defines:
- primary keys
- scalar fields
- foreign key columns
- no ORM relationships
@cdm_table
class Person(CDMTableBase, Base, HealthSystemContext):
    __tablename__ = "person"

    person_id: Mapped[int] = mapped_column(primary_key=True)
    year_of_birth: Mapped[int] = required_int()
    gender_concept_id: Mapped[int] = required_concept_fk()
    race_concept_id: Mapped[int] = required_concept_fk()
    ethnicity_concept_id: Mapped[int] = required_concept_fk()
    location_id: Mapped[Optional[int]] = mapped_column(
        ForeignKey("location.location_id"),
        nullable=True,
        index=True,
    )
At this layer:
- the table is easy to reason about
- loading rows never triggers joins
- nothing is implicitly navigable
This is the class you want in ETL loops.
2. The reference context
Reference relationships are defined separately in a Context class:
class PersonContext(ReferenceContext):
    gender = ReferenceContext._reference_relationship(
        target="Concept",
        local_fk="gender_concept_id",
        remote_pk="concept_id",
    )
    location = ReferenceContext._reference_relationship(
        target="Location",
        local_fk="location_id",
        remote_pk="location_id",
    )
Key properties of these relationships:
- viewonly=True
- no backrefs
- explicit join conditions
- safe to compose
3. The analytical view
Finally, the PersonView composes everything together:
class PersonView(Person, PersonContext, DomainValidationMixin):
    __tablename__ = "person"
    __mapper_args__ = {"concrete": False}
    __expected_domains__ = {
        "gender_concept_id": ExpectedDomain("Gender"),
        "race_concept_id": ExpectedDomain("Race"),
    }
This is the class you use for:
- analytics
- cohort logic
- interactive inspection
- debugging
It is intentionally not the class you use for bulk loading.
Why reference relationships are view-only
Reference relationships are declared as:
- viewonly=True
- no cascade rules
- no persistence semantics
This is intentional.
Vocabulary tables (Concept, Domain, Vocabulary, etc.) are:
- shared
- stable
- not owned by fact tables
Allowing mutation through ORM relationships would blur those boundaries and make ETL behavior harder to reason about.
Performance considerations
Reference relationships use lazy="selectin".
This provides a good balance:
- avoids N+1 queries in most cases
- avoids eager joins during row loading
- keeps behavior predictable
If you need tighter control, you can always override loading strategies in queries.