Skip to content

Materialised Views

This module provides a SQLAlchemy-native pattern for defining, creating, refreshing, and orchestrating materialized views using normal Select constructs, with explicit dependency management and deterministic refresh order.

It is designed for:

  • analytics and reporting layers
  • large fact tables with repeated joins or aggregates
  • schema-level orchestration (migrations, setup, Airflow, admin tasks)

The implementation is PostgreSQL-oriented (due to materialized view support), but remains cleanly isolated from ORM persistence logic.

Overview

The materialized view system consists of four main parts:

  1. CreateMaterializedView: A custom SQLAlchemy DDLElement that compiles a Select into a CREATE MATERIALIZED VIEW IF NOT EXISTS statement.
  2. MaterializedViewMixin: A mixin used to define materialized views declaratively, including:
    • name
    • backing Select
    • optional dependencies
  3. Dependency resolution: A topological sort over declared dependencies to determine refresh order.
  4. Refresh orchestration: Helpers to refresh one or many materialized views safely and predictably.

Defining the Materialised View

Bases: DDLElement

CreateMaterializedView

SQLAlchemy DDL element representing a CREATE MATERIALIZED VIEW statement.

This custom DDL construct allows a SQLAlchemy Select construct to be compiled into a backend-specific CREATE MATERIALIZED VIEW statement, enabling materialized view creation to be expressed using SQLAlchemy's DDL execution model.

Parameters:

Name Type Description Default
name

Name of the materialized view to be created.

required
selectable

A SQLAlchemy Select construct defining the query backing the materialized view.

required

MaterializedViewMixin

Mixin providing materialized view lifecycle helpers.

Classes using this mixin must define:

  • __mv_name__: the name of the materialized view
  • __mv_select__: a SQLAlchemy Select defining the view contents
  • optionally, __mv_dependencies__: names of tables or materialized views this MV depends on

This mixin does not define ORM mappings; it is intended for schema-level helpers used during migrations, setup, or administrative workflows.

Examples:

class RecentObservationMV(MaterializedViewMixin):

    __mv_name__ = "mv_recent_observation"

    __mv_select__ = (
        select(
            Observation.observation_id,
            Observation.person_id,
            Observation.observation_date,
            Observation.value_as_number,
            Concept.concept_id,
            Concept.concept_name,
            Concept.domain_id,
        )
        .join(
            Concept,
            Observation.observation_concept_id == Concept.concept_id
        )
        .where(
            Observation.observation_date
            >= func.current_date() - text("INTERVAL '30 days'")
        )
    )

__mv_select__ is a normal SQLAlchemy Select. No special syntax required.

By combining with declarative base, you can define columns to query the mv as an object too:

daily_counts_select = (
    select(
        Observation.observation_date.label("observation_date"),
        Observation.observation_concept_id.label("concept_id"),
        sa.func.count().label("n_observations"),
        sa.func.row_number().over().label('mv_id')
    )
    .group_by(
        Observation.observation_date,
        Observation.observation_concept_id,
    )
)

class DailyObservationCountsMV(Base, MaterializedViewMixin):

    __mv_name__ = "mv_daily_observation_counts"
    __mv_select__ = daily_counts_select
    __mv_pk__ = ["mv_id"]
    __table_args__ = {"extend_existing": True}
    __tablename__ = __mv_name__

    __mv_dependencies__ = {
        "observation",
        "concept",
    }

    mv_id = sa.Column(primary_key=True)
    observation_date = sa.Column(sa.Date, nullable=False)
    concept_id = sa.Column(sa.Integer, nullable=False)
    n_observations = sa.Column(sa.Integer, nullable=False)
Query like a normal mapped class:

rows = (
    session.query(DailyObservationCount)
    .filter(DailyObservationCount.observation_date >= date(2025, 1, 1))
    .order_by(DailyObservationCount.n_observations.desc())
    .all()
)

Best practices

  • No inserts / updates
  • Composite PK required for ORM identity map
  • Treat as immutable cache

__mv_dependencies__ = set() class-attribute instance-attribute

__mv_name__ instance-attribute

__mv_select__ instance-attribute

create_mv(bind) classmethod

Create the materialized view if it does not already exist.

Parameters:

Name Type Description Default
bind

A SQLAlchemy Engine or Connection used to execute the DDL.

required
Notes

The underlying SQL is emitted via a custom DDL element and executed directly against the database. This operation is not transactional on all backends.

Examples:

with engine.begin() as conn:
    RecentObservationMV.create_mv(conn)

This emits SQL equivalent to:

CREATE MATERIALIZED VIEW IF NOT EXISTS mv_recent_observation AS
SELECT
    observation.observation_id,
    observation.person_id,
    observation.observation_date,
    observation.value_as_number
FROM observation
WHERE observation.observation_date >= CURRENT_DATE - INTERVAL '30 days';

refresh_mv(bind) classmethod

Refresh the contents of the materialized view.

Parameters:

Name Type Description Default
bind

A SQLAlchemy Engine or Connection used to execute the refresh.

required
Notes

This method issues a REFRESH MATERIALIZED VIEW statement and assumes backend support (e.g. PostgreSQL). Concurrent refresh semantics are not handled here.

Examples:

with engine.begin() as conn:
    RecentObservationMV.refresh_mv(conn)

resolve_mv_refresh_order

Resolve materialized view refresh order using topological sort.

Raises:

Type Description
RuntimeError

If a dependency cycle is detected.

refresh_all_mvs

Handle refreshing multiple materialized views in dependency order.

Examples:

    ALL_MVS = [
        ObservationWithConceptMV,
        DailyObservationCountsMV,
    ]

    refresh_all_mvs(engine, ALL_MVS)