Loader Context and Diagnostics

This page documents the shared data structures used by the loaders subsystem.


LoaderContext

LoaderContext is an immutable coordination object passed through all loader operations.

It encapsulates all state required to load a file without relying on globals or implicit configuration.

Fields

Field            Description
tableclass       Target ORM table class
session          Active SQLAlchemy session
path             Path to the input file
staging_table    SQLAlchemy Table used for staging
chunksize        Optional chunk size
normalise        Whether to cast values to ORM types
dedupe           Whether to deduplicate incoming data
dedupe_incl_db   Whether to dedupe against existing DB rows

Immutable context object passed through loader operations.

The LoaderContext encapsulates all state required to load a file into a staging table, without relying on global variables.

Attributes:

Name             Type                      Description
tableclass       Type[CSVTableProtocol]    ORM table class being loaded.
session          Session                   Active SQLAlchemy session.
path             Path                      Path to the source data file.
staging_table    Table                     SQLAlchemy Table object representing the staging table.
chunksize        int | None                Optional chunk size for incremental loading.
normalise        bool                      Whether to apply type casting / normalisation.
dedupe           bool                      Whether to perform deduplication.
dedupe_incl_db   bool                      Whether deduplication should include existing database rows.
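
As an illustration of the attributes listed above, here is a minimal sketch of how such a context could be declared as a frozen dataclass. It is not the project's actual definition: the empty CSVTableProtocol stand-in, the DummyTable class, the file name and the None placeholders for session and staging_table are all assumptions made so the example runs without a database. No default values are shown because this page does not state any.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, replace
from pathlib import Path
from typing import TYPE_CHECKING, Optional, Protocol, Type

if TYPE_CHECKING:  # needed for annotations only, so the sketch runs without SQLAlchemy
    from sqlalchemy import Table
    from sqlalchemy.orm import Session


class CSVTableProtocol(Protocol):
    """Stand-in only: the real protocol's members are not shown on this page."""


@dataclass(frozen=True)
class LoaderContext:
    """Hypothetical declaration mirroring the documented attributes."""

    tableclass: Type[CSVTableProtocol]
    session: Session
    path: Path
    staging_table: Table
    chunksize: Optional[int]
    normalise: bool
    dedupe: bool
    dedupe_incl_db: bool


class DummyTable:
    """Placeholder ORM class used purely for illustration."""


# session and staging_table are None here only to keep the example self-contained.
ctx = LoaderContext(
    tableclass=DummyTable,
    session=None,
    path=Path("data.csv"),
    staging_table=None,
    chunksize=None,
    normalise=True,
    dedupe=True,
    dedupe_incl_db=False,
)

# A frozen dataclass rejects attribute assignment after construction.
try:
    ctx.chunksize = 500
    mutation_rejected = False
except FrozenInstanceError:
    mutation_rejected = True

# To vary one setting, derive a new context instead of mutating the old one.
chunked_ctx = replace(ctx, chunksize=10_000)
```

Because the object cannot be mutated, every loader step sees the same settings; a step that needs a different value builds a modified copy with dataclasses.replace and leaves the original untouched.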


Casting statistics

Loaders track casting failures during normalisation to support debugging and auditability.

ColumnCastingStats

Tracks casting failures for a single column:

  • total failure count
  • representative example values

TableCastingStats

Aggregates column-level statistics for a table.

This enables loaders to emit warnings such as:

  • unexpected nulls in required columns
  • values that could not be cast to target types

Statistics are logged, not raised as exceptions.

ColumnCastingStats: Casting statistics for a single column.

TableCastingStats: Aggregated casting statistics for a table, keyed by column.

total_failures property

Total number of casting failures.

has_failures()

Whether any casting failures occurred.

record(*, column, value, example_limit=3)

Record a casting failure for a column.

to_dict()

Return a dictionary representation of casting statistics.
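
The members listed above could fit together roughly as follows. This is a sketch under stated assumptions, not the library's source: the attribute names count, examples and columns are invented for illustration, the shape of the to_dict() output is a guess, and the four members are assumed to belong to TableCastingStats because record() takes a column argument. Only the record(*, column, value, example_limit=3) signature comes from this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ColumnCastingStats:
    """Failure count plus a few example values (attribute names are assumed)."""

    count: int = 0
    examples: list[Any] = field(default_factory=list)


@dataclass
class TableCastingStats:
    """Per-column statistics keyed by column name (storage layout is assumed)."""

    columns: dict[str, ColumnCastingStats] = field(default_factory=dict)

    @property
    def total_failures(self) -> int:
        # Sum of the failure counts across every column seen so far.
        return sum(stats.count for stats in self.columns.values())

    def has_failures(self) -> bool:
        return self.total_failures > 0

    def record(self, *, column: str, value: Any, example_limit: int = 3) -> None:
        # Always count the failure, but keep at most example_limit sample values.
        stats = self.columns.setdefault(column, ColumnCastingStats())
        stats.count += 1
        if len(stats.examples) < example_limit:
            stats.examples.append(value)

    def to_dict(self) -> dict[str, dict[str, Any]]:
        return {
            name: {"count": stats.count, "examples": list(stats.examples)}
            for name, stats in self.columns.items()
        }


table_stats = TableCastingStats()
for bad_value in ["n/a", "unknown", "", "twelve", "--"]:
    table_stats.record(column="age", value=bad_value)
table_stats.record(column="joined", value="31/02/2020")

summary = table_stats.to_dict()
```

In this sketch the "age" column ends up with five recorded failures but only the first three example values, which is the capping behaviour that keeps log output short.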


Design notes

  • Casting failures do not abort loads
  • Rows violating required constraints are dropped explicitly
  • Examples are capped to avoid log flooding

This design favours observability over strict enforcement.
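
The design notes above can be sketched as a small casting loop. Everything here is hypothetical: the cast_rows function, the "loaders" logger name, the use of None for a value that failed to cast, and the sample rows are assumptions chosen to show the behaviour, not the loaders' real code.

```python
import logging
from collections import Counter

logger = logging.getLogger("loaders")  # logger name is an assumption


def cast_rows(rows, casters, required):
    """Cast each value, count failures per column, and drop rows that end up
    with a missing value in a required column. Nothing is raised."""
    failures = Counter()
    kept = []
    for row in rows:
        cast = {}
        for column, raw in row.items():
            try:
                cast[column] = casters[column](raw)
            except (TypeError, ValueError):
                failures[column] += 1
                cast[column] = None  # assumed: a failed cast becomes None
        # Explicit drop: a required column without a usable value removes the row.
        if any(cast.get(column) is None for column in required):
            continue
        kept.append(cast)
    if failures:
        # Failures are reported through logging instead of aborting the load.
        logger.warning("casting failures by column: %s", dict(failures))
    return kept, failures


rows = [
    {"id": "1", "age": "30"},
    {"id": "x", "age": "41"},   # id cannot be cast, and id is required
    {"id": "3", "age": "n/a"},  # age cannot be cast, but age is optional
]
kept, failures = cast_rows(rows, casters={"id": int, "age": int}, required={"id"})
```

The load completes even though two values could not be cast: the row with the unusable id is dropped, the row with the unusable age is kept with a missing value, and both failures are available for logging.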