Loaders

The orm_loader.loaders module provides conservative, schema-aware file ingestion infrastructure for loading external data into ORM-backed staging tables.

This subsystem is designed to handle:

  • untrusted or messy source files
  • large datasets requiring chunked processing
  • incremental and repeatable loads
  • dialect-specific optimisations (e.g. PostgreSQL COPY)
  • explicit, inspectable failure modes

Loaders are intentionally infrastructure-only: they do not embed domain rules or business semantics.


Core concepts

LoaderContext

A LoaderContext object carries all state required to load a single file:

  • target ORM table
  • database session
  • staging table
  • file path
  • operational flags (chunking, deduplication, normalisation)

This makes loader behaviour explicit, testable, and side-effect free.
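
For orientation, a minimal sketch of what such a context might look like is below; the field names, types and defaults are assumptions for illustration, not the module's actual attributes.

  # Hypothetical sketch: field names, types and defaults are illustrative only.
  from dataclasses import dataclass
  from pathlib import Path
  from typing import Any

  @dataclass(frozen=True)
  class LoaderContext:
      table: Any                 # target ORM table (e.g. a mapped class)
      session: Any               # active database session
      staging_table: Any         # staging table the loader writes into
      file_path: Path            # source file to ingest
      chunk_size: int = 50_000   # operational flag: chunked processing
      dedupe: bool = True        # operational flag: deduplication
      normalise: bool = True     # operational flag: normalise to ORM column types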


LoaderInterface

All loaders implement a common interface:

  • orm_file_load(ctx) — orchestrates file ingestion
  • dedupe(data, ctx) — defines deduplication semantics

Concrete implementations differ only in how data is read and processed, not in how it is staged.
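
As a non-authoritative sketch, that contract could be expressed as an abstract base class; the method names follow this page, but the signatures, return types and base class are assumptions.

  # Illustrative interface sketch; signatures and base class are assumptions.
  from abc import ABC, abstractmethod

  class LoaderInterface(ABC):
      @abstractmethod
      def orm_file_load(self, ctx) -> int:
          """Orchestrate ingestion of ctx.file_path into ctx.staging_table
          and return the number of rows written."""

      @abstractmethod
      def dedupe(self, data, ctx):
          """Apply this loader's deduplication semantics to an in-memory chunk."""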


Staging tables

Loaders always write to staging tables, never directly to production tables.

This allows:

  • safe rollback
  • repeatable merges
  • database-level deduplication
  • bulk loading optimisations

Final merge semantics are handled by the table mixins, not by loaders.
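
A hedged end-to-end sketch of this split is shown below; the ORM classes, the session, the concrete loader and merge_from_staging() are placeholders, not names from this package.

  # Hypothetical flow: Measurement, MeasurementStaging, session, loader and
  # merge_from_staging() are placeholders; the staging-then-merge shape is the point.
  from pathlib import Path
  from orm_loader.loaders import LoaderContext

  ctx = LoaderContext(
      table=Measurement,                 # production ORM table (placeholder)
      staging_table=MeasurementStaging,  # staging ORM table (placeholder)
      session=session,                   # open database session (placeholder)
      file_path=Path("data/measurements.csv"),
  )

  rows = loader.orm_file_load(ctx)          # loader writes only to the staging table
  Measurement.merge_from_staging(session)   # merge semantics live in the table mixins
  session.commit()                          # commits happen outside the loader layer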


Provided loaders

Loader         Use case
-------------  ----------------------------------
PandasLoader   Flexible, debuggable CSV ingestion
ParquetLoader  High-volume, columnar ingestion

Both loaders share the same lifecycle and guarantees.
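
As a hedged usage sketch (the import path follows the module name above, but the no-argument constructors are assumptions), a caller might pick a loader by file extension:

  # Hypothetical dispatch helper; constructor signatures are assumptions.
  from pathlib import Path
  from orm_loader.loaders import PandasLoader, ParquetLoader

  def loader_for(path: Path):
      # Columnar files go through ParquetLoader; CSV-like files go through
      # PandasLoader, which is easier to inspect and debug interactively.
      return ParquetLoader() if path.suffix == ".parquet" else PandasLoader()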


Loader lifecycle

  1. Detect file format and encoding
  2. Read data in chunks or batches
  3. Optionally normalise to ORM column types
  4. Optionally deduplicate (internal and/or database-level)
  5. Insert into staging table
  6. Return row count

No implicit commits or merges occur at this layer.
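
A compressed sketch of that lifecycle is below; the helper method names and context fields are assumptions used to mirror the numbered steps, not the real implementation.

  # Illustrative outline only; read_chunks, normalise_to_orm_types and
  # insert_into_staging are assumed helpers standing in for steps 1-5.
  def orm_file_load(self, ctx) -> int:
      total_rows = 0
      for chunk in self.read_chunks(ctx.file_path, ctx.chunk_size):   # steps 1-2
          if ctx.normalise:
              chunk = self.normalise_to_orm_types(chunk, ctx.table)   # step 3
          if ctx.dedupe:
              chunk = self.dedupe(chunk, ctx)                         # step 4
          self.insert_into_staging(chunk, ctx)                        # step 5
          total_rows += len(chunk)
      # No commit or merge happens here; that is left to the caller and the mixins.
      return total_rows                                               # step 6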


Guarantees

The loaders subsystem guarantees:

  • deterministic ingestion behaviour
  • no silent data loss
  • explicit logging of dropped or malformed rows
  • isolation from domain-specific rules

It does not guarantee correctness of the source data.
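
To make the "no silent data loss" point concrete, a hedged sketch of how a deduplication step might log dropped rows is shown below; the logger name and drop policy are assumptions, not the library's actual behaviour.

  # Hypothetical example: explicit logging of rows dropped during deduplication.
  import logging

  import pandas as pd

  log = logging.getLogger("orm_loader.loaders")

  def dedupe_chunk(data: pd.DataFrame, key_columns: list[str]) -> pd.DataFrame:
      before = len(data)
      deduped = data.drop_duplicates(subset=key_columns)
      dropped = before - len(deduped)
      if dropped:
          log.warning("Dropped %d duplicate rows on key %s", dropped, key_columns)
      return deduped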
