Schema & Instances

File layout

The package ships two asset directories:

src/omop_semantics/schema/
├── configuration/          # LinkML schema definitions
│   ├── core/               # Base primitives, profiles, templates, named-set schema
│   ├── registry/           # Registry and fragment structures
│   └── profiles/           # Domain profile modules (staging, modifiers, episodes)
└── instances/              # YAML instance files (the authoring surface)
    ├── enumerators.yaml
    ├── valuesets.yaml
    ├── profiles.yaml
    ├── profile_groups.yaml
    ├── demographic.yaml
    ├── genomic.yaml
    └── provider_specialty.yaml

The instances/ directory is where the semantic conventions live. The configuration/ directory contains the LinkML schemas that validate them.

Instance files

enumerators.yaml

Defines the named enums and groups used by the value-set runtime. Each entry is an OmopEnum or OmopGroup with a name and its members. These are loaded into CDMSemanticUnits and indexed by name for use by the value-set compiler.

Example:

named_enumerators:
  - name: genomic_value_group
    class_uri: OmopEnum
    enum_members:
      - concept_id: 9191
        label: genomic_positive
      - concept_id: 9189
        label: genomic_negative
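
The name-based indexing described above can be pictured with a small sketch. It is not the package's loader: plain dictionaries stand in for the parsed YAML, index_by_name is a hypothetical helper, and rejecting duplicate names is an assumption of the sketch rather than documented behaviour.

```python
# Illustrative only: plain dicts stand in for parsed enumerators.yaml content.
# index_by_name is a hypothetical helper, not an omop_semantics API.

named_enumerators = [
    {
        "name": "genomic_value_group",
        "class_uri": "OmopEnum",
        "enum_members": [
            {"concept_id": 9191, "label": "genomic_positive"},
            {"concept_id": 9189, "label": "genomic_negative"},
        ],
    },
]


def index_by_name(entries):
    """Build a name -> entry lookup (duplicate rejection is an assumption)."""
    index = {}
    for entry in entries:
        name = entry["name"]
        if name in index:
            raise ValueError(f"duplicate enumerator name: {name}")
        index[name] = entry
    return index


enumerator_index = index_by_name(named_enumerators)

# Look an enumerator up by name and list the labels of its members.
labels = [
    member["label"]
    for member in enumerator_index["genomic_value_group"]["enum_members"]
]
print(labels)
```

Keeping the lookup keyed by name is what lets other instance files refer to an enumerator with a plain string.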

valuesets.yaml

Groups named enumerators into top-level runtime namespaces. Each value set entry has a name and a semantic_units list of enumerator names (string references resolved at load time).

valuesets:
  - name: genomic
    semantic_units:
      - genomic_value_group
      - genomic_mapped_types
  - name: staging
    semantic_units:
      - t_stage_concepts
      - n_stage_concepts
      - group_stage_concepts

These names become the top-level attributes on the runtime object: runtime.genomic, runtime.staging, etc.
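
As a rough illustration of how string references could be resolved into top-level attributes, the sketch below builds a namespace object from the two value sets shown above. build_runtime and the way units are exposed beneath each value set are assumptions made for the example; the real runtime object may be organised differently.

```python
from types import SimpleNamespace

# Stand-in for the enumerator index; members are omitted for brevity.
enumerators = {
    name: {"name": name}
    for name in (
        "genomic_value_group",
        "genomic_mapped_types",
        "t_stage_concepts",
        "n_stage_concepts",
        "group_stage_concepts",
    )
}

valuesets = [
    {
        "name": "genomic",
        "semantic_units": ["genomic_value_group", "genomic_mapped_types"],
    },
    {
        "name": "staging",
        "semantic_units": [
            "t_stage_concepts",
            "n_stage_concepts",
            "group_stage_concepts",
        ],
    },
]


def build_runtime(valuesets, enumerators):
    """Hypothetical helper: resolve unit names and attach one attribute per value set."""
    runtime = SimpleNamespace()
    for valueset in valuesets:
        missing = [u for u in valueset["semantic_units"] if u not in enumerators]
        if missing:
            # An unresolved string reference fails at load time, not at first use.
            raise KeyError(f"{valueset['name']}: unknown enumerators {missing}")
        units = {u: enumerators[u] for u in valueset["semantic_units"]}
        setattr(runtime, valueset["name"], SimpleNamespace(**units))
    return runtime


runtime = build_runtime(valuesets, enumerators)
print(sorted(vars(runtime.staging)))
```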

profiles.yaml

Defines the CDM row shapes available to templates. Each profile names the target table, the concept slot, and (optionally) the value slot.

profiles:
  - name: observation_coded
    cdm_table: observation
    concept_slot: observation_concept_id
    value_slot: value_as_concept_id
  - name: measurement_numeric
    cdm_table: measurement
    concept_slot: measurement_concept_id
    value_slot: value_as_number

This file is the authoritative catalogue of built-in CDM profiles. Registry instance files refer to profiles by name and the engine resolves them from here at load time.

profile_groups.yaml

Defines named families of profiles for documentation and routing:

ObservationProfiles:
  class_uri: RegistryGroup
  role: observation
  members:
    - observation_simple
    - observation_coded
    - observation_string

These are available via engine.profile_runtime.list_groups() when the file is passed to from_yaml_paths() in profile_paths.

Registry instance files (demographic.yaml, etc.)

These define the actual semantic templates. Each file contains a groups: list of RegistryGroup entries, and each group lists its templates under registry_members:.

Templates in these files refer to CDM profiles by name:

- name: Country of birth
  role: demographic
  entity_concept:
    class_uri: OmopGroup
    name: Country of Birth
    parent_concepts:
      - concept_id: 4155450
        label: Country of birth
  cdm_profile: observation_simple

The string observation_simple is resolved against profiles.yaml when loading through OmopSemanticEngine.from_yaml_paths().

Profile resolution at load time

When OmopSemanticEngine.from_yaml_paths() loads a registry file, it checks whether any template member uses a string-named cdm_profile. If so, it expands those names against the CDM profile catalogue before validating the file as a RegistryFragment. Files that already contain fully-expanded profile objects are loaded directly.

The catalogue used defaults to the shipped profiles.yaml. To substitute a custom catalogue, pass profiles_path to from_yaml_paths().
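
The expansion step can be sketched in a few lines. This is a simplified stand-in, not the engine's implementation: expand_profiles is a hypothetical helper, the template names are invented for the example, and plain dictionaries replace the validated model objects.

```python
import copy

# Catalogue entries mirror the profiles.yaml example earlier on this page.
profile_catalogue = {
    "observation_coded": {
        "name": "observation_coded",
        "cdm_table": "observation",
        "concept_slot": "observation_concept_id",
        "value_slot": "value_as_concept_id",
    },
    "measurement_numeric": {
        "name": "measurement_numeric",
        "cdm_table": "measurement",
        "concept_slot": "measurement_concept_id",
        "value_slot": "value_as_number",
    },
}

# Hypothetical templates: one names its profile, one is already expanded.
templates = [
    {"name": "Example coded template", "cdm_profile": "observation_coded"},
    {
        "name": "Example numeric template",
        "cdm_profile": dict(profile_catalogue["measurement_numeric"]),
    },
]


def expand_profiles(templates, catalogue):
    """Replace string-named cdm_profile values with catalogue entries."""
    expanded = []
    for template in templates:
        template = copy.deepcopy(template)  # leave the caller's data untouched
        profile = template.get("cdm_profile")
        if isinstance(profile, str):
            if profile not in catalogue:
                raise KeyError(f"unknown CDM profile: {profile}")
            template["cdm_profile"] = copy.deepcopy(catalogue[profile])
        # Already-expanded profile objects pass through unchanged.
        expanded.append(template)
    return expanded


expanded = expand_profiles(templates, profile_catalogue)
print(expanded[0]["cdm_profile"]["cdm_table"])
```

Expanding before validation means the validator only ever sees one shape for cdm_profile, whichever form the author used.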

Adding your own definitions

To extend the shipped definitions with project-specific conventions, author your own instance files following the same structure and pass them to from_yaml_paths():

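from pathlib import Path

# Assumes OmopSemanticEngine is already imported and INSTANCE_DIR points at
# the shipped instances/ directory.
engine = OmopSemanticEngine.from_yaml_paths(
    registry_paths=[
        INSTANCE_DIR / "demographic.yaml",   # shipped
        Path("my_project/oncology.yaml"),     # project-specific
    ],
    profile_paths=[
        INSTANCE_DIR / "profile_groups.yaml",
    ],
)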

Multiple registry files are merged into a single runtime registry. Template names must be unique across all files.
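
The uniqueness rule can be illustrated with a minimal merge over parsed fragments. merge_fragments is a hypothetical helper and the group and oncology template names are invented for the example; how the engine actually reports a clash is not documented here, so the ValueError is an assumption.

```python
# Plain dicts stand in for parsed registry files (groups -> registry_members).
fragments = {
    "demographic.yaml": {
        "groups": [
            {
                "name": "Demographics",  # hypothetical group name
                "registry_members": [
                    {"name": "Country of birth", "cdm_profile": "observation_simple"},
                ],
            },
        ],
    },
    "my_project/oncology.yaml": {
        "groups": [
            {
                "name": "Oncology",  # hypothetical group name
                "registry_members": [
                    {"name": "Example oncology template", "cdm_profile": "observation_coded"},
                ],
            },
        ],
    },
}


def merge_fragments(fragments):
    """Merge templates from several files, rejecting duplicate template names."""
    merged = {}
    for source, fragment in fragments.items():
        for group in fragment["groups"]:
            for template in group["registry_members"]:
                name = template["name"]
                if name in merged:
                    first_source = merged[name][0]
                    raise ValueError(
                        f"template {name!r} in {source} is already defined in {first_source}"
                    )
                # Remember the source file so a later clash can name both files.
                merged[name] = (source, template)
    return merged


merged = merge_fragments(fragments)
print(sorted(merged))
```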

Regenerating the Pydantic models

The Pydantic models in schema/generated_models/ are generated from the LinkML schemas. If you change a schema, regenerate with:

omop-semantics gen-models

To check whether the committed models are in sync without writing:

omop-semantics gen-models --check

Run these commands from the project's own virtual environment (not a workspace-level environment) so that the correct linkml version is used.