
cava_nlp.rule_engine.rule_engine

AGGREGATORS module-attribute

AGGREGATORS: dict[str, AggregatorAny] = {'max': agg_max, 'min': agg_min, 'join': agg_join, 'first': agg_first}
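Here is a minimal sketch of what the four registered aggregators might look like. It assumes each one takes a sequence of raw values and returns a single value, or None when the sequence is empty. The `AggregatorAny` alias and all function bodies below are hypothetical reconstructions. Only the function names and the registry keys come from this page.

```python
from typing import Any, Callable, Optional, Sequence

# Hypothetical alias: combine a sequence of raw values into one value (or None).
AggregatorAny = Callable[[Sequence[Any]], Optional[Any]]


def agg_max(values: Sequence[Any]) -> Optional[Any]:
    # Largest value, or None when nothing was extracted.
    return max(values) if values else None


def agg_min(values: Sequence[Any]) -> Optional[Any]:
    # Smallest value, or None when nothing was extracted.
    return min(values) if values else None


def agg_join(values: Sequence[Any]) -> Optional[str]:
    # Space-separated string of all values.
    return " ".join(str(v) for v in values) if values else None


def agg_first(values: Sequence[Any]) -> Optional[Any]:
    # First value in document order.
    return values[0] if values else None


AGGREGATORS: dict[str, AggregatorAny] = {
    "max": agg_max,
    "min": agg_min,
    "join": agg_join,
    "first": agg_first,
}
```

Per the ValueResolver description, aggregation runs before casting. A plain `max` over raw strings therefore compares them lexicographically, so `"9"` beats `"10"`. This sketch does not guard against that, and this page does not say whether the real helpers do.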

CASTERS module-attribute

CASTERS: dict[str, Callable[[Any], List[Any]]] = {'int': partial(safe_cast, int), 'float': partial(safe_cast, float), 'str': partial(safe_cast, str)}
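`CASTERS` uses `functools.partial` to bind a shared `safe_cast` helper to each target type. The sketch below assumes `safe_cast(typ, value)` returns None instead of raising when conversion fails. This page does not show the real helper's body. Its documented annotation (`Callable[[Any], List[Any]]`) also suggests it may return a list, so treat the scalar return shape here as an assumption.

```python
from functools import partial
from typing import Any, Callable, Optional


def safe_cast(typ: Callable[[Any], Any], value: Any) -> Optional[Any]:
    # Hypothetical helper: apply the cast, returning None instead of raising.
    if value is None:
        return None
    try:
        return typ(value)
    except (TypeError, ValueError):
        return None


# partial(safe_cast, int) binds the first argument, so CASTERS["int"](v) == safe_cast(int, v).
CASTERS: dict[str, Callable[[Any], Optional[Any]]] = {
    "int": partial(safe_cast, int),
    "float": partial(safe_cast, float),
    "str": partial(safe_cast, str),
}
```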

RuleEngine

RuleEngine(nlp: Language, name: str, config: RuleEngineConfig)

Generic rule engine component.

Config (per instance):

  • span_label: str # span-group name, e.g. "weight"
  • entity_label: Optional[str] # span label, e.g. "WEIGHT"
  • value_type: Optional[str] # type to cast the value to: "int", "float", or "str" (defaults to "str")
  • patterns: dict # spaCy Matcher patterns (outer list)
  • patterns.value: Optional[float|str] # literal value to assign to the matched span
  • patterns.value_patterns: Optional[list] # patterns that extract the numeric portion within the span
  • patterns.exclusions: Optional[list] # patterns used to suppress spans
  • merge_ents: Optional[bool] # whether to merge the matched span into a single token
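Below is a hypothetical configuration for a weight-tagging instance, written as a Python dict. The same shape could be loaded from YAML/JSON. The top-level keys follow the list above. This page does not name the key that holds the main Matcher patterns, so `token_patterns` is a placeholder, and the unit list and exclusion phrase are illustrative. The pattern bodies use spaCy's standard Matcher token-pattern syntax: a list of patterns, each of which is a list of token-attribute dicts.

```python
# Hypothetical config for one RuleEngine instance that tags body weights.
weight_config = {
    "span_label": "weight",    # span-group name (presumably doc.spans["weight"])
    "entity_label": "WEIGHT",  # label given to each matched span
    "value_type": "float",     # cast the extracted value with float
    "merge_ents": True,        # merge each matched span into a single token
    "patterns": {
        # Placeholder key: a number followed by a weight unit, e.g. "72.5 kg".
        "token_patterns": [
            [{"LIKE_NUM": True}, {"LOWER": {"IN": ["kg", "kgs", "kilograms"]}}],
        ],
        # Pull only the numeric token out of the matched span.
        "value_patterns": [
            [{"LIKE_NUM": True}],
        ],
        # Illustrative exclusion: a phrase such as "birth weight" used to suppress spans.
        "exclusions": [
            [{"LOWER": "birth"}, {"LOWER": "weight"}],
        ],
    },
}
```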

Create a rule-based extraction component.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| nlp | Language | The spaCy Language object. Used to access the vocabulary and to construct matchers. | required |
| name | str | Name of this rule engine instance (pipeline component name). | required |
| config | Mapping[str, Any] | Configuration dictionary, typically loaded from YAML/JSON. | required |
| Expected | | | required |

ValueResolver

ValueResolver(caster: Caster[Raw, Final], aggregator: Aggregator[Final])

Bases: Generic[Raw, Final]

Value resolution utilities for rule-based extraction.

This module provides a small, composable framework for turning raw extracted values (usually strings or lightweight objects produced by matchers) into final, typed values suitable for downstream use.

2-stage resolution process:

  1. Aggregation: Combine zero or more raw values extracted from a span into a single representative raw value (e.g. max, min, first, join).

  2. Casting: Convert the aggregated raw value into a final, typed value (e.g. int, float, str), handling failures safely.

This separation allows rules to express concepts like the following without baking domain logic into the rule engine itself:

  • “take the maximum ECOG score mentioned”
  • “join multiple tokens into a single string”
  • “prefer literal values over extracted ones”
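The two stages can be sketched as a single function that receives the aggregator and the caster as arguments. That way, domain choices such as "maximum ECOG score" live in the rule rather than in the engine. The names `aggregate_then_cast`, `max_by_int`, and `join_tokens` are hypothetical. Failed casts return None, which matches the "handling failures safely" wording but is not necessarily the library's exact behavior.

```python
from typing import Any, Callable, Optional, Sequence


def aggregate_then_cast(
    raw_values: Sequence[Any],
    aggregate: Callable[[Sequence[Any]], Optional[Any]],
    cast: Callable[[Any], Any],
) -> Optional[Any]:
    # Stage 1 (aggregation): collapse all raw values into one representative value.
    raw = aggregate(raw_values)
    if raw is None:
        return None
    # Stage 2 (casting): convert to the final type, turning failures into None.
    try:
        return cast(raw)
    except (TypeError, ValueError):
        return None


def max_by_int(values: Sequence[str]) -> Optional[str]:
    # "Take the maximum ECOG score mentioned": compare numerically, not as strings.
    return max(values, key=int) if values else None


def join_tokens(values: Sequence[str]) -> Optional[str]:
    # "Join multiple tokens into a single string".
    return " ".join(values) if values else None


ecog = aggregate_then_cast(["1", "2"], max_by_int, int)  # 2
site = aggregate_then_cast(["left", "upper", "lobe"], join_tokens, str)  # "left upper lobe"
```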

A ValueResolver encapsulates the policy for turning a collection of raw values into a single, meaningful result.

Resolution follows a strict priority order:

  1. Literal override – if a literal value is provided, it always wins.
  2. Aggregated extraction – raw values are aggregated, then cast.
  3. Fallback – used when no literal or extractable value is available.

resolve

resolve(raw_values: Sequence[Raw], *, literal: Optional[Final] = None, fallback: Optional[Final] = None) -> Optional[Final]

Decide the final value:

  1. the literal, if provided;
  2. otherwise, the aggregated raw values, if any;
  3. otherwise, the fallback, if there are no extracted values.
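The priority order can be sketched as a small generic class. This assumes the caster converts one aggregated raw value (returning None on failure) and the aggregator maps a sequence of raw values to one value. The sketch follows the prose order: aggregate, then cast. The documented signature, however, types the aggregator as `Aggregator[Final]`, which may mean the real class casts first. Both that ordering and the plain `Callable` types used here are therefore assumptions. The sketch also returns `fallback` when aggregation or casting yields nothing usable, which is one reading of "no extractable value". The helpers `to_int` and `first` are hypothetical.

```python
from typing import Callable, Generic, Optional, Sequence, TypeVar

Raw = TypeVar("Raw")
Final = TypeVar("Final")


class ValueResolver(Generic[Raw, Final]):
    """Sketch: literal > aggregated-then-cast extraction > fallback."""

    def __init__(
        self,
        caster: Callable[[Raw], Optional[Final]],
        aggregator: Callable[[Sequence[Raw]], Optional[Raw]],
    ) -> None:
        self.caster = caster
        self.aggregator = aggregator

    def resolve(
        self,
        raw_values: Sequence[Raw],
        *,
        literal: Optional[Final] = None,
        fallback: Optional[Final] = None,
    ) -> Optional[Final]:
        # 1. A literal value from the rule always wins.
        if literal is not None:
            return literal
        # 2. Otherwise aggregate whatever was extracted, then cast it.
        if raw_values:
            aggregated = self.aggregator(raw_values)
            if aggregated is not None:
                value = self.caster(aggregated)
                if value is not None:
                    return value
        # 3. Nothing usable was extracted: use the fallback (may be None).
        return fallback


def to_int(raw: str) -> Optional[int]:
    # Hypothetical caster: int, or None when the text is not an integer.
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None


def first(values: Sequence[str]) -> Optional[str]:
    # Hypothetical aggregator: keep the first extracted value.
    return values[0] if values else None


resolver: ValueResolver[str, int] = ValueResolver(caster=to_int, aggregator=first)
```

Injecting the caster and the aggregator as callables is presumably what lets registries like `CASTERS` and `AGGREGATORS` supply the policy per rule, while the resolver itself stays domain-agnostic.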