cava_nlp.rule_engine.rule_engine

AGGREGATORS `module-attribute`

AGGREGATORS: dict[str, AggregatorAny] = {'max': agg_max, 'min': agg_min, 'join': agg_join, 'first': agg_first}

CASTERS `module-attribute`

CASTERS: dict[str, Callable[[Any], List[Any]]] = {'int': partial(safe_cast, int), 'float': partial(safe_cast, float), 'str': partial(safe_cast, str)}
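Both registries map a name used in a rule's configuration to a callable. The sketch below illustrates that lookup pattern only. The body of `safe_cast` and the aggregator functions are hypothetical stand-ins, not the cava_nlp implementations; the only details taken from the signatures above are the `partial(safe_cast, <type>)` shape and the list return type of the casters.

```python
from functools import partial
from typing import Any, Callable, Dict, List, Sequence


def safe_cast(target: Callable[[Any], Any], value: Any) -> List[Any]:
    """Hypothetical stand-in: return [target(value)], or [] if the cast fails."""
    try:
        return [target(value)]
    except (TypeError, ValueError):
        return []


# Same shape as CASTERS: name -> one-argument callable returning a list.
casters: Dict[str, Callable[[Any], List[Any]]] = {
    "int": partial(safe_cast, int),
    "float": partial(safe_cast, float),
    "str": partial(safe_cast, str),
}

# Assumed aggregator stand-ins; the real agg_* functions may behave differently.
aggregators: Dict[str, Callable[[Sequence[Any]], Any]] = {
    "max": max,
    "min": min,
    "first": lambda values: values[0],
    "join": lambda values: " ".join(str(v) for v in values),
}

print(casters["int"]("72"))            # [72]
print(casters["int"]("heavy"))         # []
print(aggregators["max"]([1, 3, 2]))   # 3
```

Returning an empty list on failure, rather than raising, lets a caller treat "no usable value" uniformly, which is one plausible reason for the `List[Any]` return type.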

RuleEngine

RuleEngine(nlp: Language, name: str, config: RuleEngineConfig)

Generic rule engine component.

Config (per instance):

span_label: str # span-group name, e.g. "weight"
entity_label: Optional[str] # span label, e.g. "WEIGHT"
value_type: Optional[str] # type to cast the value to: "int", "float", or "str"; defaults to "str"
patterns: dict # spaCy Matcher patterns (outer list)
patterns.value: Optional[float|str] # literal value to assign to matched span
patterns.value_patterns: Optional[list] # patterns to extract numeric portion within span
patterns.exclusions: Optional[list] # patterns to suppress spans
merge_ents: Optional[bool] # whether to merge matched span into a single token

Create a rule-based extraction component.

Parameters:

Name	Type	Description	Default
`nlp`	`Language`	The spaCy Language object. Used to access the vocabulary and to construct matchers.	required
`name`	`str`	Name of this rule engine instance (pipeline component name).	required
`config`	`Mapping[str, Any]`	Configuration dictionary, typically loaded from YAML/JSON.	required

ValueResolver

ValueResolver(caster: Caster[Raw, Final], aggregator: Aggregator[Final])

Bases: Generic[Raw, Final]

Value resolution utilities for rule-based extraction.

This module provides a small, composable framework for turning raw extracted values (usually strings or lightweight objects produced by matchers) into final, typed values suitable for downstream use.

2-stage resolution process:

1) Aggregation: Combine zero or more raw values extracted from a span into a single representative raw value (e.g. max, min, first, join).
2) Casting: Convert the aggregated raw value into a final, typed value (e.g. int, float, str), handling failures safely.

This separation allows rules to express concepts like:

- “take the maximum ECOG score mentioned”
- “join multiple tokens into a single string”
- “prefer literal values over extracted ones”

without baking domain logic into the rule engine itself.
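As a minimal sketch of the two stages as described here (aggregate first, then cast), the hypothetical helper `resolve_two_stage` below composes an aggregator and a caster. It is not the library's code: the function name, its signature, and the choice to return `None` on a failed cast are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence, TypeVar

Raw = TypeVar("Raw")
Final = TypeVar("Final")


def resolve_two_stage(
    raw_values: Sequence[Raw],
    aggregate: Callable[[Sequence[Raw]], Raw],
    cast: Callable[[Raw], Final],
) -> Optional[Final]:
    """Hypothetical helper: aggregate raw values, then cast the result."""
    if not raw_values:
        return None  # nothing was extracted
    aggregated = aggregate(raw_values)  # stage 1: many raw values -> one
    try:
        return cast(aggregated)  # stage 2: raw -> typed
    except (TypeError, ValueError):
        return None  # assumed "safe" failure mode


# "Take the maximum ECOG score mentioned": single-digit strings order
# the same way as their numeric values, so max() on raw strings works here.
print(resolve_two_stage(["1", "2", "0"], max, int))        # 2

# "Join multiple tokens into a single string".
print(resolve_two_stage(["stage", "IV"], " ".join, str))   # stage IV
```

Because the aggregator and caster are passed in as plain callables, swapping `max` for `min` or `int` for `float` changes the rule's behavior without touching the resolution logic.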

A ValueResolver encapsulates the policy for turning a collection of raw values into a single, meaningful result.

Resolution follows a strict priority order:

1) Literal override – if a literal value is provided, it always wins.
2) Aggregated extraction – raw values are aggregated, then cast.
3) Fallback – used when no literal or extractable value is available.

resolve

resolve(raw_values: Sequence[Raw], *, literal: Optional[Final] = None, fallback: Optional[Final] = None) -> Optional[Final]

Decide final value:

1) literal if provided
2) aggregated raw values if any
3) fallback if no extracted values
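The priority order can be sketched as a standalone function. `resolve_sketch` and its `extract` parameter are hypothetical: `extract` stands in for whatever aggregation and casting the resolver was built with, and treating a `None` extraction result as "fall through to the fallback" is an assumption rather than documented behavior.

```python
from typing import Callable, Optional, Sequence, TypeVar

Raw = TypeVar("Raw")
Final = TypeVar("Final")


def resolve_sketch(
    raw_values: Sequence[Raw],
    extract: Callable[[Sequence[Raw]], Optional[Final]],
    *,
    literal: Optional[Final] = None,
    fallback: Optional[Final] = None,
) -> Optional[Final]:
    """Hypothetical illustration of the literal > extracted > fallback order."""
    # 1) A literal value always wins, even when raw values were extracted.
    if literal is not None:
        return literal
    # 2) Otherwise use the extracted value, if there is one.
    if raw_values:
        value = extract(raw_values)
        if value is not None:
            return value
    # 3) Nothing usable: return the fallback (which may itself be None).
    return fallback


def first_as_int(values: Sequence[str]) -> Optional[int]:
    """Assumed extraction policy for the demo: first value, cast to int."""
    try:
        return int(values[0])
    except (TypeError, ValueError):
        return None


print(resolve_sketch(["70", "72"], first_as_int, literal=5))   # 5
print(resolve_sketch(["70", "72"], first_as_int))              # 70
print(resolve_sketch([], first_as_int, fallback=0))            # 0
```

The check is `literal is not None` rather than a truthiness test so that a literal such as `0` or `""` still takes priority.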