cava_nlp.rule_engine.rule_engine
AGGREGATORS
module-attribute
AGGREGATORS: dict[str, AggregatorAny] = {'max': agg_max, 'min': agg_min, 'join': agg_join, 'first': agg_first}
CASTERS
module-attribute
CASTERS: dict[str, Callable[[Any], List[Any]]] = {'int': partial(safe_cast, int), 'float': partial(safe_cast, float), 'str': partial(safe_cast, str)}
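The two registries above map a short name from the configuration to a callable. The sketch below illustrates that lookup pattern only. The functions `sketch_safe_cast`, `sketch_agg_max`, and `sketch_agg_first` are stand-ins written for this example, not the library's `safe_cast` or `agg_*` implementations, and the list return value of the caster is an assumption based on the `Callable[[Any], List[Any]]` annotation.

```python
from functools import partial
from typing import Any, Callable, Dict, List, Sequence


def sketch_safe_cast(target: Callable[[Any], Any], value: Any) -> List[Any]:
    """Stand-in caster: [target(value)] on success, [] if the cast fails."""
    try:
        return [target(value)]
    except (TypeError, ValueError):
        return []


def sketch_agg_max(values: Sequence[Any]) -> Any:
    """Stand-in aggregator: largest of the raw values."""
    return max(values)


def sketch_agg_first(values: Sequence[Any]) -> Any:
    """Stand-in aggregator: first raw value in match order."""
    return values[0]


# Name-to-callable registries, mirroring the shape of AGGREGATORS and CASTERS.
aggregators: Dict[str, Callable[[Sequence[Any]], Any]] = {
    "max": sketch_agg_max,
    "first": sketch_agg_first,
}
casters: Dict[str, Callable[[Any], List[Any]]] = {
    "int": partial(sketch_safe_cast, int),
    "float": partial(sketch_safe_cast, float),
}

# Look up both stages by name, then aggregate before casting.
aggregate = aggregators["max"]
cast = casters["float"]
resolved = cast(aggregate([70, 72, 68]))
```

Binding the target type with `partial` keeps every caster a one-argument callable, so a value such as `value_type: "float"` in a config can select it by key without the caller knowing which type is involved.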
RuleEngine
Generic rule engine component.
Config (per instance):
- span_label: str # span-group name, e.g. "weight"
- entity_label: Optional[str] # span label, e.g. "WEIGHT"
- value_type: Optional[str] # type to cast the value to: "int", "float", or "str"; defaults to "str"
- patterns: dict # spaCy Matcher patterns (outer list)
- patterns.value: Optional[float|str] # literal value to assign to matched span
- patterns.value_patterns: Optional[list] # patterns to extract numeric portion within span
- patterns.exclusions: Optional[list] # patterns to suppress spans
- merge_ents: Optional[bool] # whether to merge matched span into a single token
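As a rough illustration of the keys listed above, a configuration for a weight extractor might look like the dictionary below. This is a sketch under assumptions: the token patterns are ordinary spaCy Matcher patterns invented for the example, the name `weight_config` is hypothetical, and the key that holds the main Matcher patterns inside `patterns` is omitted because this section does not name it.

```python
# Hypothetical configuration for illustration; the values are not taken from the library.
weight_config = {
    "span_label": "weight",    # span-group name
    "entity_label": "WEIGHT",  # label assigned to matched spans
    "value_type": "float",     # one of "int", "float", "str"
    "patterns": {
        # Patterns that pick out the numeric portion within a matched span.
        "value_patterns": [[{"LIKE_NUM": True}]],
        # Patterns that suppress a span, e.g. "birth weight".
        "exclusions": [[{"LOWER": "birth"}, {"LOWER": "weight"}]],
    },
    "merge_ents": False,       # keep the matched span as separate tokens
}
```

Each entry in `value_patterns` and `exclusions` is a list of token dictionaries, and the outer list allows several alternative patterns.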
Create a rule-based extraction component.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| nlp | Language | The spaCy Language object. Used to access the vocabulary and to construct matchers. | required |
| name | str | Name of this rule engine instance (pipeline component name). | required |
| config | Mapping[str, Any] | Configuration dictionary, typically loaded from YAML/JSON. | required |
ValueResolver
Bases: Generic[Raw, Final]
Value resolution utilities for rule-based extraction.
This module provides a small, composable framework for turning raw extracted values (usually strings or lightweight objects produced by matchers) into final, typed values suitable for downstream use.
Two-stage resolution process:
- Aggregation: Combine zero or more raw values extracted from a span into a single representative raw value (e.g. max, min, first, join).
- Casting: Convert the aggregated raw value into a final, typed value (e.g. int, float, str), handling failures safely.
This separation allows rules to express concepts like:
- “take the maximum ECOG score mentioned”
- “join multiple tokens into a single string”
- “prefer literal values over extracted ones”

without baking domain logic into the rule engine itself.
A ValueResolver encapsulates the policy for turning a collection of
raw values into a single, meaningful result.
Resolution follows a strict priority order:
- Literal override – if a literal value is provided, it always wins.
- Aggregated extraction – raw values are aggregated, then cast.
- Fallback – used when no literal or extractable value is available.
resolve
resolve(raw_values: Sequence[Raw], *, literal: Optional[Final] = None, fallback: Optional[Final] = None) -> Optional[Final]
Decide the final value:
1) literal, if provided
2) aggregated raw values, if any
3) fallback, if there are no extracted values
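The priority order can be sketched as a small class. This is a minimal illustration, not the library's `ValueResolver`: the class `SketchResolver`, its constructor arguments, and the helper `to_int` are hypothetical, and falling back when the cast fails is an assumption read from "no literal or extractable value is available".

```python
from typing import Callable, Generic, Optional, Sequence, TypeVar

Raw = TypeVar("Raw")
Final = TypeVar("Final")


class SketchResolver(Generic[Raw, Final]):
    """Hypothetical resolver: literal first, then aggregate and cast, then fallback."""

    def __init__(
        self,
        aggregate: Callable[[Sequence[Raw]], Raw],
        cast: Callable[[Raw], Optional[Final]],
    ) -> None:
        self.aggregate = aggregate
        self.cast = cast

    def resolve(
        self,
        raw_values: Sequence[Raw],
        *,
        literal: Optional[Final] = None,
        fallback: Optional[Final] = None,
    ) -> Optional[Final]:
        # 1) A literal value always wins.
        if literal is not None:
            return literal
        # 2) Otherwise aggregate the raw values, then cast the result.
        if raw_values:
            value = self.cast(self.aggregate(raw_values))
            if value is not None:
                return value
        # 3) Nothing usable was extracted (assumed to include a failed cast).
        return fallback


def to_int(raw: str) -> Optional[int]:
    """Hypothetical caster: int(raw), or None if the text is not an integer."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None


# Strings compare lexicographically, so max() here only suits single-digit scores.
ecog = SketchResolver(aggregate=max, cast=to_int)
highest = ecog.resolve(["1", "2"])
overridden = ecog.resolve(["1", "2"], literal=4)
defaulted = ecog.resolve([], fallback=0)
uncastable = ecog.resolve(["n/a"], fallback=-1)
```

Passing the aggregator and caster in as callables keeps the policy out of the resolver, so the same class serves "take the maximum" and "take the first" rules alike.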