Enrichers🔗
Post-evaluation enrichers run after engine.evaluate() produces a ProcessResult and before each result is serialized to a sink. They inject contextual data — asset info, IP reputation, identity, GeoIP, runbook URLs — into the enrichments.<field> map on each detection or correlation, so every downstream consumer (RSoar, Grafana, Loki, custom scripts) sees the same structured context without re-fetching it.
This page covers what to put in --enrichers <path>, how the four primitives compose into the IRQL enrich_<keyfield>_<target> recipe catalog, and how to promote a recipe to a Rust-coded named enricher when one of the four primitives is not enough. The CLI flag itself is documented under engine daemon; per-call Prometheus metrics live in Prometheus metrics.
Why post-evaluation, not in-pipeline🔗
Processing pipelines (-p) run before the engine evaluates a rule: they map field names, filter events, and inject literal values. Enrichers run after the engine has already decided that a detection or correlation fired. The two surfaces are complementary:
- Use a pipeline when you want to normalise field names, drop noisy events, or inject a fixed list (threat-intel IOCs, allow-list users) into rule conditions. The pipeline runs once per event.
- Use an enricher when you want to attach context to a firing detection (asset owner, IP reputation, runbook URL). The enricher runs once per match, not once per event, and its work never re-affects rule evaluation.
This split keeps the evaluation hot path independent of any external system: a downstream HTTP outage cannot stop the engine from emitting detections, only from enriching them.
Config schema🔗
Pass a YAML file to --enrichers <path> on engine daemon. The file is hot-reloaded on SIGHUP, file-watcher changes, and POST /api/v1/reload; a reload that fails validation logs the error and keeps the previous pipeline active, so a typo never silently degrades production to "no enrichment".
# Bound on concurrent enrichment chains. Defaults to 16.
max_concurrent_enrichments: 16
enrichers:
- id: <unique-string> # required, used as a Prometheus label
kind: detection | correlation # required, see "Kind and template namespaces"
type: template | lookup | http | command # required, the primitive
inject_field: <field-name> # required, key under enrichments.<...>
timeout: 5s # optional, humantime; default 5s
on_error: skip | null | drop # optional; default skip
scope: # optional; see "Scope filtering"
rules: [<rule-id-or-glob>, ...]
tags: [<tag-or-prefix.*>, ...]
levels: [low, medium, high, critical, informational]
# ... primitive-specific fields below ...
Kind and template namespaces🔗
Every enricher declares a kind: detection | correlation. The kind drives two checks:
- Config-load-time template validation. A
kind: detectionenricher may only reference${detection.*}variables in its templated fields; akind: correlationenricher may only reference${correlation.*}. Cross-namespace references are rejected at startup with a clear error pointing at the offending field.${ENV_VAR}is allowed in both namespaces. - Runtime body matching. The pipeline skips enrichers whose declared kind does not match the current
EvaluationResultbody variant before invokingenrich(), so a detection-kind enricher pays no cost on correlation results and vice versa.
Detection variables (${detection.*}):
| Variable | Resolves to |
|---|---|
${detection.rule.title} / .id / .level | Rule metadata from RuleHeader |
${detection.tags} | Comma-joined tags |
${detection.fields.<name>} | The matched value of <name> from matched_fields |
${detection.event.<dotted.path>} | JSON path into the original event (when rsigma.include_event: "true" on the rule) |
Correlation variables (${correlation.*}):
| Variable | Resolves to |
|---|---|
${correlation.rule.title} / .id / .level | Rule metadata from RuleHeader |
${correlation.tags} | Comma-joined tags |
${correlation.type} | event_count, temporal, value_sum, ... |
${correlation.aggregated_value} | The value that crossed the condition threshold |
${correlation.timespan_secs} | Window size in seconds |
${correlation.group_key.<field>} | Look up a group-by field by name |
${correlation.group_key} | Joined field=value,field=value string |
For enrichers that conceptually apply to both kinds (identity lookups, runbook URLs, any tag-based enricher), declare two YAML entries with the same type and inject_field but different kind and template namespaces.
Scope filtering🔗
scope limits when an enricher fires within its declared kind:
scope.rules: list of rule IDs (exact match) or rule-title globs (Suspicious *)scope.tags: tag-set intersection with prefix wildcards (attack.*matchesattack.t1059.001)scope.levels: severity membership againstRuleHeader::level- No scope = fires for every result of the enricher's declared kind (use for cheap enrichers like
template)
There is no scope.kinds axis: the top-level kind already gates which result variant the enricher sees. Axes are AND-ed; an empty axis is not a filter.
The four primitives🔗
template: pure string interpolation🔗
Cheapest primitive. No I/O. Cannot fail past config-load-time template parse errors.
- id: runbook_det
kind: detection
type: template
inject_field: runbook_url
template: "https://wiki.internal/runbooks/${detection.rule.id}"
lookup: read from the dynamic-pipelines source cache🔗
Reads a value from the dynamic-pipelines source cache by source_id and applies an extract expression (jq / JSONPath / CEL) with template-expanded variables to slice it. Zero-network-cost for anything already loaded as a dynamic source.
- id: asset_context_corr
kind: correlation
type: lookup
inject_field: asset_context
source: asset_inventory # id of a dynamic source configured on the daemon
extract: '.assets[] | select(.hostname == "${correlation.group_key.HostName}")'
extract_type: jq # jq | jsonpath | cel; defaults to jq
default: "unknown" # injected on cache miss / no extract match
on_error: skip # applied only when default is not configured
The decision matrix:
- Cache hit + extract matches → inject the extracted value
- Cache hit + no extract match → if
defaultis configured, inject it; otherwise applyon_error - Cache miss → if
defaultis configured, inject it; otherwise applyon_error - Extract evaluation error (invalid jq, type mismatch) → always applies
on_error, even withdefaultset
lookup requires at least one dynamic source to be configured on the daemon via --source <file> (or via the deprecated pipeline-embedded sources: block). The loader surfaces a clear error at startup if a lookup enricher is configured without a source cache.
http: per-result HTTP fetch with optional response cache🔗
Per-result reqwest request with template-expanded URL, headers, and optional body. Parses the response as JSON, optionally sliced by an extract expression. The optional response cache is keyed on (method, url, body_hash) with a configurable TTL; mandatory in practice for any rate-limited API.
- id: hash_virustotal
kind: detection
type: http
inject_field: file_reputation
url: "https://www.virustotal.com/api/v3/files/${detection.fields.SHA256}"
method: GET # default GET
headers:
x-apikey: "${VIRUSTOTAL_API_KEY}"
cache_ttl: 1h # mandatory for the 4 req/min free tier
extract: ".data.attributes.last_analysis_stats"
extract_type: jq
on_error: skip
scope:
tags: ["attack.execution", "attack.defense_evasion"]
Each enricher instance owns its own response cache: two enrichers hitting the same URL with different Authorization headers do not share entries.
command: per-result local-process execution🔗
Per-result tokio::process::Command invocation with template-expanded argv and environment. Stdout is captured (capped at 10 MB) and parsed as JSON or as a raw string. Non-zero exit codes map to a fetch error with a snippet of stderr attached.
- id: ip_reputation
kind: detection
type: command
inject_field: ip_reputation
command:
- "/usr/local/bin/check-ip-rep"
- "${detection.fields.SourceIp}"
env:
REP_LOCAL_DB: "/var/lib/iprep.db"
output: json # json (default) | raw
timeout: 3s
on_error: skip
Composing enrichers (recipes)🔗
The four primitives cover almost every operational use case via composition. Recipes are field-parametric: substitute the field names your pipeline actually produces (SourceIp, cip, client.ip, ClientIp, ...) for the placeholders below.
enrich_ip_employee — identity lookup by source IP🔗
sources:
employee_directory:
type: file
path: /etc/rsigma/employees.json
format: json
extract:
expr: 'with_entries(.value |= {user: .user, team: .team})'
type: jq
enrichers:
- id: enrich_ip_employee
kind: detection
type: lookup
inject_field: employee
source: employee_directory
extract: '."${detection.fields.SourceIp}"'
extract_type: jq
default: "unknown"
scope:
levels: [high, critical]
Expected enrichments.employee shape: {"user": "alice", "team": "Platform"} or "unknown" on miss.
enrich_username_employee — identity lookup by username🔗
Same source as above, key by username instead.
enrich_ip_geoip — country/city/ASN by IP🔗
Prefer lookup if a GeoIP dump fits in memory; fall back to http for vendor APIs.
enrich_hash_virustotal — hash reputation with cache🔗
cache_ttl is mandatory for the 4 req/min free tier and a major win for duplicate-detection bursts on any tier. See the YAML in the http example above.
enrich_cve_kev — known-exploited-vulnerability flag🔗
Pulls the CISA KEV catalog as a dynamic-pipelines source, then flags CVEs that appear in it.
sources:
kev_catalog:
type: http
url: https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json
format: json
refresh: { interval: 3600s }
enrichers:
- id: enrich_cve_kev
kind: detection
type: lookup
inject_field: kev
source: kev_catalog
extract: '.vulnerabilities[] | select(.cveID == "${detection.fields.CveId}")'
extract_type: jq
default: null
enrich_url_runbook — synthesised runbook URL🔗
Pure string interpolation, no I/O. Use this any time a downstream consumer (Slack, PagerDuty, RSoar) needs a per-detection link.
- id: enrich_url_runbook
kind: detection
type: template
inject_field: runbook_url
template: "https://wiki.internal/runbooks/${detection.rule.id}"
When to pick which primitive🔗
- Prefer
lookupif the data is bounded and refreshes infrequently (employee directory, KEV catalog, GeoIP dump). - Prefer
httponly when the data is genuinely per-result or too large to cache. Always setcache_ttlfor rate-limited APIs. - Prefer
commandonly when no other primitive will do (a binary parser, a vendored CLI tool, an existing script). - Never use
templatefor anything that could be a YAML literal.
Promoting a recipe to a bespoke enricher🔗
The four primitives cover almost every use case via composition. A bespoke Rust-coded enricher is justified only when at least one of these holds:
- It bundles non-trivial data (a dataset committed to the repo and
include_bytes!-ed at compile time). Recipes can't express vendored data. - It needs a parser the YAML primitives don't expose (e.g. MaxMind's binary GeoLite2 format, the STIX 2.1 graph with parent/child resolution). Adding the parser as a generic source might cost more than just shipping the enricher.
- It provides a stable named contract: downstream consumers reference a specific
enrichments.<field>shape directly. A recipe-driven approach lets every operator pick their owninject_field, which is fine for ad-hoc enrichment but bad for a contract that crosses team or organisational boundaries. - It implements a non-obvious algorithm (e.g. coalescing per-result hash lookups into one batched-GET request). This is implementable as a recipe but the implementation is fragile.
External crates wire a bespoke type via register_builtin(name, factory):
use rsigma_runtime::{Enricher, register_builtin};
register_builtin(
"enrich_my_thing",
std::sync::Arc::new(|raw_config: &serde_json::Value| -> Result<Box<dyn Enricher>, String> {
let cfg: MyConfig = serde_json::from_value(raw_config.clone()).map_err(|e| e.to_string())?;
Ok(Box::new(MyEnricher::new(cfg)))
}),
).unwrap();
Reserved names (template, lookup, http, command) are rejected at registration time; duplicate registrations of the same name are rejected to keep the global registry append-only. Bespoke types follow the same kind / scope / template rules as the four primitives; promotion does not change the YAML shape, only the type: value.
Output shape🔗
The pipeline writes into RuleHeader::enrichments lazily, so detections and correlations that no enricher touched still serialize without an empty enrichments object. A typical NDJSON line looks like:
{"rule_title":"Suspicious PowerShell encoded command","rule_id":"rule-pwsh-enc","level":"high","tags":["attack.t1059.001"],"matched_selections":["selection"],"matched_fields":[{"field":"CommandLine","value":"powershell -enc ..."}],"enrichments":{"asset_info":{"hostname":"dc01","owner":"IT-Ops"},"runbook_url":"https://wiki.internal/runbooks/rule-pwsh-enc"}}
Metrics🔗
| Metric | Labels | Description |
|---|---|---|
rsigma_enrichment_total | enricher_id, kind, status | Per-call outcome counter; status is success / skip / error / timeout / drop |
rsigma_enrichment_duration_seconds | enricher_id, kind | Per-enricher latency histogram |
rsigma_enrichment_queue_depth | – | Pending enrichment calls (sum across both kinds) |
rsigma_enrichment_http_cache_hits_total | enricher_id | HTTP enricher response-cache hits |
rsigma_enrichment_http_cache_misses_total | enricher_id | HTTP enricher response-cache misses |
rsigma_enrichment_http_cache_expirations_total | enricher_id | HTTP enricher response-cache entries evicted on expiry |
Every (enricher_id, kind, status) triple and every HTTP-cache enricher_id row is pre-registered at startup, so all six families render at zero on the first /metrics scrape, before any event has fired. Filtered (kind- or scope-mismatched) calls do not increment any counters.
See also🔗
engine daemonreference for the--enrichersflag.- Dynamic Pipeline Sources for the source cache that
lookupreads from. - Prometheus metrics for the full metric definitions.
rsigma-runtimefor theEnrichertrait,EnrichmentPipeline, andregister_builtinAPI.