Alert Pipeline🔗
The alert pipeline is an optional post-engine stage in the daemon's output path, between post-evaluation enrichment and the sinks. It is modeled on the Prometheus Alertmanager processing pipeline and is strictly post-engine: it consumes EvaluationResults and emits EvaluationResults, so the evaluation hot path is untouched.
It is configured with a separate YAML file via --alert-pipeline <path> (or the daemon.alert_pipeline config key) and is hot-reloaded on SIGHUP, file-watcher changes, and POST /api/v1/reload; a failed reload keeps the previous pipeline active.
This page covers silencing, inhibition, deduplication, and incident grouping. The stages run in that order (the mute stages first): a muted result never dedups or opens an incident.
Silencing🔗
A silence mutes results matching a set of matchers for a time window, modeled on Alertmanager silences. A muted result is acked and dropped before dedup, so it neither emits nor contributes to an incident.
Silences come from two origins:
- static: declared in the
--alert-pipelineconfig undersilences:. Re-seeded on hot-reload (the previous static set is replaced); use these for maintenance-as-code. - api: created at runtime over
POST /api/v1/silences, independent of the config file. Use these for ad-hoc mutes during an incident.
Matchers🔗
A matcher is selector <op> value, where the left-hand side is a field selector and <op> is one of = (equals), != (not equals), =~ (regex match), !~ (regex no-match). Regex operators are anchored (full match). A matcher set is ANDed: every matcher must match. The same matcher engine backs inhibition.
Config (static silences)🔗
silences:
- matchers:
- selector: rule
op: "="
value: noisy-rule
- selector: level
op: "!="
value: critical
comment: "muted during migration"
created_by: ops
# starts_at / ends_at are optional RFC 3339 timestamps; absent means
# active immediately / never expires.
The silence API🔗
POST /api/v1/silencescreates a silence from a JSON body (matchers, optionalstarts_at/ends_atRFC 3339,created_by,comment) and returns the assignedid. It returns429 Too Many Requestsonce the dynamic-silence cap (max_silences, default 1000) is reached; delete silences or raise the cap.GET /api/v1/silenceslists every silence with its derivedstate(pending/active/expired) andorigin.DELETE /api/v1/silences/{id}removes a silence.
Expired silences are garbage-collected on the background tick. The dynamic (API) silence count is bounded by max_silences so an unbounded number of silences cannot accumulate; static silences from the config do not count against it. Metrics: rsigma_silenced_total (results muted) and rsigma_silences_active (currently-active silences).
Inhibition🔗
Inhibition mutes a target result while a matching source result is active, modeled on Alertmanager inhibit_rules. Each rule is { source_match, target_match, equal, duration }: while a result matching source_match has been seen within duration, any result matching target_match that shares the same equal selector values is muted. The classic use is letting a critical alert suppress the lower-severity high alerts for the same entity.
inhibit_rules:
- name: critical-inhibits-high
source_match:
- selector: level
op: "="
value: critical
target_match:
- selector: level
op: "="
value: high
equal:
- match.SourceIp
duration: 5m
source_match and target_match use the matcher engine; both are required. equal is a list of selectors whose values must match between source and target (an empty equal means any active source of the rule inhibits any target). duration (humantime, default 5m) is how long a source stays active after it was last seen.
Two behaviors follow Alertmanager:
- A self-inhibition guard: a result matching both
source_matchandtarget_matchdoes not inhibit itself. - Inhibition is non-transitive: a silenced source still inhibits its targets (the source index is updated before silencing), but an inhibited target does not become a source.
Metrics: rsigma_inhibited_total{rule} (results muted, by rule name) and rsigma_inhibit_sources_active (currently-active sources).
Deduplication🔗
Deduplication collapses repeated firings of the same logical alert into a single active alert, modeled on Alertmanager's dedup behavior rather than a silent drop.
Each in-scope result is reduced to a fingerprint: the rule identity plus the configured selector values. The first fire for a fingerprint passes through unchanged and opens an active alert. Subsequent fires within the window fold into that alert, incrementing its fire count and last_seen, instead of being emitted again. A periodic tick then:
- re-emits a still-active alert every
repeat_interval, carrying the accumulated fire count (setrepeat_interval: 0for pure suppression with no re-emits), and - emits a final
resolvedrecord onceresolve_timeoutelapses with no further fires, and evicts the alert.
Re-emit and resolved records are ordinary NDJSON results carrying a dedup_state key (repeat or resolved) in their enrichments, so they ride the existing sink path and wire shape. They also carry dedup_fingerprint, dedup_fire_count, dedup_first_seen, dedup_last_seen, and the resolved dedup_fields values.
Configuration🔗
# alert-pipeline.yml
strip_event: false
scope:
levels: [high, critical]
tags: [attack.*]
dedup:
fingerprint:
- rule
- match.SourceIp
repeat_interval: 1h # 0 disables re-emits (pure suppression)
resolve_timeout: 30m
| Key | Default | Description |
|---|---|---|
strip_event | false | Retain the event for selector resolution, then drop raw event payloads (detection event, correlation events / event_refs) before sink delivery. |
scope.rules / scope.tags / scope.levels | empty | Restrict which results the layer acts on. Out-of-scope results pass through untouched. Same syntax as the enrichers scope. |
dedup.fingerprint | required | Selectors hashed (with the rule identity) into the fingerprint. At least one is required when dedup is set. |
dedup.repeat_interval | 0 | Re-emit cadence for a still-active alert (humantime, e.g. 1h). 0 means pure suppression. |
dedup.resolve_timeout | 1h | Idle timeout after which an active alert resolves and is evicted (humantime). |
dedup.max_active_alerts | 100000 | Ceiling on concurrently-active alerts. Once reached, a first-fire for a new fingerprint passes through un-deduped instead of growing the store, so a high-cardinality fingerprint cannot exhaust memory. The rsigma_dedup_store_entries gauge plateauing at this value signals saturation. |
max_silences | 1000 | Ceiling on concurrently-tracked dynamic (API) silences. POST /api/v1/silences returns 429 past this. Static silences from the config do not count against it. |
Field selectors🔗
The fingerprint is built from selectors over the EvaluationResult namespace:
| Selector | Resolves to |
|---|---|
rule | the rule id, falling back to the rule title |
level | the severity, lowercased (high, critical, ...) |
event.<path> | a dotted path into the retained event JSON (requires a retained event; see below) |
match.<field> | a matched field value (detection results only) |
enrichment.<path> | a dotted path into enrichments (so enrichers can supply fingerprint fields) |
correlation.group_key.<field> | a group-by value (correlation results only) |
A selector that resolves to nothing contributes an explicit null marker to the fingerprint. A malformed selector rejects the daemon at startup with an error naming the offending selector.
Retained events and strip_event🔗
event.* selectors only resolve when the event is retained on the result (--include-event or per-rule rsigma.include_event). If you need to fingerprint on an event field but do not want full events in the output, set strip_event: true: the layer reads the event for selector resolution, then removes the raw event payload from each pass-through result before delivery.
Metrics🔗
The dedup stage exposes (see Metrics):
rsigma_dedup_results_total{action}— outcomes (emitted,folded,repeat,resolved).rsigma_dedup_store_entries— active alerts currently tracked.rsigma_dedup_evictions_total— alerts evicted after resolving.rsigma_dedup_summaries_emitted_total—repeatplusresolvedrecords emitted.rsigma_alert_pipeline_duration_seconds— stage duration.
Persistence🔗
When the daemon runs with --state-db <PATH>, the alert pipeline's state is persisted to the SQLite store alongside the correlation snapshot, in its own rsigma_alert_pipeline_state table. The snapshot carries the active dedup alerts, open incidents, the dynamic (API) silences, and the inhibition active-source index; static silences come from config and are re-seeded on boot. It is written periodically (--state-save-interval) and on graceful shutdown, and restored on boot.
Restore is window-aware: dedup alerts past resolve_timeout, incidents past their resolve_timeout, silences past ends_at, and inhibition sources past their rule's duration are dropped during restore, so stale state never lingers. Deterministic group_by incident ids are preserved across the restart. A snapshot whose version does not match the current build is ignored with a warning (the daemon starts fresh). --clear-state skips the restore (and --keep-state forces it), matching the correlation-state flags.
Relationship to rsigma.suppress🔗
Dedup here is a sink-path stage that applies to detection and correlation results alike. It is distinct from the correlation engine's per-rule rsigma.suppress, which is engine-side and applies only to correlation firings. The two can be used together.
Grouping🔗
Grouping collapses related results into incidents and emits a higher-level IncidentResult on the Alertmanager timers. Pass-through results are never delayed: each survivor flows to the sinks immediately, annotated with its incident_id in enrichments. The incident itself is emitted by a background tick.
Two modes:
group_by(default): group by equality on an ordered selector list. Theincident_idis a deterministic fingerprint of the(selector -> value)pairs, so the same logical incident keeps one id across restarts and re-emissions. The rule identity is deliberately excluded from the key, so an incident can span rules that share the group value.entity_graph(opt-in): union-find over(selector, value)entity pairs. A result joins (and merges) any open incident that shares an entity value, and theincident_idis a surrogate UUID. This is powerful but prone to the giant-component failure, where one common value (a domain controller, a shared service account, an egress NAT IP) transitively merges unrelated incidents. It ships with mandatory guards: astop_valueslist of non-joinable values and a per-valuemax_value_cardinalityceiling above which a value stops acting as a join key (both surfaced onrsigma_incident_overmerge_total).
Timers🔗
group_wait: initial batching delay before the first incident emission, so a burst lands as one incident.group_interval: minimum delay before emitting an updated incident after new results join.repeat_interval: re-emit cadence for a still-open incident (0disables re-emits).resolve_timeout: idle timeout after which the incident emits a finalresolvedrecord and is evicted.
Configuration🔗
group:
mode: group_by # or entity_graph
by: # group_by mode: the group key selectors
- match.SourceIp
entities: # entity_graph mode: the join-edge selectors
- match.SourceIp
- match.User
group_wait: 30s
group_interval: 5m
repeat_interval: 0
resolve_timeout: 1h
include: refs # refs | results
caps:
max_open_incidents: 10000
max_entities_per_incident: 1000
max_results_per_incident: 1000
max_value_cardinality: 10000
stop_values: ["0.0.0.0", "-", ""] # entity_graph
nats_subject: rsigma.incidents # optional, route incidents to a dedicated NATS subject
include: refs (default) embeds lightweight {rule, level} references; include: results embeds the full (event-stripped) contributing results, bounded by max_results_per_incident.
Wire shape🔗
An IncidentResult is one flat NDJSON object, disambiguated downstream by the presence of an incident_id key. It carries the state (open / resolved), the trigger (group_wait / group_interval / repeat / resolved), the window bounds, the max_level, the result_count, per-rule rule_counts, the group_by key (group_by mode) or entities map (entity_graph mode), and the refs or results. Incidents are delivered to stdout/file/NATS sinks; with nats_subject set, incidents publish to that dedicated subject instead of the detection stream. OTLP and webhook sinks do not receive incidents.
Open incidents are also readable at GET /api/v1/incidents.
Metrics🔗
rsigma_incidents_open— open incidents currently tracked.rsigma_incidents_emitted_total{trigger}— incident emissions by trigger.rsigma_incident_results_total— total incident records emitted.rsigma_incident_overmerge_total{guard}— entity-graph guard hits (stop_value/cardinality_ceiling).