`rsigma rule scorecard`🔗

Fuse the detection-as-code rule-side outputs into the per-rule keep/tune/retire verdict table a mature detection program reviews on a cadence.

Synopsis🔗

rsigma rule scorecard --backtest <FILE> --coverage <FILE> [OPTIONS]

Description🔗

rule scorecard reads JSON the toolkit already emits and turns it into a decision. The rule backtest report answers "do my rules fire correctly" (and surfaces unexpected fires on a benign corpus as the false-positive signal); the rule coverage report answers "what does my rule set cover"; the per-rule Prometheus counters answer "how often does each rule fire in production." Each is a raw output. The scorecard is the fusion-and-verdict layer that joins them into the single artifact a program acts on: for every rule, a precision proxy, volume, ATT&CK context, and a keep/tune/retire verdict with a reason.

It runs no evaluation and re-reads no corpus. It fuses already-aggregated reports, so it is an offline rule-group command with no engine or hot-path involvement.

The two JSON reports are required. The Prometheus snapshot and the triage feed are optional enrichers: a scorecard run with only the two required inputs still produces a meaningful corpus-derived table, and a missing optional input degrades the verdict rather than blocking it.

Inputs🔗

Input	Flag	Required	What it supplies
Backtest report	`--backtest <FILE>`	yes	Precision proxy and recall, per-rule corpus fire counts, level, logsource. The unexpected-fire rollup is the corpus false-positive signal.
Coverage report	`--coverage <FILE>`	yes	Per-rule ATT&CK technique and tactic mapping, plus the per-technique rule count used for sole-coverage analysis.
Prometheus snapshot or endpoint	`--metrics <FILE\\|URL>`	no	Production true-positive volume from `rsigma_detection_matches_by_rule_total` and `rsigma_correlation_matches_by_rule_total`, joined by `rule_title`.
Prometheus query API	`--metrics-window <DURATION>`	no	When `--metrics` is a Prometheus query-API base, switches to a `query_range` over the window to derive last-fired and the current value.
Triage disposition feed	`--triage <FILE>`	no	Live per-rule false-positive ratio and, where present, MTTD/MTTR.

Both JSON reports are owned by rsigma, so the scorecard shares the producer structs and version-checks any on-disk report against the shipped release: a report from an incompatible build fails the typed deserialize and exits 3.

The Prometheus join inherits the caveat documented in Metrics: rule_title is not guaranteed unique, so when two rules share a title their counters add together. The scorecard keys its records by rule_id, sums the colliding title for the volume column, and flags the affected records with title_collision. For collision-free production volume, scrape rsigma_detection_matches_total and join your detection NDJSON by rule_id outside Prometheus.

Flags🔗

Flag	Default	Description
`--backtest <FILE>`	required	The backtest JSON report (from `rule backtest --report`). May also be supplied via `scorecard.backtest`.
`--coverage <FILE>`	required	The coverage JSON report (from `rule coverage --output-format json`). May also be supplied via `scorecard.coverage`.
`--metrics <FILE\\|URL>`	unset	A Prometheus exposition snapshot file or a `/metrics` URL. May also be supplied via `scorecard.metrics`.
`--metrics-window <DURATION>`	unset	Range-query window (e.g. `7d`, `24h`) when `--metrics` is a query-API base. May also be supplied via `scorecard.metrics_window`.
`--triage <FILE>`	unset	The triage disposition feed. May also be supplied via `scorecard.triage`.
`--fail-on <POLICY>`	`none`	Exit `1` when any rule's verdict is at or worse than the policy: `retire` fails on retire; `tune` fails on tune and retire; `none` reports only. May also be supplied via `scorecard.fail_on`.
`--report <PATH>`	unset	Write the program artifact; `.md`/`.markdown` to markdown, `.html`/`.htm` to HTML. May also be supplied via `scorecard.report`.
`--report-format <FMT>`	from extension	Override the `--report` format (`markdown` or `html`).
`--min-precision <F>`	`0.80`	Keep floor: precision proxy at or above this keeps the rule.
`--tune-max-precision <F>`	`0.50`	Upper edge of the review band (used in the tune reason).
`--retire-max-precision <F>`	`0.10`	Retire floor: precision proxy below this retires the rule.
`--min-volume <N>`	`1`	Minimum total volume for a keep verdict.
`--stale-window <DAYS>`	`30`	Staleness window; a rule that has not fired within it is not kept (enforced only when last-fired is known via `--metrics-window`).
`--max-fp-ratio <F>`	`0.50`	Live false-positive-ratio ceiling; a rule above it is at best tuned.
`--config <PATH>`	unset	Load a specific YAML config file instead of running the discovery chain.
`--dry-run`	off	Print the effective `scorecard` section and exit `0` without running.

The global --output-format applies: table (the TTY default) renders the human scorecard grouped by verdict under a summary header, json emits the single scorecard document, and ndjson/csv/tsv emit one row per rule.

Verdict model🔗

Each fused per-rule record carries a precision proxy, recall, the corpus false-positive signal, true-positive volume (corpus plus production where --metrics is given), last-fired and latency where available, ATT&CK context with a sole-coverage flag, the live false-positive ratio, a keep/tune/retire verdict, and a reason. Every cell records which input supplied it under provenance, so corpus-derived numbers are distinguishable from production-derived ones.

The bands default to the SOC quality-metrics thresholds and are fully configurable:

retire: precision proxy below the retire floor (0.10), or zero volume across both the corpus and the metrics window (a dead rule).
tune: precision proxy in the review band (at or above the retire floor and below the keep floor), or a live false-positive ratio above the ceiling (0.50). A retire candidate that is the sole coverage for an ATT&CK technique is downgraded here with a coverage-risk note, so the program never silently drops coverage.
keep: precision proxy at or above the keep floor (0.80), with enough volume and a recent enough last-fired.

Missing optional inputs degrade the verdict rather than blocking it: with no --metrics, volume and staleness come from the corpus alone; with no --triage, the live false-positive ratio and latency are blank and the verdict falls back to the corpus precision proxy.

Triage feed🔗

The triage feed is the live disposition signal. It is a JSON document keyed by rule_id:

{
  "rules": [
    { "rule_id": "5f0d7d3c-...", "true_positives": 8, "false_positives": 2, "mttd_seconds": 1800, "mttr_seconds": 3600 },
    { "rule_id": "a2b1c0d9-...", "fp_ratio": 0.4 }
  ]
}

Each entry supplies either an explicit fp_ratio or the true_positives/false_positives counts to derive it, plus optional mttd_seconds/mttr_seconds.

Report🔗

The JSON document (--output-format json) has a stable shape:

summary: rule count, keep/tune/retire counts, portfolio precision proxy, ATT&CK technique count and tagged percentage, and the effective thresholds.
records[]: per rule, the id, title, level, logsource, precision proxy, recall, false-positive signal, corpus and production volume, last-fired and latency where available, ATT&CK context with the sole-coverage flag, the live false-positive ratio, the title-collision flag, the verdict and its reason, and per-cell provenance.
inputs: which optional inputs contributed.

--report writes the same content as a standalone markdown or HTML artifact grouped by verdict, for sharing in a review.

Exit codes🔗

Code	Meaning
`0`	Success, or verdicts were produced but none tripped `--fail-on`.
`1`	`--fail-on` was set and at least one rule's verdict is at or worse than the policy.
`2`	A required or optional input is missing or unfetchable.
`3`	A bad flag, or a malformed or version-mismatched report.

Examples🔗

Produce the corpus-derived scorecard from the two required reports🔗

rsigma rule scorecard --backtest backtest.json --coverage coverage.json

Enrich with production volume and analyst dispositions🔗

rsigma rule scorecard --backtest backtest.json --coverage coverage.json \
    --metrics http://localhost:9090/metrics --triage triage.json

Write the weekly review artifact🔗

rsigma rule scorecard --backtest backtest.json --coverage coverage.json \
    --report scorecard.md

Gate CI on retire-grade rules🔗

rsigma rule scorecard --backtest backtest.json --coverage coverage.json \
    --fail-on retire

rsigma rule scorecard🔗