
Mean Recovery Time (MTTR)

VC-MET-302 — the mean elapsed time from incident detection to verified resolution across closed dispatch tickets, the headline measure of facility response effectiveness.

Metric ID	VC-MET-302
Category	Operations & Resource
Unit	min
Type	Derived
Source	CHLORA
Sampling	1 h
Owner	Metrology & Telemetry Standards
Rev C · Effective 2089-06-01

Current: 27 min · Nominal: ≤ 35 min · 1-h rolling window · Derived by CHLORA

Definition

Mean Recovery Time (MTTR) is the arithmetic mean, in minutes, of the elapsed time from incident detection to verified resolution, computed over all dispatch tickets that reached a RESOLVED state within the trailing 60-minute window. The clock starts at the detection timestamp (the metric breach or fault event, not ticket creation) and stops at operator-verified resolution, not at the provisional auto-clear. Cancelled tickets, de-duplicated child tickets, and tickets resolved in under 30 s are excluded.

Why it matters

MTTR is the most direct measure of how quickly the facility returns from an off-nominal state to health. Queue depth shows how much work is open; MTTR shows how fast it gets fixed. A rising MTTR with stable queue depth indicates that individual incidents are getting harder to resolve, typically because of parts shortages, skill gaps, or recurrent root causes. A low, stable MTTR underpins every recovery-time service commitment the Directorate holds. CHLORA publishes it on the Operations Dashboard and uses it to tune crew dispatch priority.

Formula

MTTR is the mean of per-ticket recovery durations over the trailing window:

MTTR = ( 1 / N ) · Σ ( t_resolved,i − t_detected,i )
        over tickets i resolved in trailing 60 min, in minutes

where:
  t_detected  = timestamp of the originating breach/fault event
  t_resolved  = timestamp of operator-verified resolution
  N           = count of qualifying resolved tickets in window

Exclusions:
  - CANCELLED and de-duplicated child tickets
  - tickets resolved in < 30 s (auto-clear noise)

Segmentation (published alongside, not in headline):
  MTTR_P1, MTTR_P2, MTTR_P3, MTTR_P4  by severity band
  P95 recovery time reported to expose long-tail incidents.

If N < 5 in window, MTTR carries a LOW_SAMPLE confidence flag.
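
The formula and exclusions above can be sketched in Python as follows. This is a minimal illustration, not CHLORA's implementation: the `Ticket` record, its field names, the state labels, and the choice of a half-open window (start exclusive, end inclusive) are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


@dataclass
class Ticket:
    """Hypothetical ticket record; field names are illustrative only."""
    detected: datetime                 # originating breach/fault event
    resolved: datetime                 # operator-verified resolution
    state: str = "RESOLVED"            # e.g. "RESOLVED" or "CANCELLED"
    is_duplicate_child: bool = False   # de-duplicated child ticket


def mttr_minutes(
    tickets: Iterable[Ticket],
    window_end: datetime,
    window: timedelta = timedelta(minutes=60),
    min_duration: timedelta = timedelta(seconds=30),
    low_sample_n: int = 5,
) -> Tuple[Optional[float], int, bool]:
    """Return (mean recovery minutes, N, low_sample flag) for one window."""
    window_start = window_end - window
    durations = []
    for t in tickets:
        # Exclusion: CANCELLED and de-duplicated child tickets.
        if t.state != "RESOLVED" or t.is_duplicate_child:
            continue
        # Only tickets resolved inside the trailing window qualify
        # (boundary handling here is an assumption).
        if not (window_start < t.resolved <= window_end):
            continue
        duration = t.resolved - t.detected
        # Exclusion: auto-clear noise, resolved in under 30 s.
        if duration < min_duration:
            continue
        durations.append(duration.total_seconds() / 60.0)

    n = len(durations)
    low_sample = n < low_sample_n
    if n == 0:
        return None, 0, low_sample
    return sum(durations) / n, n, low_sample
```

Returning N and the flag together with the mean keeps the LOW_SAMPLE decision next to the value it qualifies.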

Inputs

Channel	Role	Cadence	Reference	Source
Detection timestamps	Start of recovery clock	Event-driven	Breach / fault epoch	CHLORA
Resolution timestamps	Stop of recovery clock	Event-driven	Operator-verified	CHLORA
Severity tags	Per-ticket P1–P4 segmentation	Event-driven	Triage label	CHLORA
Dispatch queue (VC-MET-301)	Population of resolved tickets	1 min	Closed-ticket set	CHLORA
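
As an illustration of how the severity tags and recovery durations could feed the per-band figures and the P95 long-tail value, here is a short sketch. The page does not say which percentile method CHLORA uses; the nearest-rank method below is an assumption, as are the function names and the `(severity, duration)` input shape.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def p95_nearest_rank(durations_min: List[float]) -> float:
    """95th percentile by the nearest-rank method (assumed, not confirmed)."""
    if not durations_min:
        raise ValueError("no durations in window")
    ordered = sorted(durations_min)
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def mttr_by_severity(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Mean recovery minutes per severity band, keyed as MTTR_P1 ... MTTR_P4."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for severity, duration_min in records:
        groups[severity].append(duration_min)
    return {
        f"MTTR_{severity}": sum(values) / len(values)
        for severity, values in sorted(groups.items())
    }
```

Bands with no resolved tickets in the window are simply absent from the result rather than reported as zero.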

Units & Scale

MTTR is reported in minutes to whole-minute precision with a P95 long-tail figure alongside. It is a window mean, not additive across zones; a facility MTTR weighted by ticket count is published rather than a simple zone average. The trailing window is 60 minutes, re-evaluated hourly. A LOW_SAMPLE confidence flag is attached when fewer than five qualifying tickets fall in the window, since small N makes the mean volatile.
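
The difference between a ticket-count-weighted facility MTTR and a simple zone average can be shown with a small sketch. The zone figures are invented for illustration, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def facility_mttr(zones: List[Tuple[float, int]]) -> Optional[float]:
    """Ticket-count-weighted mean of (zone MTTR in minutes, ticket count) pairs."""
    total_tickets = sum(count for _, count in zones)
    if total_tickets == 0:
        return None
    return sum(mttr * count for mttr, count in zones) / total_tickets


# Illustrative values only: a busy zone with fast recovery and a
# quiet zone with one slow incident.
zones = [(20.0, 9), (60.0, 1)]
weighted = facility_mttr(zones)                            # (20*9 + 60*1) / 10
simple_average = sum(m for m, _ in zones) / len(zones)     # (20 + 60) / 2
```

With these numbers the weighted figure is 24 min while the simple zone average is 40 min, which is why a single slow ticket in a quiet zone should not dominate the facility headline.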

Sampling & Source

Recomputed every 1 hour by CHLORA over a trailing 60-minute window of resolved tickets.
Detection and resolution timestamps both sourced from the CHLORA dispatch ledger.
Recovery clock runs from breach detection to operator-verified resolution, not ticket creation to auto-clear.
Stale / low-sample handling: window with N < 5 tickets → value held with LOW_SAMPLE; no resolutions → last value carried forward, marked STALE.
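
The stale and low-sample rules above might be applied at publication time roughly as follows. This is a sketch under stated assumptions: "value held with LOW_SAMPLE" is read here as publishing the window's mean with the flag attached, and the `Published` record and `publish` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Published:
    """Hypothetical published reading: value in minutes plus an optional flag."""
    value_min: Optional[float]
    flag: Optional[str] = None   # None, "LOW_SAMPLE", or "STALE"


def publish(
    window_mean: Optional[float],
    n: int,
    previous: Optional[Published],
    low_sample_n: int = 5,
) -> Published:
    """Choose the value and flag to publish for one hourly window."""
    if n == 0:
        # No resolutions in the window: carry the last value forward, marked STALE.
        last_value = previous.value_min if previous is not None else None
        return Published(last_value, "STALE")
    if n < low_sample_n:
        # Fewer than five qualifying tickets: publish with LOW_SAMPLE (assumed reading).
        return Published(window_mean, "LOW_SAMPLE")
    return Published(window_mean, None)
```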

Thresholds

State	Threshold	Reading
OK (Nominal)	≤ 35 min	Incidents cleared within service commitment.
WARN (Warning)	> 60 min	Recovery slowing; root-cause review for the window.
CRIT (Critical)	> 120 min	Sustained slow recovery; process-failure escalation.

Hysteresis & sampling caution. MTTR is an upper-bound metric and alarms only on slow recovery. To avoid single-incident whiplash, a WARN requires two consecutive hourly windows above 60 min before it fires, and clears after one window at ≤ 45 min. The P95 long-tail figure governs CRIT independently: a single > 4 h incident escalates regardless of the mean.
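
The hysteresis rule can be sketched as a small state machine. Several details are assumptions: the text does not say whether CRIT on the mean has its own hysteresis (the sketch fires it immediately above 120 min), and the long-tail input is taken as a single figure in minutes supplied by the caller, since the text mentions both the P95 value and a single incident over 4 h. The class and parameter names are hypothetical.

```python
class MttrAlarm:
    """Sketch of WARN hysteresis plus an independent long-tail CRIT check."""

    WARN_ABOVE = 60.0       # WARN needs two consecutive windows above this
    WARN_CLEAR = 45.0       # WARN clears after one window at or below this
    CRIT_ABOVE = 120.0      # assumed to fire immediately, no hysteresis
    LONG_TAIL_MIN = 240.0   # 4 h long-tail escalation

    def __init__(self) -> None:
        self.warn_active = False
        self.consecutive_high = 0

    def update(self, mean_min: float, long_tail_min: float) -> str:
        """Feed one hourly window; return "OK", "WARN", or "CRIT"."""
        # Count consecutive windows above the WARN threshold.
        if mean_min > self.WARN_ABOVE:
            self.consecutive_high += 1
        else:
            self.consecutive_high = 0

        if not self.warn_active and self.consecutive_high >= 2:
            self.warn_active = True
        elif self.warn_active and mean_min <= self.WARN_CLEAR:
            self.warn_active = False

        # The long-tail figure escalates regardless of the mean.
        if mean_min > self.CRIT_ABOVE or long_tail_min > self.LONG_TAIL_MIN:
            return "CRIT"
        return "WARN" if self.warn_active else "OK"
```

Note that a window between 45 and 60 min neither raises nor clears WARN, which is the dead band that prevents the alarm from flapping.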

Recent Trend

Figure: Facility MTTR in minutes over the last 14 hourly windows (current 27).

Interpretation Guidance

MTTR Band	Reading	Likely Driver	Action
≤ 20 min	Fast recovery	Simple incidents, crews available	None; log window as reference.
21–35 min	Nominal	Mixed incident complexity	None; normal operating band.
36–60 min	Slowing	Harder incidents or crew contention	Check parts availability and crew load.
61–120 min	WARN slow	Recurrent root cause or backlog	Window root-cause review; rebalance crews.
> 120 min	CRIT stalled	Process failure or long-tail incident	Process-failure escalation; invoke recovery SOP.
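
For tooling that needs to map a reading to the guidance above, the table can be expressed as a simple lookup. This is an illustrative sketch; the function name is hypothetical, and the input is assumed to be the whole-minute MTTR value as published.

```python
from typing import List, Tuple

# (upper bound in minutes, reading, action), taken from the guidance table.
BANDS: List[Tuple[int, str, str]] = [
    (20, "Fast recovery", "None; log window as reference."),
    (35, "Nominal", "None; normal operating band."),
    (60, "Slowing", "Check parts availability and crew load."),
    (120, "WARN slow", "Window root-cause review; rebalance crews."),
]


def interpret(mttr_min: int) -> Tuple[str, str]:
    """Return (reading, action) for a whole-minute MTTR value."""
    for upper, reading, action in BANDS:
        if mttr_min <= upper:
            return reading, action
    # Anything above 120 min falls in the CRIT band.
    return "CRIT stalled", "Process-failure escalation; invoke recovery SOP."
```

For the current value of 27 min this returns the Nominal band with no action required.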

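Related Metrics

Dispatch Queue Depth (VC-MET-301)
Alarm Acknowledgement Latency
Pollinator Drone Charge
Substrate Stock Coverage
Canopy Vitality Index
Containment Integrity Index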

Related SOPs

SOP Library — incident response and root-cause-review procedures.
Window root-cause review on WARN/CRIT MTTR — see the SOP Library.
Recovery-time monitoring & long-tail escalation — Monitoring Systems.
