GREENWORLD SECTOR-7 // METRICS // VC-MET-302

Mean Recovery Time (MTTR)

VC-MET-302: the mean elapsed time from incident detection to verified resolution across closed dispatch tickets, the headline measure of facility response effectiveness.

Metric ID: VC-MET-302
Category: Operations & Resource
Unit: min
Type: Derived
Source: CHLORA
Sampling: 1 h
Owner: Metrology & Telemetry Standards
Revision: Rev C · Effective 2089-06-01

Current: 27 min · Nominal ≤ 35 min · 1-h rolling window · Derived by CHLORA

Definition

Mean Recovery Time (MTTR) is the arithmetic mean, in minutes, of the elapsed time from incident detection to verified resolution, computed over all dispatch tickets that reached a RESOLVED state within the trailing window. The clock starts at the detection timestamp (the metric breach or fault event, not ticket creation) and stops at operator-verified resolution, not at the provisional auto-clear. Cancelled tickets, de-duplicated child tickets, and tickets resolved in under 30 s are excluded.

Why it matters

MTTR is the most direct measure of how quickly the facility returns from an off-nominal state to health. Queue depth tells you how much is open; MTTR tells you how fast it gets fixed. A rising MTTR with stable queue depth indicates that individual incidents are getting harder (parts shortages, skill gaps, or recurrent root causes), while a low, stable MTTR underpins every recovery-time service commitment the Directorate holds. CHLORA publishes it on the Operations Dashboard and uses it to tune crew dispatch priority.

Formula

MTTR is the mean of per-ticket recovery durations over the trailing window:

MTTR = ( 1 / N ) · Σ ( t_resolved,i − t_detected,i )
        over tickets i resolved in trailing 60 min, in minutes

where:
  t_detected  = timestamp of the originating breach/fault event
  t_resolved  = timestamp of operator-verified resolution
  N           = count of qualifying resolved tickets in window

Exclusions:
  - CANCELLED and de-duplicated child tickets
  - tickets resolved < 30 s (auto-clear noise)

Segmentation (published alongside, not in headline):
  MTTR_P1, MTTR_P2, MTTR_P3, MTTR_P4  by severity band
  P95 recovery time reported to expose long-tail incidents.

If N < 5 in window, MTTR carries a LOW_SAMPLE confidence flag.
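The formula and exclusions above can be sketched as follows. This is a minimal illustration, not CHLORA's implementation: the `Ticket` record, its field names, and the state labels "RESOLVED" and "CANCELLED" are hypothetical stand-ins for whatever the dispatch system actually stores. The sketch keys the trailing window on the resolution timestamp, drops cancelled, de-duplicated child, and sub-30 s tickets, and sets the LOW_SAMPLE flag when N < 5.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

WINDOW = timedelta(minutes=60)        # trailing window
MIN_DURATION = timedelta(seconds=30)  # shorter recoveries are auto-clear noise
LOW_SAMPLE_N = 5                      # N below this gets the LOW_SAMPLE flag


@dataclass
class Ticket:  # hypothetical record shape
    detected: datetime             # originating breach/fault event, not ticket creation
    resolved: Optional[datetime]   # operator-verified resolution, not provisional auto-clear
    state: str                     # hypothetical labels, e.g. "RESOLVED" or "CANCELLED"
    duplicate_child: bool = False  # True for de-duplicated child tickets


@dataclass
class MttrResult:
    mttr_min: Optional[float]  # None when no ticket qualifies
    n: int
    low_sample: bool


def compute_mttr(tickets: Iterable[Ticket], window_end: datetime) -> MttrResult:
    window_start = window_end - WINDOW
    durations_min = []
    for t in tickets:
        # Only resolved, non-duplicate tickets form the population.
        if t.state != "RESOLVED" or t.duplicate_child or t.resolved is None:
            continue
        # The window is keyed on the resolution time, not the detection time.
        if not (window_start < t.resolved <= window_end):
            continue
        recovery = t.resolved - t.detected
        if recovery < MIN_DURATION:
            continue
        durations_min.append(recovery.total_seconds() / 60.0)
    n = len(durations_min)
    mttr = sum(durations_min) / n if n else None
    return MttrResult(mttr_min=mttr, n=n, low_sample=n < LOW_SAMPLE_N)


end = datetime(2089, 6, 1, 12, 0)
tickets = [
    Ticket(datetime(2089, 6, 1, 11, 0), datetime(2089, 6, 1, 11, 30), "RESOLVED"),   # 30 min
    Ticket(datetime(2089, 6, 1, 11, 20), datetime(2089, 6, 1, 11, 44), "RESOLVED"),  # 24 min
    Ticket(datetime(2089, 6, 1, 11, 5), None, "CANCELLED"),                           # excluded
    Ticket(datetime(2089, 6, 1, 11, 10), datetime(2089, 6, 1, 11, 50), "RESOLVED",
           duplicate_child=True),                                                     # excluded
    Ticket(datetime(2089, 6, 1, 11, 40, 0), datetime(2089, 6, 1, 11, 40, 20),
           "RESOLVED"),                                                               # 20 s noise
    Ticket(datetime(2089, 6, 1, 10, 20), datetime(2089, 6, 1, 10, 50), "RESOLVED"),  # outside window
]
result = compute_mttr(tickets, end)
print(result)  # MttrResult(mttr_min=27.0, n=2, low_sample=True)
```

Only two tickets survive the exclusions here, so the 27-minute mean is reported with the LOW_SAMPLE flag set.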

Inputs

Channel | Role | Cadence | Reference | Source
Detection timestamps | Start of recovery clock | Event-driven | Breach / fault epoch | CHLORA
Resolution timestamps | Stop of recovery clock | Event-driven | Operator-verified | CHLORA
Severity tags | Per-ticket P1–P4 segmentation | Event-driven | Triage label | CHLORA
Dispatch queue (VC-MET-301) | Population of resolved tickets | 1 min | Closed-ticket set | CHLORA

Units & Scale

MTTR is reported in whole minutes, with a P95 long-tail figure alongside. Because it is a window mean, it is not additive across zones: the published facility MTTR weights each zone by its ticket count rather than taking a simple average of zone values. The trailing window is 60 minutes, re-evaluated hourly. A LOW_SAMPLE confidence flag is attached when fewer than five qualifying tickets fall in the window, since a small N makes the mean volatile.
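A short sketch of the two aggregation rules in the paragraph above, assuming per-zone results arrive as (MTTR, N) pairs. Weighting each zone's mean by its ticket count reproduces the pooled mean over all tickets, which is why it differs from a simple zone average. The spec does not state how P95 is computed; the nearest-rank method used here is an assumption, and the zone names are hypothetical.

```python
from typing import Dict, List, Tuple


def facility_mttr(zone_stats: Dict[str, Tuple[float, int]]) -> float:
    """Ticket-count-weighted facility MTTR.

    zone_stats maps a zone name to (zone MTTR in minutes, qualifying tickets N).
    """
    total_n = sum(n for _, n in zone_stats.values())
    if total_n == 0:
        raise ValueError("no qualifying tickets in any zone")
    return sum(mttr * n for mttr, n in zone_stats.values()) / total_n


def p95_nearest_rank(durations_min: List[float]) -> float:
    """P95 by the nearest-rank method (an assumed definition, not from the spec)."""
    if not durations_min:
        raise ValueError("no durations")
    ordered = sorted(durations_min)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integer arithmetic
    return ordered[rank - 1]


zones = {"A": (20.0, 6), "B": (40.0, 2)}  # hypothetical zones
print(facility_mttr(zones))  # 25.0, whereas the simple zone average would be 30.0
print(p95_nearest_rank([10, 12, 15, 18, 20, 22, 25, 30, 45, 180]))  # 180
```

The second call shows why the long-tail figure is published separately: one 180-minute incident dominates the P95 while barely moving a mean taken over many tickets.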

Sampling & Source

Thresholds

Nominal · ≤ 35 min · OK
Incidents cleared within service commitment.

Warning · > 60 min · WARN
Recovery slowing; root-cause review for the window.

Critical · > 120 min · CRIT
Sustained slow recovery; process-failure escalation.

Hysteresis & sampling caution. MTTR has upper thresholds only, so it alarms only on slow recovery. To avoid single-incident whiplash, a WARN requires two consecutive hourly windows above 60 min before it fires, and clears after one window at ≤ 45 min. The P95 long-tail figure governs CRIT independently: a single incident longer than 4 h (240 min) escalates regardless of the mean.
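The hysteresis rule can be sketched as a small per-window state machine. Several points are assumptions not settled by the text above: CRIT on the mean (> 120 min) fires in the same window with no two-window delay, CRIT is re-evaluated every window rather than latched, and the caller passes in a single long-tail figure (`tail_min`), whether that is the P95 or the longest incident in the window. All names are hypothetical.

```python
from dataclasses import dataclass

WARN_ON_MIN = 60.0      # WARN candidate: window mean above this
WARN_CLEAR_MIN = 45.0   # WARN clears after one window at or below this
CRIT_MEAN_MIN = 120.0   # CRIT: window mean above this (assumed: no hysteresis)
CRIT_TAIL_MIN = 240.0   # CRIT: long-tail incident longer than 4 h


@dataclass
class AlarmState:
    warn_active: bool = False
    prev_above_warn: bool = False  # was the previous window above 60 min?


def evaluate_window(state: AlarmState, mttr_min: float, tail_min: float) -> str:
    """Return "OK", "WARN" or "CRIT" for one hourly window, updating state."""
    above = mttr_min > WARN_ON_MIN
    if not state.warn_active and above and state.prev_above_warn:
        state.warn_active = True       # second consecutive window above 60 min
    elif state.warn_active and mttr_min <= WARN_CLEAR_MIN:
        state.warn_active = False      # one window at <= 45 min clears WARN
    state.prev_above_warn = above
    # CRIT is checked independently of the WARN hysteresis.
    if mttr_min > CRIT_MEAN_MIN or tail_min > CRIT_TAIL_MIN:
        return "CRIT"
    return "WARN" if state.warn_active else "OK"


state = AlarmState()
levels = [evaluate_window(state, m, tail_min=30.0) for m in [30, 65, 70, 55, 44, 65]]
print(levels)  # ['OK', 'OK', 'WARN', 'WARN', 'OK', 'OK']
print(evaluate_window(AlarmState(), 27.0, tail_min=250.0))  # CRIT
```

In the sequence, the first 65-minute window only arms the alarm, the 55-minute window sits inside the 45–60 min dead band and keeps WARN active, and the 44-minute window clears it.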

Recent Trend

Figure: MTTR · 14-window trend. Facility MTTR in minutes over the last 14 hourly windows (current 27 min).

Interpretation Guidance

MTTR Band | Reading | Likely Driver | Action
≤ 20 min | Fast recovery | Simple incidents, crews available | None; log window as reference.
21–35 min | Nominal | Mixed incident complexity | None; normal operating band.
36–60 min | Slowing | Harder incidents or crew contention | Check parts availability and crew load.
61–120 min | WARN slow | Recurrent root cause or backlog | Window root-cause review; rebalance crews.
> 120 min | CRIT stalled | Process failure or long-tail incident | Process-failure escalation; invoke recovery SOP.
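The interpretation bands lend themselves to a simple lookup. This sketch assumes the input is the published whole-minute MTTR, so integer band edges (20, 35, 60, 120) are unambiguous; the `BANDS` list and `interpret` function are hypothetical names. It classifies a single window only; whether a WARN or CRIT alarm is actually raised still follows the hysteresis rule in the Thresholds section.

```python
import math
from typing import List, Tuple

# (inclusive upper bound in whole minutes, reading, action), mirroring the table above.
BANDS: List[Tuple[float, str, str]] = [
    (20, "Fast recovery", "None; log window as reference."),
    (35, "Nominal", "None; normal operating band."),
    (60, "Slowing", "Check parts availability and crew load."),
    (120, "WARN slow", "Window root-cause review; rebalance crews."),
    (math.inf, "CRIT stalled", "Process-failure escalation; invoke recovery SOP."),
]


def interpret(mttr_min: int) -> Tuple[str, str]:
    """Map a published whole-minute MTTR to its (reading, action) band."""
    for upper, reading, action in BANDS:
        if mttr_min <= upper:
            return reading, action
    raise ValueError(f"MTTR value not comparable: {mttr_min!r}")  # e.g. NaN


print(interpret(27))   # ('Nominal', 'None; normal operating band.')
print(interpret(121))  # ('CRIT stalled', 'Process-failure escalation; invoke recovery SOP.')
```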

Related SOPs