Availability Management Expanded Incident Lifecycle

An aim of Availability Management is to ensure that the duration and impact of Incidents affecting IT Services are minimized, so that business operations can resume as quickly as possible.

The expanded Incident lifecycle enables the total IT Service downtime for any given Incident to be broken down and mapped against the major stages that all Incidents go through.

Mean Time Between Failures (MTBF) or uptime:
* Average time between the recovery from one incident and the occurrence of the next incident; it relates to the reliability of the service.
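As a minimal sketch of the MTBF definition, the Python snippet below averages the uptime gaps in a small incident log. The log format (a sorted list of hypothetical `(failure_time, restore_time)` pairs in hours) and the function name `mtbf` are illustrative assumptions, not part of any ITIL tooling.

```python
from statistics import mean

# Hypothetical log format: (failure_time, restore_time) pairs in hours,
# sorted by failure_time, with no overlapping incidents.
Incident = tuple[float, float]


def mtbf(incidents: list[Incident]) -> float:
    """Mean uptime: average time from one restore to the next failure."""
    if len(incidents) < 2:
        raise ValueError("MTBF needs at least two incidents")
    uptimes = [
        next_failure - restored
        for (_, restored), (next_failure, _) in zip(incidents, incidents[1:])
    ]
    return mean(uptimes)


log = [(10.0, 12.0), (110.0, 111.0), (200.0, 206.0)]
print(mtbf(log))  # 93.5
```

The uptimes here are 98 h (12 to 110) and 89 h (111 to 200), so the mean uptime is 93.5 h.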

Mean Time to Restore Service (MTRS) or downtime:
* Average time taken to restore a configuration item (CI) or IT service after a failure.
* Measured from when the CI or IT service fails until it is fully restored and delivering its normal functionality.
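A minimal sketch of the MTRS calculation, assuming the same hypothetical log format of `(failure_time, restore_time)` pairs in hours: each incident contributes its downtime from failure to full restoration, and MTRS is the mean of those downtimes. The function name `mtrs` is illustrative.

```python
from statistics import mean

# Hypothetical log format: (failure_time, restore_time) pairs in hours.
Incident = tuple[float, float]


def mtrs(incidents: list[Incident]) -> float:
    """Mean downtime: average time from failure until full restoration."""
    if not incidents:
        raise ValueError("MTRS needs at least one incident")
    return mean(restored - failed for failed, restored in incidents)


log = [(10.0, 12.0), (110.0, 111.0), (200.0, 206.0)]
print(mtrs(log))  # 3.0
```

The three downtimes are 2 h, 1 h and 6 h, giving an MTRS of 3 h.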

Mean Time Between System Incidents (MTBSI):
* Average time between the occurrences of two consecutive incidents.
* Equal to the sum of MTRS and MTBF.
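The identity MTBSI = MTRS + MTBF holds exactly when all three averages are taken over the same failure-to-failure cycles, because each gap between consecutive failures splits into the downtime of the first incident plus the uptime that follows it. The sketch below checks this on a hypothetical log of `(failure_time, restore_time)` pairs in hours; `cycle_metrics` is an illustrative name.

```python
import math
from statistics import mean

# Hypothetical log format: (failure_time, restore_time) pairs in hours,
# sorted by failure_time, with no overlapping incidents.
Incident = tuple[float, float]


def cycle_metrics(incidents: list[Incident]) -> tuple[float, float, float]:
    """Return (MTRS, MTBF, MTBSI) over the complete failure-to-failure cycles."""
    if len(incidents) < 2:
        raise ValueError("need at least two incidents to form a cycle")
    cycles = list(zip(incidents, incidents[1:]))
    downtimes = [restored - failed for (failed, restored), _ in cycles]
    uptimes = [nxt - restored for (_, restored), (nxt, _) in cycles]
    between = [nxt - failed for (failed, _), (nxt, _) in cycles]
    return mean(downtimes), mean(uptimes), mean(between)


log = [(10.0, 12.0), (110.0, 111.0), (200.0, 206.0)]
m_rs, m_bf, m_bsi = cycle_metrics(log)
print(m_rs, m_bf, m_bsi)  # 1.5 93.5 95.0
assert math.isclose(m_bsi, m_rs + m_bf)
```

The last incident only closes a cycle here, so its downtime is left out of MTRS; averaging MTRS over every incident instead would make the sum only approximately equal to MTBSI.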

Relationship between the above terms:
* A high MTBF/MTBSI ratio indicates frequently occurring minor faults or disruptions.
* A low MTBF/MTBSI ratio indicates infrequently occurring major faults or disruptions.
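Because MTBSI = MTBF + MTRS, the ratio can be rewritten as below, which shows that it is the fraction of each failure-to-failure cycle during which the service is up: the closer it is to 1, the shorter the outages are relative to the uptime between them. The figures in the worked example are hypothetical.

```latex
\frac{\mathrm{MTBF}}{\mathrm{MTBSI}}
  = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTRS}}
  = 1 - \frac{\mathrm{MTRS}}{\mathrm{MTBSI}},
\qquad
\text{e.g. } \frac{93.5\,\mathrm{h}}{93.5\,\mathrm{h} + 1.5\,\mathrm{h}}
  = \frac{93.5}{95} \approx 0.984
```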

Elements making up the Mean Time to Restore Service (MTRS):

* Detection Time: Time taken for the service provider to be informed of the fault (for example, when it is reported)
* Diagnosis Time: Time from detection until the service provider has completed diagnosis of the fault
* Repair Time:
o Time taken by the service provider to restore the components that caused the fault
o Calculated from completion of diagnosis to recovery.
* Restoration Time (MTRS):
o Total time until the agreed level of service is restored to the user
o Calculated from the occurrence of the fault to the restore point.
* Restore Point: The point where the agreed level of service has been restored.
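The breakdown above can be sketched for a single incident by recording a timestamp at each milestone and measuring the stages between consecutive milestones. The milestone names and times below are hypothetical: failure to detected corresponds to Detection Time, detected to diagnosed to Diagnosis Time, diagnosed to recovered to Repair Time, and recovered to restored covers any remaining time up to the restore point. The whole span from failure to restored is the Restoration Time.

```python
from datetime import datetime

# Hypothetical milestone timestamps for one incident, in lifecycle order.
milestones = [
    ("failure", datetime(2024, 3, 1, 9, 0)),
    ("detected", datetime(2024, 3, 1, 9, 20)),
    ("diagnosed", datetime(2024, 3, 1, 10, 5)),
    ("recovered", datetime(2024, 3, 1, 11, 35)),
    ("restored", datetime(2024, 3, 1, 11, 50)),
]


def stage_minutes(points: list[tuple[str, datetime]]) -> dict[str, float]:
    """Duration in minutes of each stage between consecutive milestones."""
    return {
        f"{a}->{b}": (tb - ta).total_seconds() / 60
        for (a, ta), (b, tb) in zip(points, points[1:])
    }


stages = stage_minutes(milestones)
for name, minutes in stages.items():
    print(f"{name}: {minutes:.0f} min")

# Restoration time: from the failure to the restore point.
total = (milestones[-1][1] - milestones[0][1]).total_seconds() / 60
print(f"restoration time: {total:.0f} min")  # 170 min
```

Because the stages are measured between consecutive milestones, their durations (20, 45, 90 and 15 minutes) always add up to the restoration time, which makes it easy to see which stage dominates the downtime.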