Problem Management Scope

A clear distinction should be made between the purpose, scope and activities of Problem Management and those of Incident Management. In many cases staff do not clearly understand the distinction and, as a result, do not direct their efforts in the most effective and efficient manner.

For most implementations of Problem Management the scope includes:
* The activities required to diagnose the root cause of incidents and to determine the resolution to those problems
* Activities that ensure that the resolution is implemented through the appropriate control procedures, usually through interfaces with Change Management and Release and Deployment Management
* Proactive activities that eliminate errors in the infrastructure before they result in incidents and impact on the business and end users.

Problem Management is defined as two major processes:
* Reactive Problem Management
* Proactive Problem Management **

** Initiated in Service Operation but generally driven as part of Continual Service Improvement.

Remember the weeding analogy used for Incident Management? Problem Management seeks to identify and remove the root cause of Incidents in the IT Infrastructure.

Terminology and explanations

Problem:
Unknown underlying cause of one or more Incidents
(The investigation)

Known Error:
Known underlying cause.
The root cause of a Problem has been successfully diagnosed, and a workaround or permanent solution has been identified.

KEDB:
Known Error Database, where Known Errors and their documented workarounds are maintained. This database is owned by Problem Management.

Workaround:
The pre-defined and documented technique used to restore normal service operation for the user. A workaround is NOT a permanent (structural) solution, and only addresses the symptoms of errors. These workarounds are stored in the KEDB (or Service Knowledge Management System).
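
The terminology above can be illustrated with a minimal Python sketch. It is not a prescribed ITIL data model: the class names, field names and the keyword-based lookup are all assumptions made for illustration. It shows a Problem becoming a Known Error once its root cause and a workaround (or permanent solution) are recorded, and a KEDB that only accepts Known Errors.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Problem:
    """Unknown underlying cause of one or more Incidents (the investigation)."""
    problem_id: str          # hypothetical identifier format
    description: str
    root_cause: Optional[str] = None
    workaround: Optional[str] = None
    permanent_solution: Optional[str] = None

    def is_known_error(self) -> bool:
        # Known Error: root cause diagnosed AND a workaround or
        # permanent solution identified.
        has_fix = self.workaround is not None or self.permanent_solution is not None
        return self.root_cause is not None and has_fix


class KnownErrorDatabase:
    """Hypothetical in-memory KEDB, keyed by problem ID."""

    def __init__(self) -> None:
        self._records: Dict[str, Problem] = {}

    def add(self, problem: Problem) -> None:
        # Only diagnosed Problems with a documented fix belong in the KEDB.
        if not problem.is_known_error():
            raise ValueError("only Known Errors can be stored in the KEDB")
        self._records[problem.problem_id] = problem

    def find_workaround(self, keyword: str) -> Optional[str]:
        # Naive case-insensitive keyword match on the description
        # (an assumption; real tools use richer matching).
        for record in self._records.values():
            if keyword.lower() in record.description.lower() and record.workaround:
                return record.workaround
        return None
```

In this sketch, a Service Desk analyst handling a new Incident would call `find_workaround` with a symptom keyword to restore service quickly, while the permanent fix is pursued separately.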

Relationships between incidents, problems and known errors

Problems are identified and corrected in multiple ways. For most organizations, the primary benefit of Problem Management is demonstrated in the “Many to One” relationship between Incidents and Problems. This enables an IT Service Provider to resolve many Incidents efficiently by correcting the underlying root cause. Change Management is still required so that the actions taken to correct and remove the error are performed in a controlled and efficient manner.
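
The “Many to One” relationship can be sketched in a few lines of Python. This is an illustrative model only: `ProblemRegister`, its method names and the `change_approved` flag are hypothetical and stand in for whatever the service management tool provides. The sketch links several Incidents to a single Problem and only releases them as resolved once the corrective action has been approved through Change Management.

```python
from collections import defaultdict
from typing import Dict, List


class ProblemRegister:
    """Hypothetical register mapping one Problem to many Incidents."""

    def __init__(self) -> None:
        self._incidents_by_problem: Dict[str, List[str]] = defaultdict(list)

    def link(self, incident_id: str, problem_id: str) -> None:
        # Many Incidents may point at the same underlying Problem.
        self._incidents_by_problem[problem_id].append(incident_id)

    def incidents_for(self, problem_id: str) -> List[str]:
        # Return a copy so callers cannot alter the register by accident.
        return list(self._incidents_by_problem.get(problem_id, []))

    def resolve(self, problem_id: str, change_approved: bool) -> List[str]:
        # The fix must go through Change Management before it is applied.
        if not change_approved:
            raise PermissionError("fix requires an approved change")
        # Removing the root cause resolves every linked Incident at once.
        return self._incidents_by_problem.pop(problem_id, [])
```

Resolving one Problem here returns every Incident that was linked to it, which is the efficiency gain the relationship describes.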

Why do some Problems not get diagnosed?
* Because the root cause is not always found.

Why do some Known Errors not get fixed?
* Because we may decide that the costs exceed the benefits of fixing the error; or
* Because it may be fixed in an upcoming patch from development teams or suppliers.
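
The two reasons for leaving a Known Error unfixed can be expressed as a simple decision rule. The function below is a rough sketch under stated assumptions: the function name, its parameters and the idea of reducing cost and benefit to single numbers are illustrative, and in practice the decision is a business judgement rather than a formula.

```python
def should_fix_known_error(fix_cost: float,
                           expected_benefit: float,
                           upstream_patch_expected: bool = False) -> bool:
    """Hypothetical rule for deciding whether to fix a Known Error now."""
    # An upcoming patch from development teams or suppliers may
    # remove the error without any local effort.
    if upstream_patch_expected:
        return False
    # Otherwise fix only when the costs do not exceed the benefits.
    return fix_cost <= expected_benefit
```

Where the rule returns `False`, the Known Error and its workaround would remain in the KEDB so that related Incidents can still be handled quickly.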