Event Management – ITIL V3
Service Operation Processes
The process that monitors all events that occur through the IT infrastructure to allow for normal service operation and to detect and escalate exceptions.
Effective service support relies on knowing the status of the infrastructure and detecting any deviation from normal or expected operation.
This can be provided by good monitoring and control systems, which are based on two types of tools:
Active monitoring tools that monitor key CIs to determine their status and availability.
Passive monitoring tools that detect and correlate operational alerts or communications generated by CIs.
Event Management can be applied to any aspect of Service
Management that needs to be controlled and can be automated, for example:
Software license monitoring
Different types of event
There are different types of event:
Events that signify regular operation
Notifications that a scheduled workload has completed
A user has logged in to use an application
An email has reached its intended recipient
Events that signify an exception
User attempts to log on to an application with the incorrect password
Unusual situation has occurred in a business process that may indicate an exception requiring further business investigation
Device’s CPU is above the acceptable utilization rate
Events that signify unusual, but not exceptional, operation
Server’s memory utilization reaches within 5% of its highest acceptable performance level
Completion time of a transaction is 100% longer than normal
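The threshold examples above can be sketched as a small classifier. This is a minimal illustration, not a prescribed implementation: the names `Significance`, `classify_utilization` and `classify_duration` are hypothetical, the three labels are illustrative, and "within 5%" is assumed to mean five percentage points below the highest acceptable level.

```python
from enum import Enum


class Significance(Enum):
    INFORMATIONAL = "informational"  # regular operation
    WARNING = "warning"              # unusual, but not exceptional
    EXCEPTION = "exception"          # abnormal operation


def classify_utilization(value, max_acceptable, warning_margin=5.0):
    """Classify a utilization reading; all arguments are in percent.

    Assumption: the warning band is `warning_margin` percentage points
    below the highest acceptable level.
    """
    if value > max_acceptable:
        return Significance.EXCEPTION
    if value >= max_acceptable - warning_margin:
        return Significance.WARNING
    return Significance.INFORMATIONAL


def classify_duration(duration, normal_duration):
    """Flag a transaction that takes at least 100% longer than normal."""
    if duration >= 2 * normal_duration:
        return Significance.WARNING
    return Significance.INFORMATIONAL
```

With a highest acceptable level of 80%, a reading of 76% would be classed as unusual but not exceptional, while 85% would be an exception.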
Event Management – Activities
Events occur continuously, but not all of them are detected or
registered. It is therefore important that everybody involved in
designing, developing, managing and supporting IT services
and the IT Infrastructure that they run on understands what
types of event need to be detected.
Most CIs are designed to communicate certain information
about themselves in one of two ways:
A device is interrogated by a management tool, which collects certain targeted data. Often referred to as ‘polling’.
The CI generates a notification when certain conditions are met. The ability to produce these notifications has to be designed and built into the CI.
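The two communication styles can be contrasted in a short sketch. Everything here is assumed for illustration: the `ConfigurationItem` class, its `read_status` and `set_cpu` methods, and the CPU limit are hypothetical and do not represent any particular management tool.

```python
class ConfigurationItem:
    """A toy CI that supports both polling and notification."""

    def __init__(self, name, cpu_limit, on_event=None):
        self.name = name
        self.cpu_limit = cpu_limit
        self.cpu = 0.0
        # The notification capability has to be built into the CI:
        # here it is a callback supplied when the CI is created.
        self.on_event = on_event

    def read_status(self):
        """Polling: the CI answers when a management tool interrogates it."""
        return {"ci": self.name, "metric": "cpu", "value": self.cpu}

    def set_cpu(self, value):
        """Notification: the CI reports by itself when a condition is met."""
        self.cpu = value
        if self.on_event is not None and value > self.cpu_limit:
            self.on_event({"ci": self.name, "metric": "cpu", "value": value})


def poll(cis):
    """Interrogate every CI and collect the targeted data."""
    return [ci.read_status() for ci in cis]
```

Polling finds out about a condition only at the next interrogation, whereas a notification is raised as soon as the condition is met.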
Alert / Event detection
Once an Event notification has been generated, it will be
detected by an agent running on the same system, or
transmitted directly to a management tool specifically designed
to read and interpret the meaning of the event.
The purpose of filtering is to decide the best course of
action to take, e.g.
Communicate the event to a management tool
Ignore it; even in this case the event will need to be recorded.
Events need to be filtered because it is not always possible to turn
event notifications off. During the filtering activity, the first level
of correlation is performed.
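A minimal sketch of the filtering step, assuming events are plain dictionaries and that relevance is decided by a caller-supplied predicate. The function name `filter_events` and the event fields are hypothetical.

```python
def filter_events(events, is_relevant):
    """Split events into those forwarded to a management tool and a full log.

    Every event is recorded, including the ones that are ignored.
    """
    forwarded = []
    log = []
    for event in events:
        log.append(event)
        if is_relevant(event):
            forwarded.append(event)
    return forwarded, log


# Example: ignore routine heartbeats, forward everything else.
events = [
    {"id": 1, "type": "heartbeat"},
    {"id": 2, "type": "cpu_high"},
]
forwarded, log = filter_events(events, lambda e: e["type"] != "heartbeat")
```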
Significance of events
Every organization will have its own method and criteria for
categorizing the significance of an event. Three broad
categories are suggested: Informational, Warning and Exception.
Correlation is normally done by a ‘Correlation Engine’, part of a
management tool that compares the event with a set of criteria
and rules in a prescribed order.
These criteria are often referred to as Business Rules, but are
generally quite technical.
The idea is that the event may represent some impact on the
business and the rules can be used to determine the level and
type of business impact.
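Evaluating rules in a prescribed order can be sketched as a first-match loop. This is an assumption about how such an engine might work, not a description of any specific product; the `Rule` type, the `correlate` function and the disk thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # technical criterion for the event
    impact: str                      # business impact the rule implies


def correlate(event: dict, rules: list) -> Optional[Rule]:
    """Return the first rule that matches, honouring the prescribed order."""
    for rule in rules:
        if rule.matches(event):
            return rule
    return None  # unrecognized event: no response is required


# Order matters: the stricter rule is listed first.
RULES = [
    Rule("disk-full",
         lambda e: e["metric"] == "disk" and e["value"] >= 95,
         "service outage risk"),
    Rule("disk-high",
         lambda e: e["metric"] == "disk" and e["value"] >= 80,
         "capacity warning"),
]
```

Because evaluation stops at the first match, an event that satisfies both rules is reported with the more severe impact.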
If the correlation activity recognizes an event, a response will
be required. The mechanism used to initiate that response is
called a trigger. There are many different types of triggers,
each designed specifically for the task it has to initiate. E.g.
A trigger resulting from an approved RFC that has been implemented but has caused the event, or from an unauthorized change that has been detected.
Paging systems that will notify a person of the event by mobile phone.
Database triggers that restrict access of a user to specific records.
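One way to picture triggers is as a lookup from response type to a handler built for that task. The sketch below is illustrative only: `make_dispatcher`, `page_on_call` and `open_incident` are hypothetical names, and the handlers return strings where a real trigger would call a paging system or a service management tool.

```python
def make_dispatcher(triggers):
    """Return a function that fires the trigger registered for a response type."""
    def dispatch(response_type, event):
        if response_type not in triggers:
            raise ValueError(f"no trigger registered for {response_type!r}")
        return triggers[response_type](event)
    return dispatch


# Hypothetical triggers, each designed for the one task it initiates.
def page_on_call(event):
    return f"paged on-call engineer about {event['id']}"


def open_incident(event):
    return f"incident opened for {event['id']}"


dispatch = make_dispatcher({"page": page_on_call, "incident": open_incident})
```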
At this point in the process, there are a number of response
options available. Different organizations will have different
There will be a range of responses for each different
With so many events occurring on a daily basis, it is not
possible to review each one individually. However, it is
essential to check that any significant events or exceptions have
been handled correctly and to track trends. In most cases this
can be done automatically.
Some events remain open until specific actions take place e.g.
an event that is linked to an open incident. However, most
events are not opened or closed.
Informational events are logged
Auto-response events will typically be closed by the generation of a second event.
In the case of events that have generated an incident, problem or change, these will be formally closed with a link to the appropriate record from the other processes.
SNMP messages, which are a standard way of communicating technical information about the status of components of an IT infrastructure.
Management Information Bases of IT devices
Vendor’s monitoring tools agent software
Value to the business
The value to the business of implementing the Event Management
process is generally indirect, but it is possible to determine the basis
for its value. E.g.
It provides mechanisms for early detection of incidents.
It enables some types of automated activity to be monitored by exception.
It signals status changes or exceptions so that the appropriate person or team can respond early.
It provides a basis for automated operations, thus increasing efficiency and allowing human resources to be better utilized.
# of events by category and significance
# and % of events that required human intervention and whether this was performed
# and % of events that resulted in incidents and changes
# and % of events caused by existing problems and known errors.
# and % of repeated or duplicated events.
# and % of events indicating performance issues and potential availability issues.
# and % of each type of event per platform or application
# and ratio of events compared with the number of incidents.
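Several of the counts and percentages above can be derived from an event log. The sketch below assumes each event is a dictionary with a `significance` label and optional boolean flags; these field names and the `event_metrics` function are hypothetical.

```python
from collections import Counter


def event_metrics(events, incident_count):
    """Compute a few of the listed metrics from a list of event records."""
    total = len(events)

    def pct(n):
        # Percentage of all events; 0.0 when there are no events.
        return 100.0 * n / total if total else 0.0

    human = sum(1 for e in events if e.get("human_intervention"))
    raised = sum(1 for e in events if e.get("raised_incident"))
    return {
        "total": total,
        "by_significance": dict(Counter(e["significance"] for e in events)),
        "human_intervention": (human, pct(human)),
        "raised_incident": (raised, pct(raised)),
        # Ratio of events to incidents; None when no incidents were recorded.
        "events_per_incident": total / incident_count if incident_count else None,
    }
```

The same pattern extends to the other metrics, for example by adding a flag for duplicated events or a platform field to group by.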
Initial challenge in obtaining funding for necessary tools and effort required.
Setting the correct level of filtering.
Rolling out the necessary monitoring agents across the entire IT infrastructure can be difficult and time consuming, and requires ongoing commitment.
Acquiring the necessary skills.
In order to obtain the necessary funding a compelling Business
Case should be prepared showing how the benefits of effective
Event Management can far outweigh the costs – giving a
positive return on investment.
One of the most important CSFs is achieving the correct level
of filtering. This is complicated by the fact that the significance
of events changes. E.g. a user logging into a system today is
normal, but if that user leaves the organization and then tries to log
in, it is a security breach.
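The login example shows why a filtering rule cannot be static: the same event must be judged against current context. A minimal sketch, assuming the set of active users is available from some directory; the function name and the labels are hypothetical.

```python
def login_significance(user, active_users):
    """Judge a login attempt against the current set of active users."""
    if user in active_users:
        return "informational"  # normal operation
    return "exception"          # possible security breach


active_users = {"alice", "bob"}
before = login_significance("bob", active_users)

# Bob leaves the organization; the identical event now means something else.
active_users.discard("bob")
after = login_significance("bob", active_users)
```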
Failure to obtain adequate funding
Ensuring the correct level of filtering
Failure to maintain momentum in rolling out the necessary monitoring agents across the IT infrastructure.