Difference between revisions of "UM:Event Processing"

From NetXMS Wiki

Revision as of 17:31, 19 December 2012

Event processing is one of the core components of NetXMS. It determines how the monitoring system will react to various events.

Event Processing Overview

The following flowchart outlines event flow inside the monitoring system:

Figure: Event flow inside the monitoring system

As the flowchart shows, events can come from various sources: polling processes (status, configuration, discovery, and data collection), SNMP traps, and external applications that submit events directly via the client library. All incoming events go to a single event queue for processing. A special process, called the Event Processor, takes events from the queue one by one and matches them against the Event Processing Policy. As a result, alarms may be generated and actions may be executed. If the event has the Write to log attribute set, it is written to the NetXMS event log at the end of processing.

Although processing all events one by one may look like a potential bottleneck, this should not be the case. The event processor is highly optimized, and all potentially long operations (such as action execution) are performed by separate processes.

Event Processing Policy

The actions taken by the event processor for any specific event are determined by a set of rules called the Event Processing Policy.

Figure: Event Processing Policy screen


Every rule has two parts: a matching part (called Condition in the rule configuration dialog), which determines whether the rule applies to the current event, and an action part, which determines the actions to be taken for matched events. The matching part consists of four fields:

Attribute Description
Source Objects One or more of the event's source objects. This list can be left empty, which matches any node, or contain nodes, subnets, or containers. If you specify a subnet or container, any node within it will be matched.
Events Event code. This field can be left empty, which matches any event, or contain a list of event codes.
Severity Filter Event's severity. This field contains the selection of event severities to be matched.
Filtering Script Optional matching script written in NXSL. If this field is empty, no additional checks are performed. Otherwise, the event is considered matched only if the script returns a non-zero (TRUE) return code. For more information about the NetXMS scripting language, please refer to the chapter NetXMS Scripting Language (NXSL) in this manual.

In the action part you can configure alarm generation, a situation update, and a list of actions to be executed. Every rule can also have a free-form textual comment.

Each event passes through all rules in the policy, so if it matches more than one rule, the actions specified in all matched rules will be executed. You can change this behavior by setting the Stop Processing flag for a rule. If this flag is set and the rule matches, processing of the current event stops at that rule and the remaining rules are not evaluated.
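
The rule evaluation described above can be sketched in Python. This is only an illustrative model, not NetXMS code: the `Event`, `Rule`, and `process` names are hypothetical, events and sources are identified by plain strings, and subnet or container membership is not modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set

ALL_SEVERITIES = {"Normal", "Warning", "Minor", "Major", "Critical"}


@dataclass
class Event:
    source: str    # name of the source node (hypothetical representation)
    code: str      # event code, shown here by its name
    severity: str


@dataclass
class Rule:
    name: str
    sources: Set[str] = field(default_factory=set)   # empty set matches any source
    events: Set[str] = field(default_factory=set)    # empty set matches any event
    severities: Set[str] = field(default_factory=lambda: set(ALL_SEVERITIES))
    script: Optional[Callable[[Event], bool]] = None  # stands in for an NXSL filter
    stop_processing: bool = False

    def matches(self, event: Event) -> bool:
        if self.sources and event.source not in self.sources:
            return False
        if self.events and event.code not in self.events:
            return False
        if event.severity not in self.severities:
            return False
        if self.script is not None and not self.script(event):
            return False
        return True


def process(event: Event, policy: List[Rule]) -> List[str]:
    """Return the names of all rules whose action part would run for the event."""
    matched = []
    for rule in policy:
        if rule.matches(event):
            matched.append(rule.name)
            if rule.stop_processing:
                break  # Stop Processing: later rules are never evaluated
    return matched


policy = [
    Rule("mail-on-ipso", sources={"IPSO"}, severities={"Major", "Critical"}),
    Rule("nokia-alarm", events={"NOKIA_CFG_CHANGED", "NOKIA_CFG_SAVED"},
         stop_processing=True),
    Rule("catch-all"),
]

print(process(Event("IPSO", "NOKIA_CFG_CHANGED", "Major"), policy))
print(process(Event("router1", "SYS_NODE_DOWN", "Critical"), policy))
```

In this sketch the first event triggers both of the first two rules and never reaches the last one, because the second rule has the stop flag set; the second event falls through to the last rule only.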

You can create and modify the Event Processing Policy using the Event Processing Policy Editor. To open the editor, press F9, or click Control Panel on the View menu and then click the Event Processing Policy icon in the Control Panel window.

Examples

Figure: Rule Configuration Example 1


This rule specifies that for every major or critical event originating from a node named "IPSO", two e-mail actions must be executed.



A second example rule specifies that for the events NOKIA_CFG_CHANGED, NOKIA_CFG_SAVED, NOKIA_LOW_DISK_SPACE, and NOKIA_NO_DISK_SPACE, originating from any node, the system should generate an alarm with the text "%m" (which means "use the event's message text") and a severity equal to the event's severity.

Alarms

Alarms Overview

As a result of event processing, some events can show up as alarms. An alarm usually represents something that needs the attention of network administrators or network control center operators, for example low free disk space on a server. Every alarm has the following attributes:

Attribute Description
Creation time Time when alarm was created.
Last change time Time when alarm was last changed (for example, acknowledged).
State Alarm can be in one of the following states:
  Outstanding: New alarm.
  Acknowledged: When a network administrator sees an alarm, they may acknowledge it to indicate that somebody is already aware of the problem and working on it. A new event with the same alarm ID will reset the alarm state back to outstanding.
  Sticky Acknowledged: The alarm will remain acknowledged even after new matching events. This can be useful when you know that there will be new matching events that will not change the situation. For example, if you have a network device which sends a new SNMP trap every minute until the problem is solved, sticky acknowledgement helps to eliminate unnecessary outstanding alarms.
  Resolved: The network administrator sets this state when the problem is solved.
  Terminated: Inactive alarm. When the problem is solved, the network administrator can terminate the alarm. This removes the alarm from the active alarms list, so it is no longer shown in the console, but the alarm record remains in the database.
Message Message text (usually derived from originating event's message text).
Severity Alarm's severity - Normal, Warning, Minor, Major, or Critical.
Source Source node (derived from originating event).
Key Text string used to identify duplicate alarms and for automatic alarm termination.
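
The difference between plain and sticky acknowledgement can be illustrated with a small Python sketch. It models only the behaviour stated in the table (a new matching event returns an acknowledged alarm to Outstanding, while a sticky acknowledgement survives); the function name is hypothetical, and the Resolved and Terminated states are deliberately left out because the text does not say how they react to new events.

```python
OUTSTANDING = "Outstanding"
ACKNOWLEDGED = "Acknowledged"
STICKY_ACKNOWLEDGED = "Sticky Acknowledged"


def state_after_matching_event(state: str) -> str:
    """Return the alarm state after a new matching event arrives."""
    if state == ACKNOWLEDGED:
        return OUTSTANDING  # plain acknowledgement is reset
    if state in (OUTSTANDING, STICKY_ACKNOWLEDGED):
        return state        # unchanged by repeated events
    # Resolved / Terminated: behaviour not described in this chapter
    raise ValueError("behaviour not covered by this sketch: " + state)


for s in (OUTSTANDING, ACKNOWLEDGED, STICKY_ACKNOWLEDGED):
    print(s, "->", state_after_matching_event(s))
```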

A user can change the alarm state according to the following diagram:

Figure: Alarm state transitions invoked by the user (AlarmStatesTransitionsInvokedByUser.png)

Generating Alarms

To generate alarms from events, edit the Alarm field in the appropriate rule of the Event Processing Policy. The alarm configuration dialog looks like this:

Figure 10: Alarm configuration dialog



Select the Generate new alarm radio button to enable alarm generation from the current rule. In the Message field enter the alarm's text, and in the alarm key field enter the value that will be used for detecting repeated alarms and for automatic alarm termination. In both fields you can use the macros described in the Macros for Event Processing section.

You can also configure an additional event to be sent if the alarm stays in the Outstanding state for a given period of time. To enable this, enter the desired number of seconds in the Seconds field and select the event to be sent. Entering 0 in the Seconds field disables sending of the additional event.

Automatic Alarm Termination

You can terminate all active alarms with a given key in reaction to an event. To do this, select the Terminate alarm radio button in the alarm configuration dialog and enter a value for the alarm key. In that field you can use the macros described in the Macros for Event Processing section.
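
The role of the alarm key in both duplicate detection and termination can be sketched as follows. This is a simplified Python model under stated assumptions, not the server's implementation: the `ActiveAlarms` class is hypothetical, the `repeat_count` field and the choice to overwrite message and severity on a repeated alarm are assumptions made for illustration, and keys are shown after macro expansion.

```python
from typing import Dict


class ActiveAlarms:
    """Hypothetical in-memory list of active alarms, indexed by alarm key."""

    def __init__(self) -> None:
        self._by_key: Dict[str, dict] = {}

    def generate(self, key: str, message: str, severity: str) -> dict:
        alarm = self._by_key.get(key)
        if alarm is None:
            # No active alarm with this key: create a new one.
            alarm = {"key": key, "message": message,
                     "severity": severity, "repeat_count": 1}
            self._by_key[key] = alarm
        else:
            # Same key: treated as a repeat of the existing alarm (assumption).
            alarm["message"] = message
            alarm["severity"] = severity
            alarm["repeat_count"] += 1
        return alarm

    def terminate(self, key: str) -> bool:
        """Remove the active alarm with the given key; True if one existed."""
        return self._by_key.pop(key, None) is not None

    def __len__(self) -> int:
        return len(self._by_key)


alarms = ActiveAlarms()
key = "LOW_DISK_0x000029AC"  # e.g. a key template "LOW_DISK_%i" after expansion
alarms.generate(key, "Low disk space", "Major")
repeated = alarms.generate(key, "Low disk space", "Critical")
print(len(alarms), repeated["repeat_count"])
print(alarms.terminate(key), len(alarms))
```

Including a macro such as %i in the key keeps alarms from different nodes separate, while a rule for the matching "problem cleared" event can use the same key template to terminate the right alarm.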

Situations

Situations Overview

Situations are a special type of event processing object that lets you track the current state of your infrastructure and process events accordingly. Each situation has one or more instances, and each instance has one or more attributes. Situation objects allow you to store information about the current situation in attributes and then use this information in event processing. For example, suppose one service (service A) depends on another (service B), and when service B fails you want an alarm about the service B failure but not about the consequent service A failure. To accomplish this, you can do the following:

  1. Create a situation object named "ServiceStatus";
  2. In the event processing policy, in the rule that processes the event indicating service B failure, add a situation attribute update: update situation "ServiceStatus", instance "Service_B", set attribute "status" to "failed";
  3. In the event processing policy, in the rule that generates an alarm on service A failure, add a filtering script so that the rule matches only if service B has not failed. Your script may look like the following:
sub main()
{
    s = FindSituation("ServiceStatus", "Service_B");
    if (s != NULL)
    {
        if (s->status == "failed")
            return 0; // Don't match rule
    }
    return 1; // Match rule
}
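
The situation, instance, and attribute hierarchy used by the script above can be pictured with a small Python model. It is only a sketch of the data layout: `update_situation`, `find_situation`, and `service_a_rule_matches` are hypothetical helpers that mimic the policy update from step 2 and the NXSL filter from step 3, and nothing here calls NetXMS.

```python
from typing import Dict, Optional

# situation name -> instance name -> attribute name -> value
situations: Dict[str, Dict[str, Dict[str, str]]] = {"ServiceStatus": {}}


def update_situation(situation: str, instance: str, **attributes: str) -> None:
    """Mimic a situation update from a policy rule (instance is created on demand)."""
    situations[situation].setdefault(instance, {}).update(attributes)


def find_situation(situation: str, instance: str) -> Optional[Dict[str, str]]:
    """Return the attributes of an instance, or None if it does not exist yet."""
    return situations.get(situation, {}).get(instance)


def service_a_rule_matches() -> bool:
    """Python counterpart of the filtering script for the service A rule."""
    s = find_situation("ServiceStatus", "Service_B")
    if s is not None and s.get("status") == "failed":
        return False  # don't match rule
    return True       # match rule


print(service_a_rule_matches())   # no instance yet, so the rule matches
update_situation("ServiceStatus", "Service_B", status="failed")
print(service_a_rule_matches())   # service B failed, alarm for A is suppressed
```

For the suppression to end, some other rule would have to set the attribute back to a non-failed value when service B recovers; the value used for that is up to the policy author.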


Defining Situations

Situations can be configured via the management console. To open the situations editor, select View in the main menu, then Situations. You will see the situations tree. At the top of the tree is an abstract root element. Below it are all defined situations; initially there are none, so you will see only the root element. You can create a situation either by right-clicking the root element and selecting Create from the pop-up menu, or by selecting Create under Situation in the main menu.

The next level of the tree, below the situations, contains situation instances. Initially it is empty, but once situations start being updated, you will see the existing instances for each situation.

Updating Situations

Situations can be updated via the Event Processing Policy. To update a situation, edit the Situation field in the appropriate rule. The situation update dialog looks like this:


Figure 11: Situation update dialog


You can select the situation to update and enter the instance name and the attributes to be set. In the instance name and in attribute values you can use the same macros as in alarm generation.

Macros for Event Processing

At various stages of event processing you may need to use macros to include information such as the event source, severity, or parameters in your event texts, alarms, or actions. You may use the following macros to accomplish this:

Macro Description
%n Name of event source object.
%a IP address of event source object.
%g Globally unique identifier (GUID) of event source object.
%i Unique ID of event source object in hexadecimal form. Always prefixed with 0x and contains exactly 8 digits (for example 0x000029AC).
%I Unique ID of event source object in decimal form.
%t Event's timestamp in the form day-month-year hour:minute:second.
%T Event's timestamp as a number of seconds since epoch (as returned by time() function).
%c Event's code.
%N Event's name.
%s Event's severity code as number. Possible values are:
  0 Normal
  1 Warning
  2 Minor
  3 Major
  4 Critical
%S Event's severity code as text.
%v NetXMS server's version.
%u User tag associated with the event.
%m Event's message text (meaningless in event template).
%A Alarm's text (can be used only in actions to put text of alarm from the same event processing policy rule).
%M Custom message text. Can be set in filtering script by setting CUSTOM_MESSAGE variable.
%[name] Value returned by script. You should specify name of the script from script library.
%{name} Value of custom attribute.
%<name> Event's parameter with given name.
%1 - %99 Event's parameter number 1 .. 99.
%% Insert % character.

If you need to insert special characters (like carriage return) you can use the following notations:

Char Description
\t Tab character
\n CR/LF character pair
\\ Backslash character
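
To make the tables above concrete, here is a minimal Python sketch of how a template could be expanded. It is an illustration only, not the server's expansion code: it covers a subset of the macros, the `expand` function and the dictionary field names are hypothetical, unknown macros are left untouched, and missing parameters expand to an empty string (both of the last two are assumptions).

```python
import re

SEVERITY_NAMES = ["Normal", "Warning", "Minor", "Major", "Critical"]
ESCAPES = {"t": "\t", "n": "\r\n", "\\": "\\"}

# One pass over the template: either a %-macro or a backslash notation.
_TOKEN = re.compile(
    r"%(%|[naiIcNsSm]|<[^>]+>|\{[^}]+\}|[1-9][0-9]?)|\\([tn\\])"
)


def expand(template: str, ev: dict) -> str:
    def repl(m: re.Match) -> str:
        if m.group(2) is not None:          # \t, \n or \\
            return ESCAPES[m.group(2)]
        t = m.group(1)
        if t == "%":
            return "%"
        if t == "n":
            return ev["source_name"]
        if t == "a":
            return ev["source_ip"]
        if t == "i":
            return "0x%08X" % ev["source_id"]   # exactly 8 hex digits
        if t == "I":
            return str(ev["source_id"])
        if t == "c":
            return str(ev["code"])
        if t == "N":
            return ev["name"]
        if t == "s":
            return str(ev["severity"])
        if t == "S":
            return SEVERITY_NAMES[ev["severity"]]
        if t == "m":
            return ev["message"]
        if t[0] == "<":                         # %<name>: named parameter
            return ev["named_params"].get(t[1:-1], "")
        if t[0] == "{":                         # %{name}: custom attribute
            return ev["custom_attributes"].get(t[1:-1], "")
        index = int(t)                          # %1 .. %99
        params = ev["params"]
        return params[index - 1] if index <= len(params) else ""

    return _TOKEN.sub(repl, template)


event = {
    "source_name": "IPSO", "source_ip": "10.0.0.1", "source_id": 10668,
    "code": 100001, "name": "NOKIA_LOW_DISK_SPACE", "severity": 3,
    "message": "Low disk space", "params": ["/var", "95"],
    "named_params": {"volume": "/var"}, "custom_attributes": {"site": "HQ"},
}

print(expand("%N on %n [%i]: %m (%S)", event))
print(repr(expand(r"%1 is %2%% full\n", event)))
```

Handling macros and backslash notations in a single pass means that a backslash inside a substituted value (for example in a parameter) is not reinterpreted as an escape; whether the server behaves the same way is not stated in this chapter.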