Difference between revisions of "UM:Event Processing"

(Replaced content with "Information moved to documentation: https://www.netxms.org/documentation/adminguide")
{{DISPLAYTITLE:Event Processing}}
Event processing is one of the core components of NetXMS. It determines how the monitoring system will react to various events.


== Event Processing Overview ==
The following flowchart outlines event flow inside the monitoring system:
 
[[File:Event_flow.png|thumb|500px|none|alt=Event flow|Event flow inside the monitoring system]]
 
As the flowchart shows, events can come from various sources: polling processes (status, configuration, discovery, and data collection), SNMP traps, and external applications that submit events directly via the client library. All incoming events go to a single event queue for processing. A special process, called the ''Event Processor'', takes events from the queue one by one and matches them against the Event Processing Policy. As a result, alarms may be generated and actions may be executed. If the event has the ''write to log'' attribute set, it is written to the NetXMS event log at the end of processing.
 
Although processing all events one by one may look like a potential bottleneck, this should not be the case. The event processor is highly optimized, and all potentially long operations (like action execution) are performed by separate processes.
 
== Event Processing Policy ==
The actions taken by the event processor for any specific event are determined by a set of rules called the ''Event Processing Policy''.
 
[[File:Event_Processing_Policy.png|frame||left|Event Processing Policy Screen]]<br style="clear: both" />
 
Every rule has two parts: a matching part (called '''Condition''' in the rule configuration dialog), which determines whether the rule applies to the current event, and an action part, which determines the actions to be taken for matched events. The matching part consists of four fields:
 
{| class="wikitable" style="width: 70%"
! Attribute || Description
|-
| ''Source Objects''
| One or more source objects of the event. This list can be left empty, which matches any node, or it can contain nodes, subnets, or containers. If you specify a subnet or container, any node within it will be matched.
 
|-
| ''Events''
| Event code. This field can be left empty, which matches any event, or it can contain a list of event codes.
 
|-
| ''Severity Filter''
| Event's severity. This field contains the selection of event severities to be matched.
 
|-
| ''Filtering Script''
| Optional matching script written in NXSL. If this field is empty, no additional checks are performed. Otherwise, the event is considered matched only if the script returns a non-zero (TRUE) value. For more information about the NetXMS scripting language, please refer to the chapter [[UM:NetXMS_Scripting_Language_(NXSL)|NetXMS Scripting Language (NXSL)]] in this manual.
 
|}
In the action part you can set alarm generation, a situation update, and the list of actions to be executed. Every rule can also have a free-form textual comment.
 
Each event passes through all rules in the policy, so if it matches more than one rule, the actions specified in all matched rules will be executed. You can change this behavior by setting the ''Stop Processing'' flag for a rule. If this flag is set and the rule matches, processing of the current event stops after that rule.
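
The matching and ''Stop Processing'' behavior described above can be modeled with a short Python sketch. This is only an illustration of the logic, not NetXMS code: the <code>Event</code> and <code>Rule</code> classes, their field names, and the rule names are hypothetical, subnet and container matching is left out, and a plain Python callable stands in for the NXSL filtering script.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Event:
    source: str    # name of the source node
    code: str      # event code (shown here by name for readability)
    severity: int  # 0 = Normal .. 4 = Critical


@dataclass
class Rule:
    name: str
    sources: set = field(default_factory=set)  # empty set matches any source
    events: set = field(default_factory=set)   # empty set matches any event
    severities: set = field(default_factory=lambda: {0, 1, 2, 3, 4})
    script: Optional[Callable[[Event], bool]] = None  # stand-in for the NXSL script
    stop_processing: bool = False

    def matches(self, event: Event) -> bool:
        if self.sources and event.source not in self.sources:
            return False
        if self.events and event.code not in self.events:
            return False
        if event.severity not in self.severities:
            return False
        # An empty script field means no additional check.
        if self.script is not None and not self.script(event):
            return False
        return True


def process(event: Event, policy: list) -> list:
    """Return the names of the rules whose actions would run, in policy order."""
    matched = []
    for rule in policy:
        if rule.matches(event):
            matched.append(rule.name)
            if rule.stop_processing:
                break  # later rules are never evaluated for this event
    return matched


policy = [
    Rule("mail-on-major", sources={"IPSO"}, severities={3, 4}),
    Rule("config-change", events={"NOKIA_CFG_CHANGED"}, stop_processing=True),
    Rule("catch-all"),
]
```

In this model, a critical <code>NOKIA_CFG_CHANGED</code> event from node "IPSO" triggers the first two rules and never reaches <code>catch-all</code>, because the second rule has the stop flag set.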
 
You can create and modify the Event Processing Policy using the '''Event Processing Policy Editor'''. To open the '''Event Processing Policy Editor''' window, press F9, or on the '''View''' menu click '''Control Panel''' to open the '''Control Panel''' window and then click the '''Event Processing Policy''' icon.
 
=== Examples ===
 
[[File:EP_Rule_Config_example_1.png|frame||left|Rule Configuration Example 1]]<br style="clear: both" />
 
This rule specifies that for every major or critical event originating from a node named "IPSO", two e-mail actions must be executed.
 
[[File:EP_Example2.png||none|alt=Example]]
 
 
This rule specifies that for the events NOKIA_CFG_CHANGED, NOKIA_CFG_SAVED, NOKIA_LOW_DISK_SPACE, and NOKIA_NO_DISK_SPACE, originating from any node, the system should generate an alarm with the text "%m" (which means "use the event's message text") and a severity equal to the event's severity.
 
== Alarms ==
=== Alarms Overview ===
As a result of event processing, some events can be shown as ''alarms''. Usually an alarm represents something that needs the attention of network administrators or network control center operators, for example low free disk space on a server. Every alarm has the following attributes:
 
{| class="wikitable" style="width: 70%"
! width="20%"|Attribute || Description
|-
| Creation time
| Time when alarm was created.
|-
| Last change time
| Time when alarm was last changed (for example, acknowledged).
|-
| State
| Alarm can be in one of the following states:
 
{| class="wikitable"
| [[File:Outstanding.png]]
| Outstanding
| New alarm;
 
|-
| [[File:Acknowledged.png]]
| Acknowledged
| When a network administrator sees an alarm, they may ''acknowledge'' it to indicate that somebody is already aware of the problem and working on it. A new event with the same alarm ID will reset the alarm state back to outstanding;
 
|-
| [[File:Acknowledged_sticky.png]]
| Sticky Acknowledged for time
| The alarm remains acknowledged for a given time interval even if new matching events arrive; once the interval expires, the alarm returns to the outstanding state. This option works like a snooze: use it when you know that new matching events will arrive without changing the situation, but someone should check the problem again after some time. For example, if a problem cannot be solved until next week, the alarm can be sticky acknowledged for 7 days, after which it becomes outstanding again.
 
|-
| [[File:Acknowledged_sticky.png]]
| Sticky Acknowledged
| The alarm remains acknowledged even if new matching events arrive. This is useful when you know that new matching events will arrive without changing the situation. For example, if a network device sends a new SNMP trap every minute until the problem is solved, sticky acknowledge helps to eliminate unnecessary outstanding alarms.
 
|-
| [[File:Resolved.png]]
| Resolved
| Network administrator sets this state when the problem is solved.
 
|-
| [[File:Terminated.png]]
| Terminated
| Inactive alarm. When the problem is solved, a network administrator can terminate the alarm. This removes the alarm from the active alarms list so that it is no longer shown in the console, but the alarm record remains in the database.
 
|}
 
|-
| Message
| Message text (usually derived from originating event's message text).
 
|-
| Severity
| Alarm's severity: Normal, Warning, Minor, Major, or Critical.
|-
| Source
| Source node (derived from originating event).
|-
| Key
| Text string used to identify duplicate alarms and for automatic alarm termination.
 
|}
 
There are two types of alarm state flow: strict and not strict. The mode can be configured on the Alarms preference page or in the server configuration through the "StrictAlarmStatusFlow" parameter. The difference between them is that in strict mode an alarm can be terminated only from the Resolved state.
 
Not strict (default):
 
[[File:AlarmStatesTransitionsInvokedByUser-NOTstrict.png|424px]]
 
 
Strict:
 
[[File:AlarmStatesTransitionsInvokedByUser-strict.png|424px]]
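
The one difference between the two flows that is stated in the text, namely that strict mode allows termination only from the Resolved state, can be written as a small Python check. This is a sketch, not the server implementation: the lowercase state names and the function are hypothetical, sticky acknowledgement is treated as <code>"acknowledged"</code>, and refusing to terminate an alarm that is already terminated is an assumption of this model. The other transitions shown in the diagrams are not modeled.

```python
# Hypothetical state names; sticky acknowledged alarms count as "acknowledged".
STATES = ("outstanding", "acknowledged", "resolved", "terminated")


def can_terminate(state: str, strict: bool) -> bool:
    """Tell whether a user may terminate an alarm that is in the given state."""
    if state not in STATES:
        raise ValueError(f"unknown alarm state: {state}")
    if state == "terminated":
        return False  # assumption: nothing left to terminate
    if strict:
        return state == "resolved"  # strict flow: Terminate only after Resolve
    return True  # not strict: any active alarm can be terminated
```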
 
=== Generating Alarms ===
To generate alarms from events, edit the "Alarm" field in the appropriate rule of the Event Processing Policy. The alarm configuration dialog looks like this:
 
'''Figure 10: Alarm configuration dialog'''
 
[[File:Alarm_config.png||none|alt=Alarm configuration dialog]]
 
 
Select the '''Generate new alarm''' radio button to enable alarm generation from the current rule. In the '''Message''' field enter the alarm's text, and in the alarm key field enter the value that will be used for detecting repeated alarms and for automatic alarm termination. In both fields you can use the macros described in the [[UM:Event_Processing#Macros_for_Event_Processing|Macros for Event Processing]] section.
 
You can also configure an additional event to be sent if the alarm stays in the '''Outstanding''' state for a given period of time. To enable this, enter the desired number of seconds in the '''Seconds''' field and select the event to be sent. Entering a value of 0 for seconds disables sending of the additional event.
 
=== Automatic Alarm Termination ===
You can terminate all active alarms with a given key in reaction to an event. To do this, select the '''Terminate alarm''' radio button in the alarm configuration dialog and enter the value for the alarm key. In that field you can use the macros described in the [[UM:Event_Processing#Macros_for_Event_Processing|Macros for Event Processing]] section.
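
The role of the alarm key in duplicate detection and automatic termination can be sketched in Python as follows. This is a simplified model under stated assumptions, not the server's data structure: the <code>AlarmList</code> class, its fields, and the <code>repeat_count</code> counter are hypothetical, keys are shown after macro expansion, and sticky acknowledgement is not modeled.

```python
class AlarmList:
    """Toy model of the active alarm list, indexed by alarm key."""

    def __init__(self):
        self.active = {}  # alarm key -> alarm record

    def generate(self, key, message, severity):
        """Create a new alarm, or update the existing alarm with the same key."""
        alarm = self.active.get(key)
        if alarm is None:
            alarm = {
                "key": key,
                "message": message,
                "severity": severity,
                "state": "outstanding",
                "repeat_count": 1,  # hypothetical counter of matching events
            }
            self.active[key] = alarm
        else:
            # Same key: treated as a repeat of the existing alarm.
            alarm["message"] = message
            alarm["severity"] = severity
            alarm["repeat_count"] += 1
            if alarm["state"] == "acknowledged":
                alarm["state"] = "outstanding"  # a new event resets acknowledgement
        return alarm

    def terminate(self, key):
        """Remove the active alarm with this key; return True if one existed."""
        return self.active.pop(key, None) is not None
```

A typical pairing is a "problem" rule that generates an alarm with a key built from the event source and a "recovery" rule that terminates alarms using the same key, so that the recovery event clears exactly the alarm raised for that node.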
 
== Situations ==
=== Situations Overview ===
Situations are a special type of event processing object that lets you track the current state of your infrastructure and process events accordingly. Each situation has one or more instances, and each instance has one or more attributes. Situation objects allow you to store information about the current situation in attributes and then use this information in event processing. For example, suppose one service (service A) depends on another (service B), and when service B fails you want an alarm about the service B failure but not about the consequent service A failure. To accomplish this, you can do the following:
 
# Create a situation object named "ServiceStatus";
# In the event processing policy, in the rule that processes the event indicating service B failure, add a situation attribute update: update situation "ServiceStatus", instance "Service_B", set attribute "status" to "failed";
# In the event processing policy, in the rule that generates an alarm on service A failure, add a filtering script so that the rule matches only if service B has not failed. Your script may look like the following:
 
<syntaxhighlight lang="c">
sub main()
{
    s = FindSituation("ServiceStatus", "Service_B");
    if (s != NULL)
    {
        if (s->status == "failed")
            return 0; // Don't match rule
    }
    return 1; // Match rule
}
</syntaxhighlight>
 
 
=== Defining Situations ===
Situations can be configured via the management console. To open the situations editor, select '''View''' in the main menu, then '''Situations'''. You will see the situations tree. At the top of the tree is an abstract root element, and below it are all defined situations. Initially there are no situations, so you will see only the root element. You can create a situation either by right-clicking the root element and selecting '''Create''' from the pop-up menu, or by selecting '''Create''' under '''Situation''' in the main menu.
 
The next level in the tree, below the situations, holds situation instances. Initially it is empty, but once situations start being updated, you will see the existing instances for each situation.
 
=== Updating Situations ===
Situations can be updated via the Event Processing Policy. To update a situation, edit the '''Situation''' field in the appropriate rule. The situation update dialog looks like this:
 
 
'''Figure 11: Situation update dialog'''
 
[[File:Situation_Config.png||none|alt=Situation update dialog]]
 
You can select the situation to update and enter the instance name and the attributes to be set. In the instance name and in attribute values you can use the same macros as in alarm generation.
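
The relationship between situations, instances, and attributes, and the way an update in one rule feeds a filter in another, can be modeled with a short Python sketch. It mirrors the "ServiceStatus" example from the overview and is not NetXMS code: the <code>SituationStore</code> class and the <code>service_a_rule_matches</code> function are hypothetical stand-ins for the server's situation objects and the NXSL filtering script.

```python
from collections import defaultdict


class SituationStore:
    """Toy model: situation name -> instance name -> {attribute: value}."""

    def __init__(self):
        self._data = defaultdict(lambda: defaultdict(dict))

    def update(self, situation, instance, **attributes):
        """Set attributes on an instance, creating the instance on first update."""
        self._data[situation][instance].update(attributes)

    def find(self, situation, instance):
        """Return the attribute dict, or None if the instance does not exist yet."""
        instances = self._data.get(situation)
        if instances is None or instance not in instances:
            return None
        return instances[instance]


def service_a_rule_matches(store):
    """Filter for the service A alarm rule: match unless service B is failed."""
    s = store.find("ServiceStatus", "Service_B")
    if s is not None and s.get("status") == "failed":
        return False  # do not match the rule
    return True  # match the rule
```

Instances appear only after the first update, which corresponds to the instance level of the situations tree being empty until situations start being updated.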
 
== Macros for Event Processing ==
At various stages of event processing you may need macros to include information such as the event source, severity, or a parameter in your event texts, alarms, or actions. You can use the following macros to accomplish this:
 
{| class="wikitable" style="width: 70%"
! Macro || Description
|-
| %n
| Name of event source object.
|-
| %a
| IP address of event source object.
|-
| %g
| Globally unique identifier (GUID) of event source object.
|-
| %i
| Unique ID of event source object in hexadecimal form. Always prefixed with 0x and contains exactly 8 digits (for example 0x000029AC).
|-
| %I
| Unique ID of event source object in decimal form.
|-
| %t
| Event's timestamp in the form ''day-month-year hour:minute:second''.
|-
| %T
| Event's timestamp as a number of seconds since epoch (as returned by time() function).
|-
| %c
| Event's code.
|-
| %N
| Event's name.
|-
| %s
| Event's severity code as number. Possible values are:
{| class="wikitable"
| 0
| Normal
 
|-
| 1
| Warning
 
|-
| 2
| Minor
 
|-
| 3
| Major
 
|-
| 4
| Critical
 
|}
 
|-
| %S
| Event's severity code as text.
|-
| %v
| NetXMS server's version.
|-
| %u
| User tag associated with the event.
|-
| %m
| Event's message text (meaningless in event template).
|-
| %A
| Alarm's text (can be used only in actions to put text of alarm from the same event processing policy rule).
|-
| %M
| Custom message text. Can be set in filtering script by setting CUSTOM_MESSAGE variable.
|-
| <nowiki>%[</nowiki>''name'']
| Value returned by script. You should specify name of the script from script library.
|-
| %{''name''}
| Value of custom attribute.
|-
| <nowiki>%<</nowiki>''name''>
| Event's parameter with given name.
|-
| %1 - %99
| Event's parameter number 1 .. 99.
|-
| %%
| Insert % character.
|}
 
If you need to insert special characters (like carriage return) you can use the following notations:
 
{| class="wikitable"
! Char || Description
|-
| \t
| Tab character
|-
| \n
| CR/LF character pair
|-
| \\
| Backslash character
|}
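
To make the substitution rules concrete, the following Python sketch expands a subset of the macros from the table above. It is an illustration only and not how the server implements expansion: the <code>expand</code> function and the dictionary keys of <code>example_event</code> are hypothetical, missing parameters and attributes expand to an empty string by assumption, and the script macro, the timestamp macros, and the backslash notations are not handled.

```python
import re

SEVERITY_NAMES = ["Normal", "Warning", "Minor", "Major", "Critical"]

# One alternative per macro form: %%, %<name>, %{name}, %1..%99, single letter.
_MACRO = re.compile(r"%(?:(%)|<([^>]+)>|\{([^}]+)\}|(\d{1,2})|([A-Za-z]))")


def expand(template, event):
    """Expand a subset of event processing macros in a template string."""
    simple = {
        "n": event["source_name"],
        "a": event["source_ip"],
        "i": "0x%08X" % event["source_id"],  # hexadecimal, exactly 8 digits
        "I": str(event["source_id"]),
        "c": str(event["code"]),
        "N": event["name"],
        "s": str(event["severity"]),
        "S": SEVERITY_NAMES[event["severity"]],
        "m": event["message"],
    }

    def replace(match):
        percent, named, attribute, number, letter = match.groups()
        if percent:
            return "%"
        if named:
            return str(event["named_params"].get(named, ""))
        if attribute:
            return str(event["custom_attrs"].get(attribute, ""))
        if number:
            index = int(number) - 1  # %1 is the first parameter
            params = event["params"]
            return str(params[index]) if 0 <= index < len(params) else ""
        # Macros outside this subset are left in the text unchanged.
        return simple.get(letter, match.group(0))

    return _MACRO.sub(replace, template)


# Hypothetical event data used only for illustration.
example_event = {
    "source_name": "IPSO",
    "source_ip": "10.0.0.1",
    "source_id": 0x29AC,
    "code": 100001,
    "name": "NOKIA_LOW_DISK_SPACE",
    "severity": 3,
    "message": "Low disk space",
    "params": ["/var", "95"],
    "named_params": {"volume": "/var"},
    "custom_attrs": {"location": "rack 4"},
}
```

With this data, the template <code>%n [%i] %S: %m</code> expands to <code>IPSO [0x000029AC] Major: Low disk space</code>.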

Revision as of 16:05, 24 November 2017

Information moved to documentation:

https://www.netxms.org/documentation/adminguide