Difference between revisions of "UM:Data Collection"

From NetXMS Wiki
Revision as of 21:49, 13 June 2014

How data collection works

Every node can have many data collection items configured (see Basic Concepts for detailed description of DCI). NetXMS server has a set of threads dedicated to data collection, called Data Collectors, used to gather information from the nodes according to DCI configuration. You can control how many data collectors will run simultaneously, by changing server configuration parameter NumberOfDataCollectors.

All configured DCIs are checked for polling requirement every two seconds; if a DCI needs to be polled, a polling request is placed into the internal data polling queue. The first available data collector picks up the request and gathers information from the node according to the DCI configuration. If a new value is received successfully, it is stored in the database and thresholds are checked. After threshold checking, the data collector is ready to process a new request. Processing of a newly received parameter value is outlined in the figure below.

Figure: Newly received parameter processing
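
The polling cycle described above can be modelled with a short Python sketch. It is an illustration only, not the server's implementation: the `Dci` class, the `enqueue_due_dcis` function and the rule that a DCI is marked as polled at the moment it is queued are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass

CHECK_PERIOD = 2  # seconds between checks of all DCIs, as described in the text


@dataclass
class Dci:
    name: str
    polling_interval: int              # seconds
    last_poll: float = float("-inf")   # never polled yet


def enqueue_due_dcis(dcis, now, queue):
    """Place every DCI whose polling interval has elapsed into the polling queue."""
    for dci in dcis:
        if now - dci.last_poll >= dci.polling_interval:
            queue.append(dci.name)
            dci.last_poll = now        # simplification: mark as polled when queued


dcis = [Dci("cpu", polling_interval=4), Dci("disk", polling_interval=6)]
queue = deque()
log = []
for now in range(0, 12, CHECK_PERIOD):     # checks at t = 0, 2, ..., 10
    enqueue_due_dcis(dcis, now, queue)
    while queue:                           # a data collector picks up each request
        log.append((now, queue.popleft()))
print(log)
```

Because DCIs are only examined every two seconds, a polling interval that is not a multiple of two would, in this model, be served at the next check after it elapses.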


It is also possible to push data to the server. If a DCI is configured as push, the server just waits for new values instead of polling the data source.

DCI configuration

Basic configuration

Data collection for a node can be configured using management console. To open data collection configuration window, right-click node object in object browser or on a map, and click Data Collection. You will see the list of configured data collection items. From here, you can add new or change existing parameters to monitor. Usual way to do something with DCIs is to right-click on appropriate record in the list and select a required action from popup menu.

When you create new DCI or open an existing one, you will see a lot of attributes. The list of definitions and descriptions for the attributes is given below.


Description

Description is a free-form text string describing DCI. It is not used by the server and is intended for better information understanding by operators. If you use the Select button to select a parameter from the list, description field will be filled automatically.


Parameter

Name of the parameter of interest, used for making a request to target node. For NetXMS agent and internal parameters it will be parameter name, and for SNMP agent it will be an SNMP OID. You can use the Select button for easier selection of required parameter name.


Origin

Origin of data (or method of obtaining data). Possible origins are NetXMS agent, SNMP agent, CheckPoint SNMP agent, Internal (data generated inside NetXMS server process), or Push Agent. Last origin is very different from all others, because it represents DCIs whose values are pushed to server by external program (usually via nxpush or nxapush command line tool) instead of being polled by the server based on the schedule.

Data Type

Data type for the parameter. Can be one of the following: Integer, Unsigned Integer, 64-bit Integer, 64-bit Unsigned Integer, Float (floating point number), or String. Selected data type affects collected data processing — for example, you cannot use operations like “less than” or “greater than” on strings. If you select parameter from the list using the Select button, correct data type will be set automatically.


Polling Interval

An interval between consecutive polls, in seconds. If you select the Use Advanced Schedule option, this field has no meaning and will be disabled.


Use Advanced Schedule

If you turn on this flag, NetXMS server will use a custom schedule for collecting DCI values instead of fixed intervals. This schedule can be configured on the Schedule page. An advanced schedule consists of one or more records, each representing a desired data collection time in cron-style format. A record has five fields, separated by spaces: minute, hour, day of month, month, and day of week.

Optionally, a sixth field can be specified for resolution in seconds (this is a non-standard extension which is not compatible with the regular cron format). Moreover, the sixth field (but not the others) supports an additional stepping syntax with a '%' percent sign, which means that the step is calculated in absolute seconds since the Unix epoch (00:00:00 UTC, 1st of January, 1970) rather than within the current minute. It is not recommended to use seconds in custom schedules as your main data collection strategy, though. Use seconds only if it is absolutely necessary.

Allowed values for each field are:

Table 2: Advanced Schedule field values

Field               Value
minute              0 - 59
hour                0 - 23
day of month        1 - 31
month               1 - 12
day of week         0 - 7 (0 and 7 are both Sunday)
seconds (optional)  0 - 59

There are also some special values:

Table 2.1: Advanced Schedule special values

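Value  Meaning
*      any value
L      last day of the month; in the day of week field, a weekday number followed by L (for example 5L) means the last such weekday of the month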


Some examples:

5 0 * * *

Run five minutes after midnight, every day.

15 14 1 * *

Run at 14:15 on the first day of every month.

*/5 * * * *

Run every 5 minutes.

* * * * * 10

Run every minute on the 10th second.

* * * * * */45

Run twice a minute (on seconds 0 and 45).

* * * * * *%45

Run every 45 seconds.

59 23 L * *

Run at 23:59 on the last day of every month.

59 23 * * 5L

Run at 23:59 on the last Friday of every month.
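
The difference between the `*/45` and `*%45` examples above is easy to miss, so here is a small Python sketch of how the optional seconds field could be evaluated. It reflects only the behaviour described in this section; the function names are hypothetical, and only the four forms used in the examples (`*`, a plain number, `*/N`, `*%N`) are handled.

```python
def seconds_field_matches(field, epoch_seconds):
    """Return True if the optional sixth (seconds) field matches the given time."""
    second_of_minute = epoch_seconds % 60
    if field == "*":
        return True
    if field.startswith("*/"):   # step counted within the current minute
        return second_of_minute % int(field[2:]) == 0
    if field.startswith("*%"):   # step counted in absolute seconds since the Unix epoch
        return epoch_seconds % int(field[2:]) == 0
    return second_of_minute == int(field)   # plain number such as "10"


def matching_times(field, start, end):
    """List every second in [start, end) at which the field matches."""
    return [t for t in range(start, end) if seconds_field_matches(field, t)]


print(matching_times("*/45", 0, 180))   # pattern restarts at every minute boundary
print(matching_times("*%45", 0, 180))   # evenly spaced, 45 seconds apart
print(matching_times("10", 0, 180))
```

Over three minutes the `*/45` form fires at seconds 0 and 45 of each minute, so the gaps alternate between 45 and 15 seconds, while `*%45` keeps a constant 45-second gap.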

Associate with cluster resource

In this field you can specify cluster resource associated with DCI. Data collection and processing will occur only if node you configured DCI for is current owner of this resource. This field is valid only for cluster member nodes.


Retention Time

This attribute specifies how long the collected data should be kept in database, in days. Minimum retention time is 1 day and maximum is not limited. However, keeping too many collected values for too long will lead to significant increase of your database size and possible performance degradation.


Status

DCI status can be one of the following: Active, Disabled, Not Supported. Server will collect data only if the status is Active. If you wish to stop data collection without removing DCI configuration and collected data, the Disabled status can be set manually. If requested parameter is not supported by target node, the Not Supported status is set by the server.


Data Transformations

In simplest case, NetXMS server collects values of specified parameters and stores them in the database. However, you can also specify various transformations for original value. For example, you may be interested in a delta value, not in a raw value of some parameter. Or, you may want to have parameter value converted from bytes to kilobytes. All transformations will take place after receiving new value and before threshold processing.

Data transformation consists of two steps. On the first step, delta calculation is performed. You can choose one of four types of delta calculation:

Table 3: Delta calculation types


Calculation Type    Description
None                No delta calculation is performed. This is the default setting for a newly created DCI.
Simple              The resulting value is the difference between the current raw value and the previous raw value. The raw value is the parameter value originally received from the host.
Average per second  The resulting value is the difference between the current raw value and the previous raw value, divided by the number of seconds passed between the current and previous polls.
Average per minute  The resulting value is the difference between the current raw value and the previous raw value, divided by the number of minutes passed between the current and previous polls.
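
As a worked illustration of the four delta types in Table 3, the following Python sketch applies each of them to the same pair of raw values. The function name and the string keys are made up for the example, and counter wrap-around is deliberately not handled.

```python
def apply_delta(kind, previous_raw, current_raw, seconds_between_polls):
    """Apply one of the four delta calculation types to a newly received raw value."""
    if kind == "none":
        return current_raw
    difference = current_raw - previous_raw
    if kind == "simple":
        return difference
    if kind == "average_per_second":
        return difference / seconds_between_polls
    if kind == "average_per_minute":
        return difference / (seconds_between_polls / 60)
    raise ValueError(f"unknown delta calculation type: {kind}")


# A byte counter that went from 1000 to 7000 between two polls taken 120 seconds apart.
for kind in ("none", "simple", "average_per_second", "average_per_minute"):
    print(kind, apply_delta(kind, 1000, 7000, 120))
```

With these inputs the simple delta is 6000, which corresponds to 50 per second or 3000 per minute.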

On the second step, a custom transformation script is executed (if present). By default, a newly created DCI does not have a transformation script. If a transformation script is present, the resulting value of the first step is passed to it as a parameter, and the result of script execution is the final DCI value. The transformation script gets this value as its first argument (available via the special variable $1), and also has two predefined global variables: $node (reference to the current node object) and $dci (reference to the current DCI object). For more information about the NetXMS scripting language, please consult the NetXMS Scripting Language chapter in this manual.

Simple example:

$1 * 8 / 1000

Table data transformations example:

idxVoltage = $1->getColumnIndex("VOLTAGE");
idxStatus = $1->getColumnIndex("STATUS");

for(i = 0; i < $1->rowCount; i++)
{
   $1->set(i, idxVoltage, $1->get(i, idxVoltage) / 1000);
   if ($1->get(i, idxStatus) == 3)
   {
      $1->set(i, idxStatus, "OK");
   }
}

Thresholds

Overview

For every DCI you can define one or more thresholds. Each threshold is a pair of a condition and an event: if the condition becomes true, the associated event is generated. To configure thresholds, open the data collection editor for a node or template, right-click on the DCI record and select Edit from the popup menu, then select the Thresholds page. You can add, modify and delete thresholds using the buttons below the threshold list. If you need to change the threshold order, select one threshold and use the arrow buttons located on the right to move the selected threshold up or down.


Instance

Each DCI has an Instance attribute, which is a free-form text string, passed as the 6th parameter to events associated with thresholds. You can use this parameter to distinguish between similar events related to different instances of the same entity. For example, if you have an event generated when a file system is low on free space, you can set the Instance attribute to the file system mount point.


Threshold Processing

Threshold processing algorithm is outlined on Figure 5.

Figure 5: Threshold processing algorithm



As you can see from this flowchart, threshold order is very important. Consider the following example: you have a DCI representing CPU utilization on the node, and you want two different events to be generated, one when CPU utilization exceeds 50% and another when it exceeds 90%. What happens when you place threshold "> 50" first, and "> 90" second? The following table shows values received from the host and actions taken by the monitoring system (assuming that all thresholds are initially inactive):

Value Action
10 Nothing will happen.
55 When checking first threshold ("> 50"), the system will find that it's not active, but condition evaluates to true. So, the system will set threshold state to "active" and generate event associated with it.
70 When checking first threshold ("> 50"), the system will find that it's already active, and condition evaluates to true. So, the system will stop threshold checking and will not take any actions.
95 When checking first threshold ("> 50"), the system will find that it's already active, and condition evaluates to true. So, the system will stop threshold checking and will not take any actions.

Please note that second threshold actually is not working, because it's "masked" by the first threshold. To achieve desired results, you should place threshold "> 90" first, and threshold "> 50" second.

You can disable threshold ordering by selecting the Always process all thresholds checkbox. If it is selected, the system will always process all thresholds.
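
The masking effect can be reproduced with a small Python simulation. This is a sketch reconstructed from the table above, not the server's code: the `Threshold` class and event strings are hypothetical, and the handling of an active threshold whose condition turns false (deactivate, emit a rearm event, keep checking) is an assumption, since the flowchart itself is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    limit: float
    active: bool = False

    def condition(self, value):
        return value > self.limit


def process_value(value, thresholds, process_all=False):
    """Check thresholds in order and return the events generated for one value."""
    events = []
    for threshold in thresholds:
        if threshold.condition(value):
            if not threshold.active:
                threshold.active = True
                events.append(f"reached > {threshold.limit}")
            if not process_all:
                break                    # remaining thresholds are not checked
        elif threshold.active:
            threshold.active = False     # assumption: rearm and continue checking
            events.append(f"rearmed > {threshold.limit}")
    return events


def run(limits, values, process_all=False):
    """Feed a sequence of values through freshly created thresholds."""
    thresholds = [Threshold(limit) for limit in limits]
    return [process_value(value, thresholds, process_all) for value in values]


values = [10, 55, 70, 95]
print(run([50, 90], values))                    # "> 90" is masked by "> 50"
print(run([90, 50], values))                    # both events are generated
print(run([50, 90], values, process_all=True))  # ordering no longer matters
```

In the first run the value 95 produces no event, exactly as in the table; swapping the order or enabling the process-all option makes the "> 90" event appear.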


Threshold Configuration

When adding or modifying a threshold, you will see the following dialog:

Figure: Threshold configuration dialog

First, you have to select what value will be checked:

last polled value         The last value will be used. If the number of polls is set to more than 1, the condition will evaluate to true only if it is true for each individual value of the last n polls.
average value             An average value for the last n polls will be used (you have to configure the desired number of polls).
mean deviation            A mean absolute deviation for the last n polls will be used (you have to configure the desired number of polls). Additional information on how mean absolute deviation is calculated can be found here: http://en.wikipedia.org/wiki/Mean_deviation.
diff with previous value  A delta between the last and previous values will be used. If the DCI data type is string, the system will use 0 if the last and previous values match, and 1 if they don't.
data collection error     An indicator of data collection error. Instead of the DCI's value, the system will use 0 if data collection was successful, and 1 if there was a data collection error. You can use this type of threshold to catch situations when the DCI's value cannot be retrieved from the agent.
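
To make the numeric options above concrete, here is a Python sketch that computes the value each option would compare against the threshold. The function names and keys are illustrative only; string DCIs and the data collection error indicator are left out.

```python
from statistics import mean


def value_to_check(kind, history, polls):
    """Return the number a threshold condition is compared against.

    history holds collected values, oldest first; polls is the configured n.
    """
    window = history[-polls:]
    if kind == "average":
        return mean(window)
    if kind == "mean_deviation":
        centre = mean(window)
        return mean([abs(value - centre) for value in window])
    if kind == "diff":
        return history[-1] - history[-2]
    raise ValueError(f"unknown kind: {kind}")


def last_values_all_match(condition, history, polls):
    """'last polled value' with n > 1: true only if each of the last n values matches."""
    return all(condition(value) for value in history[-polls:])


history = [10.0, 20.0, 60.0, 70.0]
print(value_to_check("average", history, 4))
print(value_to_check("mean_deviation", history, 4))
print(value_to_check("diff", history, 4))
print(last_values_all_match(lambda v: v > 50, history, 2))
print(last_values_all_match(lambda v: v > 50, history, 3))
```

For this history the average over four polls is 40, the mean absolute deviation is 25 and the difference from the previous value is 10; a "> 50" condition holds for the last two polls but not for the last three.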

Second, you have to select comparison function. Please note that not all functions can be used for all data types. Below is a compatibility table:

Function          Integer  Unsigned Integer  Int64  Unsigned Int64  Float  String
less              X        X                 X      X               X      -
less or equal     X        X                 X      X               X      -
equal             X        X                 X      X               X      X
greater or equal  X        X                 X      X               X      -
greater           X        X                 X      X               X      -
not equal         X        X                 X      X               X      X
like              -        -                 -      -               -      X
not like          -        -                 -      -               -      X

Third, you have to set a value to check against. If you use the like or not like functions, the value is a pattern string where you can use metacharacters: "*" (asterisk), which means "any number of any characters", and "?" (question mark), which means "any single character".
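
The two metacharacters can be modelled in Python by translating the pattern into a regular expression. This is a minimal sketch of the matching rule as described above; the `like` function is hypothetical, and case-sensitive matching is an assumption rather than documented server behaviour.

```python
import re


def like(value, pattern):
    """Match value against a pattern where * is any run of characters and ? is one character."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    # fullmatch: the whole value must fit the pattern, not just a part of it
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


print(like("eth0", "eth*"))
print(like("eth1", "eth?"))
print(like("eth10", "eth?"))
print(like("lo", "eth*"))
```

A "not like" check is simply the negation of the same test.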

Fourth, you have to select events to be generated when the condition becomes true or returns to false. By default, system uses SYS_THRESHOLD_REACHED and SYS_THRESHOLD_REARMED events, but in most cases you will change it to your custom events.

You can also configure a threshold to resend the activation event if the threshold's condition remains true for a specific period of time. You have three options: default, which uses server-wide settings; never, which disables resending of events; or a specific interval in seconds between repeated events.

Thresholds and Events

You can choose any event to be generated when a threshold becomes active or returns to the inactive state. However, you should avoid using predefined system events (their names usually start with SYS_ or SNMP_). For example, suppose you set the event SYS_NODE_CRITICAL to be generated when CPU utilization exceeds 80%. The system will generate this event, but it will also generate the same event when the node status changes to CRITICAL. In your event processing configuration, you will be unable to determine the actual reason for the event, and will probably get some unexpected results. If you need custom processing for a specific threshold, create your own event first, and use this event in the threshold configuration. NetXMS has some preconfigured events that are intended to be used with thresholds. Their names start with DC_.

The system will pass the following six parameters to all events generated as a reaction to threshold violation:

  1. Parameter name (DCI's name attribute)
  2. DCI description
  3. Threshold value
  4. Actual value
  5. Unique DCI identifier
  6. Instance (DCI's instance attribute)

For example, if you are creating a custom event that is intended to be generated when file system is low on free space, and wish to include file system name, actual free space, and threshold's value into event's message text, you can use message template like this:

File system %6 has only %4 bytes of free space (threshold: %3 bytes)
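
The following Python sketch shows how the positional parameters listed above map onto such a template. It handles only the %1 to %N placeholders discussed here and is not the server's expansion routine; the function name and the sample parameter values (DCI name, description and identifier) are invented for the illustration.

```python
import re


def expand_message(template, parameters):
    """Replace %1, %2, ... with the corresponding event parameter (1-based)."""
    def substitute(match):
        index = int(match.group(1))
        if 1 <= index <= len(parameters):
            return str(parameters[index - 1])
        return match.group(0)   # leave unknown placeholders untouched
    return re.sub(r"%(\d+)", substitute, template)


# Parameter order follows the six-item list above; the values are hypothetical.
parameters = [
    "FileSystem.Free(/var)",   # 1: parameter name
    "Free space on /var",      # 2: DCI description
    "1000000",                 # 3: threshold value
    "524288",                  # 4: actual value
    "42",                      # 5: unique DCI identifier
    "/var",                    # 6: instance
]
template = "File system %6 has only %4 bytes of free space (threshold: %3 bytes)"
print(expand_message(template, parameters))
```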

For events generated on threshold's return to inactive state (default event is SYS_THRESHOLD_REARMED), parameter list is different:

  1. Parameter name (DCI's name attribute)
  2. DCI description
  3. Unique DCI identifier
  4. Instance (DCI's instance attribute)

Push parameters

NetXMS gives you the ability to push DCI values when you need to, instead of polling them at specific time intervals. To be able to push data to the server, take the following steps:

  1. Set your DCI's origin to Push Agent and configure other properties as usual, excluding polling interval which is meaningless in case of pushed data.
  2. Create separate user account or pick an existing one and give "Push Data" right on the DCI owning node to that user.
  3. Use nxpush utility or client API for pushing data.

List DCIs

Usually DCIs have scalar values. A list DCI is a special DCI which returns a list of values. List DCIs are mostly used by NetXMS internally (to get the list of network interfaces during the configuration poll, for example) but can also be utilized by user in some occasions. NetXMS Management Console does not support list DCIs directly but their names are used as input parameters for Instance Discovery methods. List DCI values can be also obtained with nxget command line utility (e.g. for use in scripts).

Templates

What is template

Often you need to collect the same parameters from different nodes. Configuring them node by node quickly turns into repeating one action many times. Things may become even worse when you need to change something in already configured DCIs on all nodes - for example, increase the threshold for CPU utilization. To avoid these problems, you can use data collection templates. A data collection template (or just template for short) is a special object, which can have configured DCIs similar to nodes.

When you create template and configure DCIs for it, nothing happens - no data collection will occur. Then, you can apply this template to one or multiple nodes - and as soon as you do this, all DCIs configured in the template object will appear in the target node objects, and server will start data collection for these DCIs. If you then change something in the template data collection settings - add new DCI, change DCI's configuration, or remove DCI - all changes will be reflected immediately in all nodes associated with the template. You can also choose to remove template from a node. In this case, you will have two options to deal with DCIs configured on the node through the template - remove all such DCIs or leave them, but remove relation to the template. If you delete template object itself, all DCIs created on nodes from this template will be deleted as well.

Please note that you can apply an unlimited number of templates to a node - so you can create individual templates for each group of parameters (for example, generic performance parameters, MySQL parameters, network counters, etc.) and combine them, as you need.

Creating template

To create a template, right-click on Template Root or Template Group object in the Object Browser, and click Create and then select Template. Enter a name for a new template and click OK.


Configuring templates

To configure DCIs in the template, right-click on Template object in the Object Browser, and select Data Collection from the popup menu. Data collection editor window will open. Now you can configure DCIs in the same way as the node objects.


Applying template to node

To apply a template to one or more nodes, right-click on the template object in the Object Browser and select Apply from the popup menu. The node selection dialog will open. Select the nodes that you wish to apply the template to, and click OK (you can select multiple nodes in the list by holding the CTRL key). Please note that if the data collection editor is open for any of the target nodes, either by you or another administrator, applying the template will be delayed until the data collection editor for that node is closed.

Automatic Apply Rules

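Example of a rule script that returns 1 for nodes whose SNMP object ID matches either of two values, and 0 otherwise:

sub main() {
   if ($node->snmpOID == ".1.3.6.1.4.1.311.1.1.3.1.2" || $node->snmpOID == ".1.3.6.1.4.1.311.1.1.3.1.1") {
      return 1;
   } else {
      return 0;
   }
}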

Removing template from node

To remove a link between template and node, right-click on Template object in the Object Browser and select Unbind from popup menu. Node selection dialog will open. Select one or more nodes you wish to unbind from template, and click OK. The system will ask you how to deal with DCIs configured on node and associated with template:

Figure 7: Remove Template window



If you select Unbind DCIs from template, all DCIs related to template will remain configured on a node, but association between the DCIs and template will be removed. Any further changes to the template will not be reflected in these DCIs. If you later reapply the template to the node, you will have two copies of each DCI - one standalone (remaining from unbind operation) and one related to template (from new apply operation). Selecting Remove DCIs from node will remove all DCIs associated with the template. After you click OK, node will be unbound from template.

Macros in template items

You can use various macros in the name, description, and instance fields of a template DCI. These macros are expanded when the template is applied to a node. A macro starts with the %{ character combination and ends with the } character. The following macros are currently available:


Macro Expands to...
node_id Node unique id
node_name Node name
node_primary_ip Node primary IP address
script:name String returned by script name. Script should be stored in script library (accessible via Configuration -> Script Library). Inside the script, you can access current node's properties via $node variable.

For example, if you wish to insert node's IP address into DCI description, you can enter the following in the description field of template DCI:

My ip address is %{node_primary_ip}

When the template is applied to a node with primary IP address 10.0.0.1, a DCI with the following description is created on the node:

My ip address is 10.0.0.1
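
The expansion shown above can be sketched in Python as a single regular-expression substitution. This is an illustration of the macro syntax only: the node is represented as a plain dictionary, library scripts as Python callables, and the function name, node values and `location` script are hypothetical.

```python
import re


def expand_macros(text, node, scripts=None):
    """Expand %{...} macros using node properties and, for script:name, library scripts."""
    scripts = scripts or {}

    def substitute(match):
        name = match.group(1)
        if name.startswith("script:"):
            script = scripts.get(name[len("script:"):])
            return str(script(node)) if script else match.group(0)
        return str(node.get(name, match.group(0)))   # unknown macros stay as written

    return re.sub(r"%\{([^}]+)\}", substitute, text)


node = {"node_id": 7, "node_name": "gateway", "node_primary_ip": "10.0.0.1"}
scripts = {"location": lambda n: "rack 3"}   # stands in for a script library entry

print(expand_macros("My ip address is %{node_primary_ip}", node))
print(expand_macros("%{node_name} (#%{node_id}) in %{script:location}", node, scripts))
```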

Please note that if you change something in the node, name for example, these changes will not be reflected automatically in DCI texts generated from macros. However, they will be updated if you reapply template to the node.

Instance Discovery

Sometimes you may need to monitor multiple instances of some entity, with the exact names and number of instances not known or different from node to node. Typical examples are file systems or network interfaces. To automate the creation of DCIs for each instance you can use the instance discovery mechanism. First you have to create a "master" DCI. Create the DCI as usual, but in places where you would normally put the instance name, use the special macro {instance}. Then, go to the "Instance Discovery" tab in DCI properties, and configure the instance discovery method and, optionally, a filter script.

Discovery Methods

There are different methods for instance discovery:

Method Input Data Description
Agent List List name Read list from agent and use its values as instance names.
Agent Table Table name Read table from agent and use its instance column values as instance names.
SNMP Walk - Values Base OID Do SNMP walk starting from given OID and use values of returned varbinds as instance names.
SNMP Walk - OIDs Base OID Do SNMP walk starting from given OID and use IDs of returned varbinds as instance names.

Instance Filter

You can optionally filter out unneeded instances and transform instance names using a filtering script written in NXSL. The script will be called for each instance and can return TRUE (to accept the instance), FALSE (to reject the instance), or an array of two elements, where the first is TRUE and the second is the new value for the instance name.
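
The three possible return values of a filter can be illustrated with a Python model of the discovery step. Real filters are written in NXSL; this sketch only mimics the accept, reject and rename logic, and the function names, the interface list and the `Net.Interface.BytesIn` parameter name are assumptions made for the example.

```python
def discover(master_name, instances, instance_filter=None):
    """Build DCI names from a master DCI name by substituting {instance}."""
    names = []
    for instance in instances:
        result = instance_filter(instance) if instance_filter else True
        if isinstance(result, (list, tuple)):   # [TRUE, new instance name]
            accepted, instance = result[0], result[1]
        else:                                   # plain TRUE or FALSE
            accepted = result
        if accepted:
            names.append(master_name.replace("{instance}", instance))
    return names


def example_filter(instance):
    if instance.startswith("lo"):
        return False                      # reject loopback interfaces
    if instance.startswith("eth"):
        return [True, instance.upper()]   # accept and rename
    return True                           # accept unchanged


print(discover("Net.Interface.BytesIn({instance})", ["lo", "eth0", "wlan0"], example_filter))
```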

Working with collected data

Once you set up a DCI, collected data starts to accumulate in the database. You can access this data and work with it in different ways.

View collected data in graphical form

You can view collected data in a graphical form, as a line chart. To view values of some DCI as a chart, first open either the Data Collection Editor or the Last Values view for a host. You can do it from the Object Browser or map by selecting the host, right-clicking on it, and selecting Data collection or Last DCI values. Then, select one or more DCIs (you can put up to 16 DCIs on one graph), right-click on them and choose Graph from the pop-up menu. You will see a graphical representation of DCI values for the last hour.

When the graph is open, you can do various tasks:

Select different time interval

By default, you will see data for the last hour. You can select different time interval in two ways:

  1. Select new time interval from presets, by right-clicking on the graph, and then selecting Presets and appropriate time interval from the pop-up menu.
  2. Set time interval in graph properties dialog. To access graph properties, right-click on the graph, and then select Properties from the pop-up menu. Alternatively, you can select Properties on the Graph menu (main application menu). In the properties dialog, you will have two options: select exact time interval (like 12/10/2005 from 10:00 to 14:00) or select time interval based on current time (like last two hours).

Turn on automatic refresh

You can turn on automatic graph refresh at a given interval in graph properties dialog. To access graph properties, right-click on it, and select Properties from the pop-up menu. Alternatively, you can select Properties on the Graph menu (main application menu). In the properties dialog, select the Refresh automatically checkbox and enter a desired refresh interval in seconds in edit box below. When automatic refresh is on, you will see "Autoupdate" message in the status bar of graph window.

Change colors

You can change colors used to paint lines and graph elements in the graph properties dialog. To access graph properties, right-click on it, and select Properties from the pop-up menu. Alternatively, you can select Properties on the Graph menu (main application menu). In the properties dialog, click on colored box for appropriate element to choose different color.

Save current settings as predefined graph

You can save current graph settings as predefined graph to allow quick and easy access in the future to information presented on graph. Preconfigured graphs can be used either by you or by other NetXMS users, depending on settings. To save current graph configuration as predefined graph, select Save as predefined from graph view menu. The following dialog will appear:


Figure: Save as predefined graph dialog


In the Graph name field, enter the desired name for your predefined graph. It will appear in the predefined graph tree exactly as written here. You can use the -> character pair to create a subtree. For example, if you name your graph NetXMS Server->System->CPU utilization (iowait) it will appear in the tree as follows:

Figure: Predefined graph tree with CPU utilization (iowait) nested under NetXMS Server and System
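
The way a graph name maps to tree levels can be sketched in a few lines of Python. This only illustrates the -> naming convention; the nested-dictionary representation, the function name and the second graph name ("Load average") are invented for the example.

```python
def add_graph(tree, graph_name):
    """Insert a predefined graph into a nested dict, splitting its name on '->'."""
    *folders, leaf = graph_name.split("->")
    node = tree
    for folder in folders:
        node = node.setdefault(folder, {})   # create the subtree level if missing
    node[leaf] = None                        # None marks a graph (a leaf of the tree)
    return tree


tree = {}
add_graph(tree, "NetXMS Server->System->CPU utilization (iowait)")
add_graph(tree, "NetXMS Server->System->Load average")   # hypothetical second graph
print(tree)
```

Graphs that share a prefix end up under the same subtree, which is why consistent naming keeps the tree tidy.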


You can edit a predefined graph by right-clicking on it in the predefined graph tree, and selecting Properties from the context menu. On the Predefined Graph property page you can add users and groups who will have access to this graph. Note that the user who created the graph will always have full access to it, even if that user is not in the access list.

If you need to delete predefined graph, you can do it by right-clicking on it in predefined graph tree, and selecting Delete from context menu.

View collected data in textual form

You can view collected data in a textual form, as a table with two columns: timestamp and value. To view values of some DCI as a table, first open either the Data Collection Editor or the Last Values view for a host. You can do it from the Object Browser or map by selecting the host, right-clicking on it, and selecting Data collection or Last DCI values. Then, select one or more DCIs (data for each DCI will be shown in a separate view), right-click on them and choose Show history from the pop-up menu. You will see the last 1000 values of the DCI.

Export DCI data

You can export collected data to a text file. To export the DCI data, first open either the Data Collection Editor or the Last Values view for the host. You can do it from the Object Browser or map by selecting the host, right-clicking on it, and selecting Data collection or Last DCI values. Then, select one DCI, right-click on it and select Export data from the pop-up menu. The export configuration dialog will appear (Figure 8).

Figure 8: Export configuration dialog



Enter the name of the file you wish to save the data to. You can use the ... button next to the File name box, to open a file system browser. Then select separator to be used between timestamps and values, time stamp format and time frame. Time stamp format can be either Raw - number representing time in a UNIX timestamp format, or Text - string in the format "dd/mm/yyyy HH:MM:SS".
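
To show what the two time stamp formats look like side by side, here is a Python sketch that formats the same samples both ways. It is not the console's export code: the function name is hypothetical, and rendering the Text format in UTC is an assumption made to keep the example deterministic.

```python
from datetime import datetime, timezone


def export_lines(samples, separator, text_timestamps):
    """Format (unix_timestamp, value) pairs as lines of an export file."""
    lines = []
    for timestamp, value in samples:
        if text_timestamps:   # Text format: dd/mm/yyyy HH:MM:SS
            moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
            stamp = moment.strftime("%d/%m/%Y %H:%M:%S")
        else:                 # Raw format: UNIX timestamp as a number
            stamp = str(timestamp)
        lines.append(f"{stamp}{separator}{value}")
    return lines


samples = [(0, 41.5), (90061, 42.0)]
print(export_lines(samples, ";", text_timestamps=False))
print(export_lines(samples, ";", text_timestamps=True))
```

The Raw format is easier to post-process in scripts, while the Text format is easier to read in a spreadsheet.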

Finally, click OK and wait for the export process to complete.