MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair) Calculation
This wiki is deprecated, and the remaining pages are currently being migrated into the product documentation (Admin Guide, NXSL Guide).
Create a template called, say, "Availability" with four DCIs: "Failures", "MTBF (hours)", "MTTR (hours)" and "Node availability (percentage)" (shown in a screenshot on the original page).
The transformation script for each DCI follows.
For "Failures" DCI:
return GetCustomAttribute($node, "NumFailures");
For "MTBF (hours)" DCI:
return GetCustomAttribute($node, "mtbf");
For "MTTR (hours)" DCI:
return GetCustomAttribute($node, "mttr");
For "Node availability (percentage)" DCI:
// This script calculates the MTTR, MTBF and perAvailability values and stores them in custom attributes.
// Undefined attributes are created automatically by the SetCustomAttribute function.

// Current node status, read from the DCI named "Status"
CurrentStatus = GetDCIValue($node, FindDCIByName($node, "Status"));
PreviousState = GetCustomAttribute($node, "PreviousState");
if (PreviousState == null)
{   // First run: the previous state is null, so initialize the custom attributes
    SetCustomAttribute($node, "PreviousState", CurrentStatus);
    SetCustomAttribute($node, "TimeStamp", time());
    SetCustomAttribute($node, "NumFailures", 0);
    SetCustomAttribute($node, "TotalUptime", 0);
    SetCustomAttribute($node, "TotalDowntime", 0);
    return 100;
}

// Second and subsequent runs
NumFailures = GetCustomAttribute($node, "NumFailures");
LastTime = time() - GetCustomAttribute($node, "TimeStamp");   // seconds since the previous run

// Status is up
if (CurrentStatus == 0)
{
    if (PreviousState != CurrentStatus)
    {   // just changed to up
        // update mttr
        TotalDowntime = GetCustomAttribute($node, "TotalDowntime") + LastTime;
        mttr = TotalDowntime / ((NumFailures == 0) ? 1 : NumFailures) / 3600;   // to prevent division by zero
        SetCustomAttribute($node, "TotalDowntime", TotalDowntime);
        SetCustomAttribute($node, "mttr", mttr);
    }
    else
    {   // still up
        // update mtbf
        TotalUptime = GetCustomAttribute($node, "TotalUptime") + LastTime;
        mtbf = TotalUptime / ((NumFailures == 0) ? 1 : NumFailures) / 3600;   // to prevent division by zero
        SetCustomAttribute($node, "TotalUptime", TotalUptime);
        SetCustomAttribute($node, "mtbf", mtbf);
    }
}

// Status is down
if (CurrentStatus == 4)
{
    if (PreviousState != CurrentStatus)
    {   // just changed to down
        // count the new failure and update mtbf
        NumFailures++;
        TotalUptime = GetCustomAttribute($node, "TotalUptime") + LastTime;
        mtbf = TotalUptime / NumFailures / 3600;
        SetCustomAttribute($node, "NumFailures", NumFailures);
        SetCustomAttribute($node, "TotalUptime", TotalUptime);
        SetCustomAttribute($node, "mtbf", mtbf);
    }
    else
    {   // still down
        // update mttr
        TotalDowntime = GetCustomAttribute($node, "TotalDowntime") + LastTime;
        // NumFailures is still 0 if the node was already down on the first run
        mttr = TotalDowntime / ((NumFailures == 0) ? 1 : NumFailures) / 3600;   // to prevent division by zero
        SetCustomAttribute($node, "TotalDowntime", TotalDowntime);
        SetCustomAttribute($node, "mttr", mttr);
    }
}

if (CurrentStatus == 0 || CurrentStatus == 4)
{
    // Save previous state and timestamp
    SetCustomAttribute($node, "PreviousState", CurrentStatus);
    SetCustomAttribute($node, "TimeStamp", time());

    // perAvailability section
    TotalUptime = GetCustomAttribute($node, "TotalUptime");
    TotalDowntime = GetCustomAttribute($node, "TotalDowntime");
    perAvailability = TotalUptime / (TotalUptime + TotalDowntime) * 100;
    SetCustomAttribute($node, "perAvailability", perAvailability);
    return perAvailability;
}
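The bookkeeping in the fourth script can be checked outside NetXMS with a small simulation. The following is only a sketch in Python, not NXSL: the `update` function and the `attrs` dictionary (standing in for the node's custom attributes) are hypothetical names, the division-by-zero guard is applied to every branch, the guard for an empty history is an addition, and floating-point division is assumed throughout, which may not match NXSL's type handling.

```python
UP, DOWN = 0, 4  # Normal and Critical status codes, as in the script above


def update(attrs, status, now):
    """Apply one polling step; `attrs` is a dict standing in for custom attributes.

    Returns the availability percentage, or None for an unsupported status.
    """
    if "PreviousState" not in attrs:
        # First run: initialize the counters and report 100 % availability
        attrs.update(PreviousState=status, TimeStamp=now,
                     NumFailures=0, TotalUptime=0, TotalDowntime=0)
        return 100.0
    if status not in (UP, DOWN):
        return None  # other statuses leave the counters untouched

    elapsed = now - attrs["TimeStamp"]  # seconds since the previous step
    changed = attrs["PreviousState"] != status
    if status == DOWN and changed:
        attrs["NumFailures"] += 1  # a new failure has just started
    failures = max(attrs["NumFailures"], 1)  # avoid division by zero

    # Just up or still down: the interval counts as downtime; otherwise as uptime.
    if (status == UP and changed) or (status == DOWN and not changed):
        attrs["TotalDowntime"] += elapsed
        attrs["mttr"] = attrs["TotalDowntime"] / failures / 3600
    else:
        attrs["TotalUptime"] += elapsed
        attrs["mtbf"] = attrs["TotalUptime"] / failures / 3600

    attrs["PreviousState"] = status
    attrs["TimeStamp"] = now
    total = attrs["TotalUptime"] + attrs["TotalDowntime"]
    # The guard for a zero-length history is an addition of this sketch
    return 100.0 * attrs["TotalUptime"] / total if total else 100.0


# Worked example: one sample per hour, with a single two-hour outage
attrs = {}
statuses = [UP, UP, UP, DOWN, DOWN, UP]
results = [update(attrs, s, hour * 3600) for hour, s in enumerate(statuses)]
print(results)
print(attrs["NumFailures"], attrs["mtbf"], attrs["mttr"])
```

In this run the node accumulates three hours of uptime and two hours of downtime with one failure, so the sketch ends with an MTBF of 3 hours, an MTTR of 2 hours and an availability of 60 %.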
Apply this template to the required nodes, either manually or automatically (using an auto-apply script).
This approach avoids having to define custom attributes by hand (the fourth transformation script creates them), as well as events, actions, and event processing policy rules. In addition, all four DCIs are updated at each polling interval. The solution only supports the up (Normal = 0) and down (Critical = 4) node statuses; for any other status the counters are left unchanged.
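As a compact summary, the values held by the DCIs can be written as follows. This is a restatement of the script's arithmetic rather than an independent definition; the symbols are shorthand introduced here: U is TotalUptime and D is TotalDowntime (both in seconds), N is NumFailures, and N is replaced by 1 while no failure has been recorded, as the script's division-by-zero guard does.

```latex
\begin{align*}
\mathrm{MTBF}\ [\mathrm{h}] &= \frac{U}{3600 \cdot \max(N, 1)} \\
\mathrm{MTTR}\ [\mathrm{h}] &= \frac{D}{3600 \cdot \max(N, 1)} \\
\mathrm{Availability}\ [\%] &= 100 \cdot \frac{U}{U + D}
\end{align*}
```

For example, U = 10800 s, D = 7200 s and N = 1 give an MTBF of 3 hours, an MTTR of 2 hours and an availability of 60 %.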
This solution is based on the following discussion on the NetXMS support forum: MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair).