MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair) Calculation

Create a template named, for example, "Availability", containing the four DCIs shown in the following screenshot:

[Screenshot: Availability template DCIs.png]

The transformation script for each DCI is given below.

For "Failures" DCI:

return GetCustomAttribute($node, "NumFailures");

For "MTBF (hours)" DCI:

 
return GetCustomAttribute($node, "mtbf");

For "MTTR (hours)" DCI:

 
return GetCustomAttribute($node, "mttr");

For "Node availability (percentage)" DCI:

 
// This script calculates the MTTR, MTBF and perAvailability values and stores them in custom attributes.
// On the first run it initializes the custom attributes it needs;
// SetCustomAttribute creates any attribute that does not exist yet.
CurrentStatus = GetDCIValue($node, FindDCIByName($node, "Status"));
PreviousState = GetCustomAttribute($node, "PreviousState");
if (PreviousState == null)
{ // First run: no previous state has been stored yet
	SetCustomAttribute($node, "PreviousState", CurrentStatus);
	SetCustomAttribute($node, "TimeStamp", time());   
	SetCustomAttribute($node, "NumFailures", 0); 
	SetCustomAttribute($node, "TotalUptime", 0);
	SetCustomAttribute($node, "TotalDowntime", 0);
	return 100;
}
 
// Second and subsequent runs
NumFailures = GetCustomAttribute($node, "NumFailures");
// Seconds elapsed since the previous run
LastTime = time() - GetCustomAttribute($node, "TimeStamp");
 
// Status is up (Normal = 0)
if (CurrentStatus == 0)
{
	if (PreviousState != CurrentStatus)
	{   // just changed to up
		// update mttr
		TotalDowntime = GetCustomAttribute($node, "TotalDowntime") + LastTime;
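		// Dividing by the real constant 3600.0 keeps fractional hours even when the operands are integers
		mttr = TotalDowntime / 3600.0 / ((NumFailures == 0) ? 1 : NumFailures);   // to prevent division by zero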
		SetCustomAttribute($node, "TotalDowntime", TotalDowntime);
		SetCustomAttribute($node, "mttr", mttr);
	}
	else
	{      // still up
		// update mtbf
		TotalUptime = GetCustomAttribute($node, "TotalUptime") + LastTime;
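		// Dividing by the real constant 3600.0 keeps fractional hours even when the operands are integers
		mtbf = TotalUptime / 3600.0 / ((NumFailures == 0) ? 1 : NumFailures);   // to prevent division by zero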
		SetCustomAttribute($node, "TotalUptime", TotalUptime);
		SetCustomAttribute($node, "mtbf", mtbf);
	}
}
 
// Status is down (Critical = 4)
if (CurrentStatus == 4)
{
	if (PreviousState != CurrentStatus)
	{   // just changed to down
		// update mtbf
		NumFailures++;
		TotalUptime = GetCustomAttribute($node, "TotalUptime") + LastTime;
		mtbf = TotalUptime / 3600.0 / NumFailures;   // 3600.0 keeps fractional hours
		SetCustomAttribute($node, "NumFailures", NumFailures);
		SetCustomAttribute($node, "TotalUptime", TotalUptime);
		SetCustomAttribute($node, "mtbf", mtbf);
	}
	else
	{   // still down
		// update mttr
		TotalDowntime = GetCustomAttribute($node, "TotalDowntime") + LastTime;
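		// NumFailures is still 0 if the node was already down on the first run
		mttr = TotalDowntime / 3600.0 / ((NumFailures == 0) ? 1 : NumFailures);   // to prevent division by zero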
		SetCustomAttribute($node, "TotalDowntime", TotalDowntime);
		SetCustomAttribute($node, "mttr", mttr);
	}
}
 
if (CurrentStatus == 0 || CurrentStatus == 4)
{
	// Remember the current state and timestamp for the next run
	SetCustomAttribute($node, "PreviousState", CurrentStatus);
	SetCustomAttribute($node, "TimeStamp", time());   
 
	// perAvailability section
	TotalUptime = GetCustomAttribute($node, "TotalUptime");
	TotalDowntime = GetCustomAttribute($node, "TotalDowntime");
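	// Multiply by 100.0 first so the ratio is computed as a real number;
	// report 100 while no time has been accumulated yet (avoids division by zero)
	TotalTime = TotalUptime + TotalDowntime;
	perAvailability = (TotalTime > 0) ? 100.0 * TotalUptime / TotalTime : 100;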
	SetCustomAttribute($node, "perAvailability", perAvailability);
 
	return perAvailability;
}


Apply this template to the required nodes, either manually or automatically (using an auto-apply script).
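
As an illustration of the auto-apply option, a filter script could select nodes by a custom attribute. This is only a sketch: the attribute name "trackAvailability" is hypothetical, and it assumes that the template's auto-apply script is evaluated with $node set to the candidate node and that a true return value applies the template.

```
// Hypothetical auto-apply filter (sketch, not part of the original solution).
// Assumes $node is the candidate node and that returning true applies the template.
attr = GetCustomAttribute($node, "trackAvailability");   // "trackAvailability" is an assumed attribute name
if (attr == null)
{
	return false;   // attribute not set: do not apply the template
}
return attr == "1";
```

With a filter of this kind, setting the attribute to 1 on a node would be enough to have the four DCIs created on it at the next auto-apply pass.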

This approach avoids having to define custom attributes (they are created by the fourth transformation script), events, actions, or event processing policy rules. In addition, all four DCIs are updated at each polling interval. The solution supports only the up (Normal = 0) and down (Critical = 4) node statuses.
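
For reference, the arithmetic performed by the fourth script can be written out as follows. This restates the script rather than adding a new definition: U, D and N are shorthand introduced here for the TotalUptime and TotalDowntime attributes (both in seconds) and the NumFailures attribute, and the sample figures are made up for illustration. While N is still 0, the script divides by 1 instead.

```latex
\[
\mathrm{MTBF} = \frac{U}{3600\,N}\ \text{hours}, \qquad
\mathrm{MTTR} = \frac{D}{3600\,N}\ \text{hours}, \qquad
\mathrm{Availability} = \frac{U}{U + D} \times 100\,\%
\]
% Illustrative values: U = 356400 s, D = 3600 s, N = 2
\[
\mathrm{MTBF} = \frac{356400}{3600 \cdot 2} = 49.5\ \text{h}, \qquad
\mathrm{MTTR} = \frac{3600}{3600 \cdot 2} = 0.5\ \text{h}, \qquad
\mathrm{Availability} = \frac{356400}{356400 + 3600} \times 100\,\% = 99\,\%
\]
```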

This solution is based on the following discussion on the NetXMS support forum: MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair).