MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair) Calculation

From NetXMS Wiki
This Wiki is deprecated and we are currently migrating the remaining pages into the product documentation (Admin Guide, NXSL Guide).

Create a template called, say, "Availability" with four DCIs: "Failures", "MTBF (hours)", "MTTR (hours)" and "Node availability (percentage)".

The transformation scripts for each DCI are as follows.

For "Failures" DCI:

return GetCustomAttribute($node, "NumFailures");

For "MTBF (hours)" DCI:

return GetCustomAttribute($node, "mtbf");

For "MTTR (hours)" DCI:

return GetCustomAttribute($node, "mttr");

For "Node availability (percentage)" DCI:

// This script calculates the MTTR, MTBF and perAvailability values and stores them in custom attributes.
// On the first run it initializes the custom attributes it needs.
// Attributes that do not exist yet are created automatically by SetCustomAttribute.
CurrentStatus = GetDCIValue($node, FindDCIByName($node, "Status"));
PreviousState = GetCustomAttribute($node, "PreviousState");
if (PreviousState == null)
{ // First run: no previous state has been stored yet
	SetCustomAttribute($node, "PreviousState", CurrentStatus);
	SetCustomAttribute($node, "TimeStamp", time());   
	SetCustomAttribute($node, "NumFailures", 0); 
	SetCustomAttribute($node, "TotalUptime", 0);
	SetCustomAttribute($node, "TotalDowntime", 0);
	return 100;
}

// Second and subsequent runs
NumFailures = GetCustomAttribute($node, "NumFailures");
// Seconds elapsed since the previous run
LastTime = time() - GetCustomAttribute($node, "TimeStamp");

// Status is up
if (CurrentStatus == 0)
{
	if (PreviousState != CurrentStatus)
	{   // just changed to up
		// update mttr
		TotalDowntime = GetCustomAttribute($node, "TotalDowntime") + LastTime;
		mttr = TotalDowntime / ((NumFailures == 0) ? 1 : NumFailures) / 3600;   // to prevent division by zero
		SetCustomAttribute($node, "TotalDowntime", TotalDowntime);
		SetCustomAttribute($node, "mttr", mttr);
	}
	else
	{      // still up
		// update mtbf
		TotalUptime = GetCustomAttribute($node, "TotalUptime") + LastTime;
		mtbf = TotalUptime / ((NumFailures == 0) ? 1 : NumFailures) / 3600;   // to prevent division by zero
		SetCustomAttribute($node, "TotalUptime", TotalUptime);
		SetCustomAttribute($node, "mtbf", mtbf);
	}
}

// Status is down
if (CurrentStatus == 4)
{
	if (PreviousState != CurrentStatus)
	{   // just changed to down
		// update mtbf
		NumFailures++;
		TotalUptime = GetCustomAttribute($node, "TotalUptime") + LastTime;
		mtbf = TotalUptime / NumFailures / 3600;
		SetCustomAttribute($node, "NumFailures", NumFailures);
		SetCustomAttribute($node, "TotalUptime", TotalUptime);
		SetCustomAttribute($node, "mtbf", mtbf);
	}
	else
	{   // still down
		// update mttr
		TotalDowntime = GetCustomAttribute($node, "TotalDowntime") + LastTime;
		mttr = TotalDowntime / NumFailures / 3600;
		SetCustomAttribute($node, "TotalDowntime", TotalDowntime);
		SetCustomAttribute($node, "mttr", mttr);
	}
}

if (CurrentStatus == 0 || CurrentStatus == 4)
{
	// Save previous state and timestamp
	SetCustomAttribute($node, "PreviousState", CurrentStatus);
	SetCustomAttribute($node, "TimeStamp", time());   

	// perAvailability section
	TotalUptime = GetCustomAttribute($node, "TotalUptime");
	TotalDowntime = GetCustomAttribute($node, "TotalDowntime");
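	TotalTime = TotalUptime + TotalDowntime;
	// Guard against division by zero; multiply by 100.0 first so the division is done in floating point
	perAvailability = (TotalTime == 0) ? 100 : 100.0 * TotalUptime / TotalTime;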
	SetCustomAttribute($node, "perAvailability", perAvailability);

	return perAvailability;
}


Apply this template to the required nodes, either manually or automatically (using an auto-apply script).
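
For the automatic case, the following is a minimal sketch of an auto-apply filter, not a tested configuration. It assumes that each node to be tracked carries a custom attribute named "TrackAvailability" set to 1 (this attribute name is hypothetical and not part of the original solution), that $node refers to the node being evaluated, and that the template is applied when the script returns a non-zero value.

```
// Hypothetical opt-in flag: apply the template only to nodes
// whose "TrackAvailability" custom attribute is set to 1.
flag = GetCustomAttribute($node, "TrackAvailability");
return (flag != null) && (flag == "1");
```

An opt-in attribute keeps the filter independent of node naming conventions; any other condition on the node could be used instead.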

This approach avoids having to define custom attributes by hand (the fourth transformation script creates them), as well as events, actions, event processing policy rules, etc. In addition, all four DCIs are updated at each polling interval. The solution only distinguishes two node states: up (Normal = 0) and down (Critical = 4).
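
As a summary of what the fourth script computes, the three values can be written in terms of the custom attributes it maintains (TotalUptime and TotalDowntime are in seconds; while NumFailures is 0 the script divides by 1 instead):

```latex
\[
\begin{aligned}
\text{MTBF (hours)} &= \frac{\text{TotalUptime}}{\text{NumFailures} \times 3600} \\
\text{MTTR (hours)} &= \frac{\text{TotalDowntime}}{\text{NumFailures} \times 3600} \\
\text{Availability (\%)} &= \frac{\text{TotalUptime}}{\text{TotalUptime} + \text{TotalDowntime}} \times 100
\end{aligned}
\]
```

As an illustration with made-up numbers: with TotalUptime = 356400 s, TotalDowntime = 3600 s and NumFailures = 2, this gives MTBF = 49.5 hours, MTTR = 0.5 hours and an availability of 99%.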

This solution is based on the following discussion on the NetXMS support forum: MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair).