Let’s say you have some time-to-failure data on your equipment. A common action is to calculate the MTBF. All well and good until you expect to make a meaningful decision based on the calculation.
Using just the mean of the data, the MTBF value is likely to provide you with a less than useful bit of information. Thus your decision will be rather random or worthless.
Let’s explore just how this simple calculation of perfectly good data can mislead your decision making.
The Case of Perfect Data
In your factory or within your system you have data on five failures. In this case, the perfect case, all five failures occurred after 1,752 hours of operation. The system was installed and started one year ago, and today at noon, the fifth failure occurred.
What is the MTBF if the equipment has operated for a total of 8,760 hours?
Easy, right? It is 8,760 / 5, or 1,752 hours MTBF.
This even makes sense since each failure occurs like clockwork at 1,752 hours since start-up or the last repair. The five failures occurred equally spaced over the year.
If the failure is always the same part, we could just replace that part at a convenient time prior to 1,752 hours of operation and avoid the unwanted downtime. This is a ‘perfect’ case and if it occurs with your system, let me know, as I’ve never heard of or seen this type of pattern of failures outside of textbooks or contrived examples.
If you routinely just calculate MTBF, how close to this perfect case is your data? Are you assuming this failure pattern when making decisions concerning spares, scheduling, etc.?
The Case of Early Failures
Let’s say another system generates some failure data where four of the failures occur in the first 100 hours of operation and the last failure occurs at the end of the year. That is a total of five failures again, and there has been a total of 8,760 hours of operation.
The MTBF calculation is simple. 8,760 / 5 for 1,752 hour MTBF, right?
Would planning to replace the part after another 1,752 hours of operation make sense? You could make the same decisions as in the ‘perfect case’ and would likely not see downtime due to the part.
More likely after four failures inside of a week of operation, the team would investigate and solve the issue leading to the ‘early’ failures. Yet, if the end of the year calculation is simply the MTBF value, what have we learned? How could this data distort our understanding of the system’s operation and our best course of action going forward?
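To make the comparison concrete, here is a minimal Python sketch of the two cases so far. The perfect-case failure times follow from the text. The four early failure times (20, 45, 70, and 95 hours) are made-up values chosen only to fall within the first 100 hours, and the function names are mine.

```python
def mtbf(total_hours, failure_count):
    """Total operating hours divided by the number of failures."""
    return total_hours / failure_count


def times_between_failures(failure_times):
    """Gaps between successive failures, measured from start-up at time zero."""
    gaps = []
    previous = 0
    for t in failure_times:
        gaps.append(t - previous)
        previous = t
    return gaps


TOTAL_HOURS = 8760

# Perfect case: a failure every 1,752 hours, the fifth at the end of the year.
perfect = [1752, 3504, 5256, 7008, 8760]

# Early-failure case: the first four times are hypothetical values
# picked to land inside the first 100 hours of operation.
early = [20, 45, 70, 95, 8760]

for name, times in (("perfect", perfect), ("early", early)):
    print(name, mtbf(TOTAL_HOURS, len(times)), times_between_failures(times))
```

Both cases report the same 1,752 hour MTBF, yet the times between failures look nothing alike. The single number hides exactly the pattern you would want to act on.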
The Case of Wear Out Failures
“Bearings and MTBF” is one of my favorite examples. As you know, if a bearing is designed, applied, and installed correctly it will work well. It will function well till it wears out. As the lubrication breaks down it permits metal wear which leads to eventual bearing failure.
In the spirit of the trio of examples, let’s say we have installed five bearings on our system and all five failed after exactly 8,760 hours of use. The MTBF value is once again 8,760 / 5 or 1,752 hours, right?
We know bearings wear out, we know the failure was due to wear, and yet we calculate a value which ignores the wear out time to failure pattern anyway. Why?
If we wanted to design an appropriate maintenance plan for bearings, we would not use MTBF as it would have us replacing them every 1,752 hours (unless those with practical thinking skills and experience intervened).
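A rough sketch of what that replacement plan would cost, using only the numbers in this example. The function name and the idea of expressing the result as a fraction of bearing life are my own framing, not a standard metric.

```python
def mtbf_replacement_summary(system_hours, failures, observed_life_hours):
    """Compare an MTBF-based replacement interval with the observed bearing life."""
    interval = system_hours / failures              # the MTBF used as the interval
    life_used = interval / observed_life_hours      # share of the life actually consumed
    swaps_per_life = observed_life_hours / interval # replacements made in one bearing life
    return interval, life_used, swaps_per_life


# From the example: 8,760 system hours, five failures,
# and every bearing lasting exactly 8,760 hours.
interval, life_used, swaps_per_life = mtbf_replacement_summary(8760, 5, 8760)
print(interval, life_used, swaps_per_life)
```

Under these contrived numbers, each bearing would be pulled after using only a fifth of its life, and you would buy five bearings over the period in which one would have done the job.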
Making Better Use of Your Data
These three cases are contrived. They serve to illustrate the point that calculating MTBF is rather a waste of time at best. It is going to mislead you and your team by ignoring whether failures arrive at a decreasing or increasing rate over time. It is likely you will make poor decisions about your system, its reliability, and its maintenance.
Poor decisions cost money.
Treat your data better and make better decisions. Save money.
Instead of calculating MTBF, plot the data using a mean cumulative function. If the data is for a non-repairable element of your system, use the Weibull distribution or another appropriate life data distribution to describe the failure pattern over time.
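As a starting point, here is a minimal sketch of a mean cumulative function in Python. It assumes every system is observed over the same period, with no staggered start dates or censoring, which reduces the estimate to a running count of failures per system. The early-failure times are hypothetical values within the first 100 hours, and the function name is mine.

```python
def mean_cumulative_function(failure_times, n_systems=1):
    """Return (time, mean cumulative failures per system) points.

    Simplifying assumption: all systems are observed for the full period,
    so the estimate is just the running failure count divided by n_systems.
    """
    points = []
    for count, t in enumerate(sorted(failure_times), start=1):
        points.append((t, count / n_systems))
    return points


TOTAL_HOURS = 8760

# Hypothetical early-failure data: four failures inside the first
# 100 hours, then one at the end of the year.
early = [20, 45, 70, 95, 8760]
mtbf = TOTAL_HOURS / len(early)

for t, observed in mean_cumulative_function(early):
    # What a constant failure rate of 1 / MTBF would predict by time t.
    constant_rate = t / mtbf
    print(f"{t:>5} h  observed {observed:.0f}  constant-rate line {constant_rate:.2f}")
```

Plot the points as a step chart of cumulative failures against operating hours. A roughly straight line suggests a steady failure rate. A curve that climbs steeply and then flattens points to early failures, and one that bends upward points to wear out.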
This short article reveals only a small part of the decision making routinely done using MTBF. How have you seen the use of MTBF lead to poor (and costly) decisions?