Ease and Joy of MTBF Calculation
Mean time between failures (MTBF) is a common reliability metric. It is easy to calculate. It is regularly requested and readily offered.
MTBF also does not contain very much information of any use.
Ease of use
MTBF is the tally of time a set of items operates divided by the number of failures incurred (or by one if no failures occur). No need to track down times to failure, deal with censoring, or do anything more complicated than a bit of addition and one division operation. The result is the mean of the times to failure.
The estimate for repairable items assumes the repair time is small, yet if you reset the clock for that item to count only operating time, then it's close enough. For non-repairable items, we may call the result mean time to failure, MTTF, yet the calculation is the same.
Even I can remember this formula.
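As a minimal sketch of just how little arithmetic is involved, here is the calculation in Python. The function name and the operating hours are made up for illustration; they are not data from any real test.

```python
def mtbf(operating_hours, failures):
    """Total operating time divided by the failure count (or by one if none)."""
    total_time = sum(operating_hours)
    return total_time / max(failures, 1)

# Hypothetical test: five units and the hours each one accumulated.
hours = [1000, 1000, 650, 1000, 820]
print(mtbf(hours, failures=2))  # 4470 total hours / 2 failures = 2235.0
```

Note what the function never asks for: when each failure happened, or which units were still running when the test stopped.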
Joy of use
MTBF or MTTF is a single value. In general, a larger value is better, meaning a longer life. It avoids the adverse connotation of failure rates (higher is worse), as the metric is the inverse of the failure rate. The very name implies the average time between or before a failure. The units are time (hours are common), cycles, or whichever unit makes sense.
MTBF does not include a duration, so we can test many units for a short time to provide an estimate. This may make life testing relatively quick. Or, when limited by samples, testing a few units for an extended time may provide the same results.
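A quick sketch of that point, using invented test plans: two very different tests with the same total operating hours and the same failure count report the same MTBF. The unit counts, durations, and failure counts below are assumptions chosen only to make the arithmetic obvious.

```python
def mtbf_from_plan(units, hours_per_unit, failures):
    """MTBF for a test where every unit runs the same number of hours."""
    total_hours = units * hours_per_unit
    return total_hours / max(failures, 1)

# Hypothetical plan A: many units for a short time.
plan_a = mtbf_from_plan(units=100, hours_per_unit=100, failures=2)

# Hypothetical plan B: a few units for a long time.
plan_b = mtbf_from_plan(units=5, hours_per_unit=2000, failures=2)

print(plan_a, plan_b)  # both are 10,000 hours / 2 failures = 5000.0
```

The number alone cannot tell you which of the two tests produced it.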
Failures, if any occur, do not matter much to the result, as long as the total operating hours are sufficient given the number of failures. If plagued by early failures, just get the units repaired and run longer. If there is a prevalent wear-out mechanism, run more samples for a shorter time to avoid the wear-out failures. With a little work, you can show nearly any MTBF desired.
Lack of information
MTBF by itself does not contain the period of time over which the inverse failure rate applies. It does not contain information about when the failures occur (early or late within the unstated duration). We do not know if we should expect early failures or wear out.
Given the range of testing conditions that provide the same number, we often are not aware of which failure mechanisms to expect. We do not know what will fail or how, unless we gather more information.
The MTBF or MTTF value is by definition an estimate of the mean, and while we can calculate confidence bounds if we know a bit more about the number of samples, total time tested, and number of failures, we do not have very much information about when to expect the first few failures. Using the exponential distribution, which would be a crude assumption, we may make an estimate. Yet if the units have either an early-life or a wear-out failure pattern, the estimate of the first percentile time to failure may be wildly off.
We rarely need to know when most of the items fail; rather, we are interested in when to expect the first 1% or 10% to fail. Maybe we need more information to get a reasonable estimate.
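To give a feel for how far off that first percentile estimate can be, here is a rough sketch comparing the exponential estimate with Weibull distributions that have the same mean life. The 5,000 hour MTBF and the shape values are assumptions picked for illustration (a shape below 1 stands in for early-life failures, above 1 for wear out), and the function names are my own.

```python
import math

def exponential_percentile(mean_life, p):
    """Time by which a fraction p has failed, assuming exponential life."""
    return -mean_life * math.log(1.0 - p)

def weibull_percentile(mean_life, shape, p):
    """Same percentile for a Weibull distribution with the same mean life."""
    # Choose the scale parameter so the Weibull mean equals mean_life.
    scale = mean_life / math.gamma(1.0 + 1.0 / shape)
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

mtbf = 5000.0  # hypothetical MTBF in hours

# Exponential assumption: the first 1% fail by roughly 50 hours.
print(exponential_percentile(mtbf, 0.01))

# Same mean, different failure patterns (shape 1.0 is the exponential case).
for shape in (0.5, 1.0, 3.0):
    print(shape, weibull_percentile(mtbf, shape, 0.01))
```

With the early-life shape the first percentile lands well before the exponential estimate, and with the wear-out shape well after it, even though all three cases report the same MTBF.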
Sure, MTBF is easy to use. Sure, MTBF is a common metric.
Yet, it’s not very useful. Use reliability (the probability of success over a stated duration and environment) instead.
I don’t know how long I’ve been raging about this!
It always amazes me how organisations just completely ignore the fact. I set in motion a strategy which I called Design for Excellence and which incorporated DFR, DFM, DFC etc.
I explained to the Reliability Engineers why the arithmetic mean calculation and the term MTBF were inane, but they ‘took a vote’ and decided to stick with it, ‘because using something else would just confuse matters’.
I’m still raging! It’s always good to see your posts, in the knowledge that there is at least one Reliability Engineer, besides me, who knows what it all means.
Keep the faith.
Hi Dave,
Yes, keep at it. And they took a vote... that is really sad. Time to connect them to what they should be doing.
Cheers,
Fred