Moving toward clarity reliably
I recently saw a quote suggesting we stop complaining and do something positive, which happens to fit my mother’s admonishment:
If you have nothing good to say, say nothing.
So, while I’ve been railing against MTBF and then suggesting a better metric, my message to use something else has gotten lost. Recently, in a LinkedIn group discussion, someone suggested yesbx.com as a sister site to nomtbf.com.
Maybe it is time to focus on a positive message about a replacement metric for MTBF. You already know my position on MTBF. So what do I recommend?
Use the simple engineering definition: reliability is the probability that a function is delivered successfully, in a given environment, over a given duration. Just use that. It is simple and complete. It is the definition of reliability, and it is what we mean when we talk about product or equipment reliability.
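Because the definition has four parts, one way to check that a goal is complete is to record all four explicitly. The sketch below is a hypothetical Python record; the class name `ReliabilityGoal`, its field names, and the sample values are illustrative assumptions, not an established schema. It rejects a probability outside 0 to 1 or a non-positive duration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReliabilityGoal:
    """Hypothetical record holding all four elements of a reliability goal."""
    function: str          # what the item must do
    environment: str       # the conditions of use
    probability: float     # required probability of success, from 0 to 1
    duration_hours: float  # duration over which success is measured

    def __post_init__(self) -> None:
        # Validate at construction so an incomplete goal cannot be created.
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")
        if self.duration_hours <= 0:
            raise ValueError("duration must be positive")

    def statement(self) -> str:
        # Render the goal as one sentence covering all four elements.
        return (f"{self.function} in {self.environment} with "
                f"{self.probability:.0%} probability of success over "
                f"{self.duration_hours:g} hours")


# Illustrative values only.
goal = ReliabilityGoal("deliver rated output", "office use", 0.95, 8760)
print(goal.statement())
```

A goal stated this way cannot silently drop the duration or the environment, which is exactly what a bare failure rate or MTBF does.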
Of course there are objections.
Some are surprising.
Recently I worked with a team creating a new-generation product. The design team was given a goal of no more than one failure per product per year. According to the team, this was better than other products and, given the nature of the product, difficult to achieve. One failure per year for each product means the goal is, on average, for every product to fail every year. Field service and repairs are expensive.
To restate this goal in my recommended metric, I took the failure rate of 1 per 8760 hours, assumed it was constant (the exponential model implied by an annualized failure rate or MTBF), and calculated the reliability over one year (8760 hours): R = exp(−8760/8760) = exp(−1) ≈ 0.368.
In other words, the goal given to the product development team was a one-year reliability of about 36.8%. That seems low, but this is a repairable system.
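The conversion above can be sketched in a few lines of Python. It assumes a constant failure rate (the exponential model), the same assumption that lets an annualized failure rate or MTBF stand in for reliability as a single number; the function name is hypothetical.

```python
import math


def reliability_constant_rate(failure_rate_per_hour: float, hours: float) -> float:
    """Probability of zero failures over `hours`, assuming a constant failure rate.

    Under the exponential model, R(t) = exp(-lambda * t).
    """
    return math.exp(-failure_rate_per_hour * hours)


HOURS_PER_YEAR = 8760
rate = 1 / HOURS_PER_YEAR  # goal: one failure per product per year

r_year = reliability_constant_rate(rate, HOURS_PER_YEAR)
print(f"one-year reliability: {r_year:.1%}, at least one failure: {1 - r_year:.1%}")
# prints: one-year reliability: 36.8%, at least one failure: 63.2%
```

Stated this way, "one failure per product per year" means roughly two out of three units are expected to fail at least once in their first year.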
When I drafted the reliability plan leading with the reliability goal of 36%, the feedback centered on how low the goal was. Yet the goal was the same, simply stated clearly: only about a third of production would be expected to last one year without a failure, and the rest would be expected to fail. I didn’t change anything other than how the metric was presented.
The team’s ability to repair the system relatively quickly kept system availability relatively high. The availability goal was not difficult to achieve as long as repair time was kept to a minimum.
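For a repairable system with a constant failure rate λ and mean repair time MTTR, a common steady-state approximation for inherent availability is A = 1 / (1 + λ·MTTR). The sketch below applies it to the one-failure-per-year rate; the function name and the repair times are hypothetical values chosen only to show how strongly availability depends on repair time.

```python
def inherent_availability(failure_rate_per_hour: float, mttr_hours: float) -> float:
    """Steady-state inherent availability, assuming a constant failure rate.

    Equivalent to uptime / (uptime + repair time) per failure cycle.
    """
    return 1.0 / (1.0 + failure_rate_per_hour * mttr_hours)


rate = 1 / 8760  # one failure per product per year
for mttr in (4, 24, 72):  # hypothetical mean repair times, in hours
    a = inherent_availability(rate, mttr)
    print(f"MTTR {mttr:>3} h -> availability {a:.3%}")
```

With those assumed repair times, availability comes out at roughly 99.95%, 99.7%, and 99.2%, even though only about 36.8% of units are expected to get through the year without a failure.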
So, why the reaction? In comparison, the stated goal looked lower than those of other product lines and of the previous generation. I suspect the team would neither appreciate nor accept the low reliability of the other products if it were stated this way, yet they accept it when product reliability performance is stated as an annualized failure rate.
This incident reinforced for me that annualized failure rate, MTBF, and similar metrics are not well understood. Simply restating the same objective as a reliability (a probability of success over a duration) can raise eyebrows, cause concern, and prompt questions. Well, questions should be asked. If restating the objective causes concern, maybe the original goal was wrong. Or was it simply stated poorly?