The value of any task is in the result, although a few may argue for the value of the journey itself. For reliability engineering tasks, the ability to use the output of an experiment or analysis to make decisions is what permits an organization to derive value from the work. Despite a few organizations that insist on repeating a fixed set of meaningless standard tests and promptly ignoring any adverse results, most of us want to learn something from product testing.
This applies to risk analysis, vendor evaluations, life predictions, failure analysis, and field data analysis. The intent is to identify and use information that permits us to improve product reliability in a cost-effective manner. It's not to do some random assortment of reliability activities and believe that "we're doing reliability." Investing in specific, targeted reliability activities is what directly reduces risks and improves reliability performance.
This brings me to MTBF. Why are we using this measure as a means to describe product reliability? Maybe we do everything else very well. We identify risks early, we characterize failure mechanisms, we understand the range of use conditions, but then we wrap the total sum of our understanding into one useless number. Ugh!
Real Life Recovery of Value
I’ve seen this happen within a billion-dollar product organization. The development team used very sophisticated analysis, modeling, and tools to fully characterize the expected lifetime of their products. It was impressive. Then they would summarize the results in a graph (like a Weibull cumulative distribution function) and send it to the finance team to determine the appropriate funds to set aside for warranty accruals. These would range in the tens of millions of dollars. The difference between the warranty accrual estimate and the realized warranty expenses led to significant variation in the ability of the organization to estimate earnings and “hit their numbers.”
Despite the wonderful analysis, careful experimentation, and complex modeling, the finance team simplified the reliability estimates to a single number: MTBF. The product was typically under warranty for 3 months or a year, depending on the product and market. Neither point in time, however, was part of the finance team’s calculation, despite their having a very accurate estimate of the expected failure rate. They used the MTBF, which often fell in the 5- to 7-year range (at 6 years, roughly two-thirds of the units would be expected to fail under that model). With the constant-failure-rate assumption that using MTBF implies, the finance team regularly and grossly overestimated the number of expected failures. This of course led to a “correction factor” to bring the warranty accrual estimate closer to actuals. Had they instead used the excellent information provided, they would not have required an artificial adjustment and would have achieved, for each product, a remarkably accurate estimate of field returns.
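To see how large this kind of error can be, here is a minimal sketch comparing the fraction of units failing within a one-year warranty under the constant-failure-rate (exponential) model the finance team implicitly assumed, versus a Weibull model of the sort the development team actually produced. The Weibull shape and characteristic-life values below are hypothetical, chosen only for illustration; they are not the figures from the organization in the story.

```python
import math

def exp_cdf(t, mtbf):
    # Fraction failed by time t under a constant-failure-rate
    # (exponential) model, the assumption baked into MTBF.
    return 1.0 - math.exp(-t / mtbf)

def weibull_cdf(t, beta, eta):
    # Fraction failed by time t under a Weibull model with
    # shape beta and characteristic life eta.
    return 1.0 - math.exp(-((t / eta) ** beta))

warranty = 1.0  # years
mtbf = 6.0      # years, as in the story

# Finance team's implicit model: constant failure rate from MTBF.
exp_fraction = exp_cdf(warranty, mtbf)  # about 15% failed in warranty

# Hypothetical wear-out Weibull (beta = 2, eta = 6 years): far fewer
# early failures, even though ~63% still fail by the 6-year mark.
weibull_fraction = weibull_cdf(warranty, beta=2.0, eta=6.0)  # about 3%

print(f"Exponential (MTBF) estimate: {exp_fraction:.1%}")
print(f"Weibull estimate:            {weibull_fraction:.1%}")
```

With these illustrative numbers, the MTBF-based estimate predicts roughly five times as many in-warranty failures as the wear-out Weibull model, which is exactly the kind of gap a “correction factor” ends up papering over.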
The information from the finance team influenced major business decisions. After we corrected the team’s understanding and use of the reliability estimates, we found that previous errors based on the use of MTBF tallied into the tens of millions on a quarterly basis. The adjustments were an attempt to minimize the errors. I suspect there is a joke here about MBAs and spreadsheets.
The value of the millions of dollars of product development and reliability modeling was completely wasted by a misunderstanding of how to interpret the results. In an organization the focus should be on creating and providing customers with reliable products (from a reliability engineer’s point of view, anyway). The ability to connect the task to the value, and then see that value all the way through the organization, is often key to achieving reliable products and the value they bring.
What have you seen go wrong with the use of MTBF?