Value and MTBF
The value of any task is in the result, although a few may argue about the value in the journey itself. For reliability engineering tasks, the ability to use the output of an experiment or analysis to make decisions permits an organization to derive value from the tasks. Despite a few organizations that insist on repeating a fixed set of meaningless standard tests and promptly ignoring any adverse results, the majority of us want to learn something with product testing.
Risk Analysis
This applies to risk analysis, vendor evaluations, life predictions, failure analysis, and field data analysis. The intent is to identify and use information that permits us to improve product reliability in a cost-effective manner. It's not to do some random assortment of reliability activities and believe that “we’re doing reliability”. Investing in specific, targeted reliability activities is what directly leads to reduced risk and improved reliability performance.
MTBF
This brings me to MTBF. Why are we using this measure as a means to describe product reliability? Maybe we do everything else very well. We identify risks early, we characterize failure mechanisms, we understand the range of use conditions, but then we wrap the total sum of our understanding into one useless number. Ugh!
Real Life Recovery of Value
I’ve seen this happen within a billion dollar product organization. The development team used very sophisticated analysis, modeling and tools to fully characterize the expected lifetime of their products. It was impressive. Then they would summarize the results in a graph (like a Weibull cumulative distribution function) and send it to the finance team to determine the appropriate funds to set aside for warranty accruals. These accruals would range in the tens of millions of dollars. The difference between the warranty accrual estimate and the realized warranty expenses led to significant variation in the ability of the organization to estimate earnings and “hit their numbers.”
Despite the wonderful analysis and careful experimentation and complex modeling, the finance team simplified the reliability estimates to a single number: MTBF. The product was typically under warranty for 3 months or a year, depending on the product and market. Neither point in time, however, was part of the finance team’s calculation, despite having a very accurate estimate for the expected failure rate. They used the MTBF, which often ranged in the 5 to 7 year time frame (in 6 years, roughly 2/3rds of the units were expected to fail). Using the constant failure rate assumption that goes along with MTBF, the finance team regularly and grossly overestimated the number of expected failures. This of course led to a “correction factor” to bring the warranty accrual estimate closer to actuals. If they had instead used the excellent information provided, they would not have required an artificial adjustment and would have achieved a remarkably accurate estimate of field returns for each product.
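To make the arithmetic concrete, here is a minimal sketch of how the two approaches can diverge over a warranty period. Only the 6 year MTBF and the 3 month and 1 year warranty periods come from the story above. The Weibull shape parameter (`BETA = 3.0`, a wear-out pattern) is purely a hypothetical stand-in for the team's actual model, which I am not reproducing here, and the scale parameter is chosen only so that both models share the same mean life.

```python
import math

def exponential_cdf(t, mtbf):
    """Fraction failed by time t under a constant failure rate of 1/MTBF."""
    return 1.0 - math.exp(-t / mtbf)

def weibull_cdf(t, eta, beta):
    """Fraction failed by time t for a Weibull with scale eta and shape beta."""
    return 1.0 - math.exp(-((t / eta) ** beta))

MTBF_YEARS = 6.0   # from the story: MTBF in the 5 to 7 year range
BETA = 3.0         # ASSUMPTION: hypothetical wear-out shape, not the team's real value

# ASSUMPTION: pick the Weibull scale so its mean life equals the MTBF,
# using mean = eta * Gamma(1 + 1/beta).
ETA = MTBF_YEARS / math.gamma(1.0 + 1.0 / BETA)

# Warranty periods from the story, expressed in years.
WARRANTY_PERIODS = (("3 months", 0.25), ("1 year", 1.0))

for label, t in WARRANTY_PERIODS:
    exp_frac = exponential_cdf(t, MTBF_YEARS)
    wei_frac = weibull_cdf(t, ETA, BETA)
    print(f"{label}: constant-rate estimate {exp_frac:.2%}, "
          f"Weibull estimate {wei_frac:.2%}")
```

Under these assumed parameters the constant failure rate calculation predicts far more warranty returns than the Weibull curve does, even though both describe the same mean life. With a different shape parameter (for example, one below 1 for early-life failures) the error would run in the other direction, which is exactly why the single number is not enough.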
The information from the finance team influenced major business decisions. After we corrected the team’s understanding and use of the reliability estimates, we found that previous errors based on the use of MTBF tallied into the tens of millions on a quarterly basis. The adjustments were an attempt to minimize the errors. I suspect there is a joke here about MBAs and spreadsheets.
The value of the millions of dollars of product development and reliability modeling was completely wasted by a misunderstanding concerning the interpretation of the results. In an organization the focus should be on creating and providing customers with reliable products (from a reliability engineer’s point of view, anyway). The ability to connect the task to the value, and then see the value all the way through the organization, is often key to achieving reliable products and the value.
What have you seen go wrong with the use of MTBF?
Having just joined the No MTBF group, I wanted to outline why I do not use MTBF as a reliability indicator. Most of the mechatronics I have been associated with were low-volume products. The accumulation of product reliability data during use cannot influence the design life cycle. This data may be able to affect the manufacturing phase, but in general it will be too late for most of the product and in reality is unlikely to influence assembly methods because of cost reduction and expensive processes.
Shock, vibration and thermal effects are the major causes of unreliability. There are many more causes including electric fields and their sources, biological sources, chemical sources and electromechanical sources within the crystal structures of materials.
By applying environmental stimulation during the design and development phases, it is possible to highlight the key sources of unreliability and use the results to remove these defects. The use of a database comprising historical events covering a wide range of design and manufacturing sources is, quite frankly, laughable.
The techniques I promoted in organisations such as Nokia, Cobham (Flight Refuelling), MBDA and Claverham are based on the use of environmental stimulation and simulation in conjunction with understanding the physics of failure and establishing root cause effects. Looking for performance trends under the influence of environmental stimuli during the early life-cycle phases will have a far greater effect on in-use product reliability than any study carried out by the use of a database.
Whilst I managed to produce a measure of reliability based on expected use, the marketeers were blinkered by trying to convert this measure to an MTBF!! Given time I reckon I could have persuaded customers as to the efficacy of Physics of Failure and the benefit of ‘stress modelling’.
Hi Brian,
Welcome to the group and keep up the good work.
cheers,
Fred