Learn about MTBF
One of the advantages of learning about MTBF is that the same understanding applies to its many variations. Rather than engage in endless debates over what is or is not counted, shift the conversation to how to use the information for decisions. What is in service to the decision?
The Perils page provides some information about MTBF, and the web provides a wealth of information about this common metric. A basic understanding of this inverse of a failure rate goes a long way toward determining whether the data is being well represented.
Alternative Metrics
MTBF is often used to represent product life. It is neither complete nor sufficient. Product life or reliability has four elements: function, probability, duration and environment. MTBF addresses only the probability and assumes (in most cases) that the duration does not matter, or worse, the duration is not even stated.
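To illustrate why the average alone is not sufficient, here is a minimal sketch comparing three Weibull life distributions that share the same mean life (the same MTBF) but have different shape parameters. The 50,000 hour MTBF, the one year duration, and the three shape values are hypothetical numbers chosen for illustration; they do not come from any real product.

```python
import math

def weibull_reliability(t, beta, eta):
    """Probability of surviving to time t for a Weibull(beta, eta) life distribution."""
    return math.exp(-((t / eta) ** beta))

def eta_for_mean(mean, beta):
    """Weibull scale (eta) that gives the requested mean life for a given shape (beta)."""
    return mean / math.gamma(1.0 + 1.0 / beta)

MTBF_HOURS = 50_000.0      # hypothetical MTBF, identical for all three cases
DURATION_HOURS = 8_760.0   # one year of continuous operation

# beta < 1: decreasing failure rate, beta = 1: constant, beta > 1: increasing
for beta in (0.5, 1.0, 3.0):
    eta = eta_for_mean(MTBF_HOURS, beta)
    r = weibull_reliability(DURATION_HOURS, beta, eta)
    print(f"beta={beta}: R({DURATION_HOURS:.0f} h) = {r:.3f}")
```

With these assumed inputs, the one year reliability comes out at roughly 55%, 84%, and 99.6% respectively, even though the MTBF is identical in all three cases. The duration and the failure pattern over time matter; the mean by itself does not reveal them.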
As an alternative, use reliability directly. State the probability of success over a specified time frame, along with the functions (which lead to an understanding of the product failure definition) and the environment. The function and environment are often abbreviated, e.g. a respirator provides life support breathing in North American intensive care facilities. The details of the functions and environment are often well stated in product development and marketing documents.
The probability and duration may include multiple statements, one for each important element of the product life. For example, since products that fail during first use damage the product brand significantly, we may want to have a very high probability of success during the first 3 months of product use. Say, 99.99% reliability over the first 3 months of use.
The warranty period may be another duration of interest, for example 98% reliability over the 1 year warranty period. And the design life (how long the product should last and provide value to the customer) might be stated as 90% reliability over 5 years.
The early-failure statement focuses on component, assembly, shipping and installation sources of product failure. The warranty period reliability is of interest as a business liability. The design life statement focuses on the longer term failure mechanisms.
Therefore, move away from a partial statement concerning product reliability. Make full use of clear statements of expectations (goals) and measures.
Teach Others
This one is easy, given your understanding of MTBF and its various pitfalls and misunderstandings. Do not permit those around you (peers, management, vendors, suppliers, and customers) to continue to misunderstand and misuse MTBF. Verify understanding and convey the actual meaning of the term.
Ask Questions
Besides asking about understanding and definitions, ask about the use of reliability statements.
- Where did this come from?
- How and when is this measured?
- What decisions does this metric support?
- How does the data support this measure?
Challenge Assumptions
There are two principal assumptions made related to MTBF, and both require your challenge.
First, what is the evidence that the underlying time to failure data supports the use of the exponential distribution? Or, the concept of a constant failure rate over the product life?
And even if the data supports the assumption, you still benefit by using reliability as the metric rather than MTBF, as doing so avoids the common misunderstandings surrounding MTBF.
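Returning to the first question, one simple way to look for that evidence is to fit a Weibull distribution to the time to failure data and examine the shape parameter (beta). The sketch below uses median-rank regression, the numerical equivalent of drawing a line on a Weibull probability plot. It assumes complete data (every unit ran to failure, no censoring), and the failure times listed are hypothetical values for illustration only. It is one possible check, not the only or the definitive method.

```python
import math

def weibull_shape_estimate(failure_times):
    """Estimate Weibull shape (beta) and scale (eta) by median-rank regression.

    Assumes complete data: every unit ran to failure, with no censoring.
    """
    times = sorted(failure_times)
    n = len(times)
    xs, ys = [], []
    for i, t in enumerate(times, start=1):
        f = (i - 0.3) / (n + 0.4)                 # Benard's median-rank approximation
        xs.append(math.log(t))                    # x axis of a Weibull plot
        ys.append(math.log(-math.log(1.0 - f)))   # y axis of a Weibull plot
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta = sxy / sxx                              # slope of the fitted line
    eta = math.exp(x_mean - y_mean / beta)        # intercept equals -beta * ln(eta)
    return beta, eta

# Hypothetical failure times in hours, for illustration only
hours = [410, 950, 1480, 2100, 2650, 3300, 3900, 4700]
beta, eta = weibull_shape_estimate(hours)
print(f"beta = {beta:.2f}, eta = {eta:.0f} hours")
```

A beta near 1 is consistent with a constant failure rate. A beta below 1 suggests a decreasing failure rate (early failures), and a beta above 1 suggests an increasing failure rate (wear out). With small samples the estimate is uncertain, so treat the result as a prompt for further investigation of the failure mechanisms rather than as proof.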
Second, challenge the assumption based on ‘this is what our industry and customers always use’. A full statement of reliability can always be converted to an MTBF statement if required. To make decisions, use the data along with appropriate and accurate summaries and measures. To help your vendors and customers fully understand the reliability requirements or claims, use a complete reliability statement.
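As a sketch of that conversion: if (and only if) a constant failure rate is assumed, reliability is R(t) = exp(-t / MTBF), so a reliability statement converts to MTBF = -t / ln(R). The example below applies this to the three example goals given earlier (99.99% over 3 months, 98% over 1 year, 90% over 5 years). The function name is hypothetical, and the constant failure rate assumption is exactly the one that should be checked against data first.

```python
import math

def equivalent_mtbf(reliability, duration):
    """MTBF that reproduces `reliability` at `duration`, assuming a constant failure rate."""
    return -duration / math.log(reliability)

# (label, reliability goal, duration in years) taken from the example goals
goals = [
    ("first 3 months", 0.9999, 0.25),
    ("1 year warranty", 0.98, 1.0),
    ("5 year design life", 0.90, 5.0),
]

for label, r, years in goals:
    print(f"{label}: equivalent MTBF = {equivalent_mtbf(r, years):,.1f} years")
```

Under this assumption the three goals correspond to MTBF values of roughly 2,500 years, 49.5 years, and 47.5 years. No single MTBF reproduces all three goals, and the conversion only works in one direction: the MTBF value alone does not carry the duration or the pattern of failures over time.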
In conclusion
Years ago, while conducting factory assessments, we often asked about and inspected a supplier's application of statistical process control (SPC). Often these programs were little more than a show: a few poorly managed and poorly applied SPC charts that resulted in no process improvements. One factory even used a locked display cabinet to display the charts, convenient for customer inspection. Two years later the same charts were still on display. Worthless to them and us.
The point is that MTBF and similar measures can have real value across the entire supply chain and product life cycle, but only when the measure accurately describes the data, is well or easily understood, and permits appropriate assessment of risks and tradeoffs. MTBF often fails these simple criteria.
What can you do? Do not use MTBF.
Hello Fred,
I had a question regarding the use of MTBF.
In the useful life phase of the product (from the bathtub curve), the failure rate remains constant and the product experiences an exponential distribution of hazard rate. Most of the calculation models assume this when they provide their results. As far as I understand from this group’s discussion, MTBF is useful when the failure data follows the exponential distribution (if wrong, correct me on this). Also, the reliability is calculated using this MTBF/FR for the mission time required.
Most customers are interested in knowing how their product behaves in the useful life phase, right? So then, can't we use MTBF as a reference indicator in this case?
Regards,
Anup Hegde
Hi Anup,
Before you assume your system or component really has a constant hazard rate, get some data and check. In my experience, it is very rare that you will actually find a constant hazard rate. The bathtub curve in textbooks is used to explain different patterns of changing failure rates; it does not represent any actual data. The bathtub curve is fiction.
Most customers want to know if your product will work for them over the duration they are interested in having it work – They would prefer no failures, yet understand they may occur. They would rather understand the actual expected pattern of failure rates over time, not some vague and very useless average.
If you ask a customer what they want when asking for MTBF values, it may reveal they believe that MTBF is a failure-free period or some other misunderstanding of what it really means.
Sure, assuming a constant hazard rate makes life simple and one could use the exponential distribution, yet it is rarely useful, accurate, or helpful to do so. I advise you to first understand the failure mechanisms, get the data and understand the failure rate pattern over time, then clearly represent the probability of failure over time.
Cheers,
Fred