MTBF is not reliability. Attaining a specific MTBF does not mean your product is reliable. In fact, if your products are less reliable than expected, the use of MTBF may be the culprit.
Therefore, working to achieve an MTBF value may actually be preventing you from creating a product that meets your customer’s reliability performance expectations.
Actively working to achieve MTBF using the common tools around MTBF may be taking you and your team down the wrong rabbit hole. You may be working to reduce the reliability of your products rather than improving them.
Let’s take a look at a couple of ways the pursuit of MTBF is harmful to your product’s reliability potential and contrary to your customer’s expectations.
When the Requested MTBF Is Not What the Customer Wants
Some customers may request 50,000 hours MTBF when they really want a very low probability of failure over 50,000 hours of product use. They meant a duration of 50k hours, not a 1-in-50k chance of failing in each hour. They didn’t know how to ask for what they wanted. Any time someone asks for MTBF, ask what they really want.
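To see how far apart those two readings are, here is a minimal Python sketch, assuming the constant hazard rate that MTBF implies. The 99% target is an assumed example value, not a real customer requirement, and the function names are only illustrative.

```python
import math

def reliability_exponential(t_hours: float, mtbf_hours: float) -> float:
    """Probability of surviving t_hours, assuming a constant hazard rate of 1/MTBF."""
    return math.exp(-t_hours / mtbf_hours)

def mtbf_for_target(t_hours: float, target_reliability: float) -> float:
    """MTBF needed so that R(t_hours) equals the target (constant hazard rate assumed)."""
    return -t_hours / math.log(target_reliability)

duration = 50_000.0   # hours the customer wants to use the product
mtbf = 50_000.0       # the MTBF value that was requested
target = 0.99         # assumed example: 99% of units survive the duration

r_at_duration = reliability_exponential(duration, mtbf)
needed_mtbf = mtbf_for_target(duration, target)

print(f"R({duration:,.0f} h) with a {mtbf:,.0f} h MTBF: {r_at_duration:.3f}")
print(f"MTBF needed for {target:.0%} reliability over {duration:,.0f} h: {needed_mtbf:,.0f} h")
```

Under these assumptions, a 50,000 hour MTBF means only about 37% of units survive 50,000 hours, and a 99% chance of surviving that long would need an MTBF of nearly 5 million hours.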
What if a customer really wants high availability over a short duration, or they want to reduce repair times, or they cannot tolerate any failures over a 12-hour mission duration? If they only ask for some MTBF value, is that sufficient for you to create a product that will meet their needs? Probably not.
Reliability Testing Assuming MTBF
When we assume a constant hazard rate, which is common when using MTBF, we can use the memoryless feature of the exponential distribution. Therefore we can test 50 units for 1,000 hours each, for 50,000 unit-hours in total, and count the number of failures that occur. If there is only one or none, we have ‘proved’ the 50,000 hour MTBF. All good.
The test may catch early life failures, complicating the test analysis and results, yet it certainly would not reflect the product’s actual reliability performance over a duration of 50k hours. We actually learn very little about how the product performs after 1,000 hours.
The sad part is products with known wear out failure mechanisms are tested using these methods, thus avoiding the messy business of wear out failures clouding reliability testing results.
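Here is a rough Python sketch of how such a test can miss wear out entirely. It assumes a single Weibull wear out mechanism; the shape and scale values are invented for illustration and do not come from any real product.

```python
import math

def weibull_reliability(t_hours: float, beta: float, eta_hours: float) -> float:
    """Probability of surviving t_hours under a Weibull life distribution."""
    return math.exp(-((t_hours / eta_hours) ** beta))

units = 50
test_hours = 1_000.0
total_unit_hours = units * test_hours   # 50,000 unit-hours of testing

# Hypothetical wear out mechanism: beta > 1 means the hazard rate rises with age.
beta = 3.0
eta = 40_000.0

# Chance that any one unit fails during the 1,000 hour test.
p_fail_in_test = 1.0 - weibull_reliability(test_hours, beta, eta)
expected_test_failures = units * p_fail_in_test

# Chance the test 'passes' with zero or one failure (binomial).
p_zero = (1.0 - p_fail_in_test) ** units
p_one = units * p_fail_in_test * (1.0 - p_fail_in_test) ** (units - 1)
p_pass = p_zero + p_one

# What the customer actually experiences over the full duration.
r_50k = weibull_reliability(50_000.0, beta, eta)

print(f"Expected failures during test: {expected_test_failures:.5f}")
print(f"Probability the test passes:   {p_pass:.6f}")
print(f"Reliability at 50,000 hours:   {r_50k:.3f}")
```

With these made-up parameters the test passes almost every time, yet only about 14% of units would survive 50,000 hours. The test result and the field result have very little to do with each other.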
Assuming Away Early Life and Wear Out Issues
If only customers would just use the product during the ‘useful life’ portion of the bathtub curve. That portion is drawn as a low failure rate over an extended duration, with a few early life failures for a short duration beforehand, plus eventually something wearing out long after the product has been retired.
Customers do not control the ‘useful life’. The product design does, with a dash of manufacturing, too. Design and build a reliable product, and it may have a low failure rate over the duration a customer may want to use the item.
If we assume away the early life and wear out portions in order to focus on the useful life, we have a few problems:
First, we’re delusional in thinking there is a flat part of the curve that we can assume will naturally occur.
Second, our assumptions do not change what actually causes the product to fail. We still have design and manufacturing issues that cause failures. Some occur early, some later, but rarely at a low and random rate.
Third, customers do not care about your assumptions; it is the actual performance that matters.
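A small Python sketch can illustrate the first point. It assumes the product has just two competing failure mechanisms, one early life and one wear out, each modeled as a Weibull distribution. Both sets of parameters are hypothetical.

```python
def weibull_hazard(t_hours: float, beta: float, eta_hours: float) -> float:
    """Instantaneous failure rate of a Weibull distribution at age t_hours."""
    return (beta / eta_hours) * (t_hours / eta_hours) ** (beta - 1.0)

def combined_hazard(t_hours: float) -> float:
    """Hazard rate with two independent competing mechanisms (hazards add)."""
    early_life = weibull_hazard(t_hours, beta=0.5, eta_hours=200_000.0)  # hypothetical
    wear_out = weibull_hazard(t_hours, beta=3.0, eta_hours=60_000.0)     # hypothetical
    return early_life + wear_out

# The hazard rate falls, bottoms out, then rises. It is never actually flat.
for t in (100.0, 1_000.0, 10_000.0, 30_000.0, 50_000.0):
    print(f"{t:>8,.0f} h: {combined_hazard(t):.2e} failures per hour")
```

In this sketch the hazard rate is always either falling or rising. Treating the middle as constant is a modeling choice we make, not something the product does for us.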
Use Reliability Instead of MTBF
One way out of this nest of problems is to avoid using MTBF. Instead use reliability, or the probability of successful operation, over a specified duration. Include the details on the environment, use profile, and what we consider a failure, and you are making progress.
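One way to keep those elements together is to write the goal down as a single record. The following Python sketch is only one possible shape for that; the class name, fields, and every example value are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReliabilityGoal:
    """A reliability statement with the elements named above (hypothetical structure)."""
    probability: float        # probability of successful operation
    duration_hours: float     # duration over which that probability applies
    environment: str          # where the product operates
    use_profile: str          # how the product is used
    failure_definition: str   # what we consider a failure

    def is_met_by(self, estimated_reliability: float) -> bool:
        """True if an estimate for the same duration and conditions meets the goal."""
        return estimated_reliability >= self.probability

    def describe(self) -> str:
        return (f"{self.probability:.0%} probability of operating without "
                f"'{self.failure_definition}' for {self.duration_hours:,.0f} hours, "
                f"in a {self.environment} environment, under {self.use_profile}.")

# Example values are invented for illustration only.
goal = ReliabilityGoal(
    probability=0.98,
    duration_hours=50_000.0,
    environment="indoor office",
    use_profile="continuous operation",
    failure_definition="loss of primary function",
)

print(goal.describe())
print("Estimate of 0.95 meets goal:", goal.is_met_by(0.95))
```

Unlike a bare MTBF number, a statement in this form cannot be misread as a duration, and it makes clear which conditions any test or field estimate has to match.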
Using MTBF makes everything easier. From apportionment to test planning to design, we can simply assume away many problems that will cause the product to fail. The problem is that products that are not reliable fail.
Does your team use MTBF (or MTTF) and do you regularly have ‘surprise’ field failures? If you had used reliability directly, could you have avoided the issue? I suspect so. What is your story?