The Magic Math of Meeting MTBF Requirements
Recently heard from a reader of NoMTBF. She wondered about a supplier’s argument that they meet the reliability or MTBF requirements. She was right to wonder.
Estimating reliability performance a new design is difficult.
There are good and better practice to justify claims about future reliability performance. Likewise, there are just plain poor approaches, too. Plus there are approaches that should never be used.
The Vendor Calculation to Support Claim They Meet Reliability Objective
Let’s say we contract with a vendor to create a navigation system for our vehicle. The specification includes functional requirements. Also it includes form factor and a long list of other requirements. It also clearly states the reliability specification. Let’s say the unit should last with 95% probability over 10 years of use within our vehicle. We provide environmental and function requirements in detail.
The vendor first converts the 95% probability of success over 10 years into MTBF. Claiming they are ‘more familiar’ with MTBF. The ignore the requirements for probability of first month of operation success. Likewise they ignore the 5 year targeted reliability, or as they would convert, MTBF requirements.
[Note: if you were tempted to calculate the equivalent MTBF, please don’t. It’s not useful, nor relevant, and a poor practice. Suffice it to say it would be a large and meaningless number]
RED FLAG By converting the requirement into MTBF it suggests they may be making simplifying assumptions. This may permit easier use of estimation, modeling, and testing approaches.
The Vendor’s Approach to ‘Prove’ The Meet the MTBF Requirement
The vendor reported they met the reliability requirement using the following logic:
Of the 1,000 (more actually) components we selected 6 at random for accelerated life testing. We estimated the lower 60% confidence of the probability of surviving 10 years given the ALT results. Then converted the ALT results to MTBF for the part.
We then added the Mil Hdbk 217 failure rate estimate to the ALT result for each of the 6 parts.
RED FLAG This one has me wondering the rationale for adding failure rates of an ALT and a parts count prediction. It would make the failure rate higher. Maybe it was a means to add a bit of margin to cover the uncertainty? I’m not sure, do you have any idea why someone would do this? Are they assuming the ALT did not actually measure anything relevant or any specific failure mechanisms, or they used a benign stress? ALT details were not provided.
The Approach Gets Weird Here
Then we use a 217 parts count prediction along with the modified 6 component failure rates to estimate the system failure rate, and with a simple inversion estimated the MTBF. They then claimed the system design will meet the field reliability performance requirements.
RED FLAG Mil HDBK 217 F in section 3.3 states
Hence, a reliability prediction should never be assumed to represent the expected field reliability …
If you are going to use a standard, any standard, one should read it. Read to understand when and why it is useful or not useful.
What Should the Vendor Have Done Instead?
There are a lot of ways to create a new design and meet reliability requirements.
- The build, test, fix approach or reliability growth approach works well in many circumstances.
- Using similar actually fielded systems failure data. It may provide a reasonable bound for an estimate of a new system. It may also limit the focus on the accelerated testing to only the novel or new or high risk areas of the new design — given much of the design is (or may be) similar to past products.
- Using a simple reliability block diagram or fault tree analysis model to assembly the estimates, test results, engineering stress/strength analysis (all better estimation tools then parts count, in my opinion) and calculate a system reliability estimate.
- Using a risk of failure approach with FMEA and HALT to identify the likely failure mechanisms then characterize those mechanisms to determine their time to failure distributions. If there is one or a few dominant failure mechanisms, that work would provide a reasonable estimate of the system reliability.
In all cases focus on failure mechanisms and how the time to failure distribution changes given changes in stress / environment / use conditions. Monte Carlo may provide a suitable means to analysis a great mixture of data to determine an estimate. Use reliability, probability of success over a duration.
In short, do the work to understand the design, it’s weaknesses, the time to failure behavior under different use/condition scenarios, and make justifiable assumptions only when necessary.
We engage vendors to supply custom subsystems given their expertise and ability to deliver the units we need for our vehicle. We expect them to justify they meet reliability requirements in a rationale and defendable manner. While we do not want to dictate the approach tot he design or the estimate of reliability performance, we certainly have to judge the acceptability of the claims they meet the requirements.
What do you report when a customer asks if your product will meet the reliability requirements? Add to the list of possible approaches in the comments section below.