# Category Archives: Predictions

From engineering guesses, to parts count tallies, to standard testing, including accelerated life testing, and field data analysis — we often attempt to predict reliability.

# Reliability Growth and MTBF

Really? Is MTBF the only way to work with reliability growth?

I received this question via LinkedIn (feel free to connect with me there) and hadn’t given it much thought before. I am familiar with a few growth models and have regularly seen MTBF used with them. Thus I had discounted the modeling as an approach of little interest to me or my clients.

MTBF measures the inverse of the average failure rate, when in many cases we really want to know about the first or tenth percentile of time to failure. Measuring and tracking the average time to failure provides little information about the onset of the first few failures.
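To illustrate how little the average says about early failures, here is a minimal sketch that assumes Weibull-distributed times to failure. The 10,000 hour mean and the three shape values are hypothetical, chosen only to show that the same mean can sit alongside very different first and tenth percentiles.

```python
import math

def weibull_mean(eta, beta):
    """Mean time to failure of a Weibull distribution (scale eta, shape beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)

def weibull_percentile(eta, beta, p):
    """Time by which a fraction p of the population is expected to have failed."""
    return eta * (-math.log(1.0 - p)) ** (1.0 / beta)

def eta_for_mean(mean, beta):
    """Scale parameter that gives the requested mean for a given shape."""
    return mean / math.gamma(1.0 + 1.0 / beta)

TARGET_MEAN = 10_000.0  # hours; hypothetical value for illustration

for beta in (0.7, 1.0, 3.0):  # hypothetical shapes: early life, random, wear out
    eta = eta_for_mean(TARGET_MEAN, beta)
    b1 = weibull_percentile(eta, beta, 0.01)
    b10 = weibull_percentile(eta, beta, 0.10)
    print(f"beta={beta}: mean={weibull_mean(eta, beta):.0f} h, "
          f"1st percentile={b1:.0f} h, 10th percentile={b10:.0f} h")
```

All three cases report the same mean, yet the time at which the first 1% of units fail differs by orders of magnitude.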

## Reliability Growth Models

A quick check of common reliability growth models turned up a few in the NIST Engineering Statistics Handbook: http://www.itl.nist.gov/div898/handbook/apr/section1/apr19.htm

The Homogeneous Poisson Process (HPP) applies when the failure rate is constant over the time period of interest. This relies on the exponential distribution and the assumption of a stable and random arrival of failures, which is almost never true (in my experience). It’s a convenient assumption as it makes the math a lot simpler, yet it provides only a crude model and poor results.

The Non-Homogeneous Poisson Process (NHPP) Power Law and Exponential Law models provide information based on the cumulative number of failures over time. These models rely on the notion that any system has a finite number of design errors that, once resolved, leave a system with HPP behavior.

The Duane plot provides a graphical means to show cumulative failures over time. When the arrival of failures slows, the curve decreases in slope, effectively bending over. This provides a means to estimate the final failure rate (an average, unfortunately).
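The bending over can also be checked numerically. The sketch below assumes the power law form N(t) = a·t^b for cumulative failures and fits a straight line to log N versus log t by ordinary least squares, which mirrors drawing a line on a log-log plot (it is not a maximum likelihood fit). The failure times are hypothetical.

```python
import math

def fit_power_law(failure_times):
    """Fit log N(t) = log a + b * log t by least squares.

    failure_times: cumulative test time at which each failure occurred.
    Returns (a, b). A slope b below 1 means failures are arriving more slowly.
    """
    times = sorted(failure_times)
    xs = [math.log(t) for t in times]
    ys = [math.log(n) for n in range(1, len(times) + 1)]  # cumulative count
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(y_bar - b * x_bar)
    return a, b

# Hypothetical cumulative test hours at each observed failure.
failure_hours = [35, 110, 260, 480, 900, 1500, 2600, 4100]
a, b = fit_power_law(failure_hours)
print(f"a = {a:.3f}, b = {b:.2f}")
print("failures are slowing" if b < 1 else "no sign of growth yet")
```

Tracking the fitted slope as testing continues gives the same signal as watching the plotted curve flatten.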

Given my dislike of all things MTBF, I’ve not used these models to estimate MTBF. Instead, I stay with the Duane plot and graphically track when the team is finding and fixing enough faults in the design.

I also tend to use reliability block diagrams (RBD) with each block modeled with the appropriate reliability distribution. For a series model, all we need to do is multiply the reliability values of the blocks at time t (say the warranty period, or mission time, etc.) to estimate the system reliability at time t.
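As a minimal sketch of that series calculation, the example below models each block with its own distribution and multiplies the block reliabilities at time t. The block names, distributions, and parameter values are hypothetical, and the blocks are assumed to fail independently.

```python
import math

def weibull_reliability(t, eta, beta):
    """Probability of surviving to time t for a Weibull(eta, beta) block."""
    return math.exp(-((t / eta) ** beta))

def lognormal_reliability(t, mu, sigma):
    """Probability of surviving to time t for a lognormal(mu, sigma) block."""
    z = (math.log(t) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - standard normal CDF

def series_reliability(block_reliabilities):
    """A series system works only if every block works."""
    return math.prod(block_reliabilities)

t = 8760.0  # hours; a hypothetical one-year warranty period

blocks = {  # hypothetical blocks and parameters
    "power supply": weibull_reliability(t, eta=60_000.0, beta=1.8),
    "fan": weibull_reliability(t, eta=40_000.0, beta=2.5),
    "solder joints": lognormal_reliability(t, mu=12.0, sigma=0.8),
}

for name, r in blocks.items():
    print(f"{name}: R({t:.0f} h) = {r:.4f}")
print(f"system: R({t:.0f} h) = {series_reliability(blocks.values()):.4f}")
```

The per-block printout makes the weakest block visible, which is often more useful to the team than the system number alone.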

For complex systems with some amount of redundancy, the RBD does get a bit more complicated. For very complex systems with degraded modes of operation or significant repair times, use Petri nets or Markov models to model them properly.
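For the simpler redundancy cases, the block math is still short. This sketch covers active parallel blocks and k-out-of-n arrangements, assuming independent failures and, for k-out-of-n, identical units; it does not cover standby redundancy, repair, or degraded modes. The reliability values are hypothetical.

```python
import math

def parallel_reliability(block_reliabilities):
    """A parallel group fails only if every block in it fails."""
    return 1.0 - math.prod(1.0 - r for r in block_reliabilities)

def k_out_of_n_reliability(k, n, r):
    """At least k of n identical, independent units (each with reliability r) must work."""
    return sum(math.comb(n, i) * r**i * (1.0 - r) ** (n - i)
               for i in range(k, n + 1))

# Hypothetical example: two redundant pumps, then three sensors with 2-of-3 voting.
pumps = parallel_reliability([0.90, 0.90])
sensors = k_out_of_n_reliability(2, 3, 0.90)
print(f"pumps = {pumps:.4f}, sensors = {sensors:.4f}, "
      f"system = {pumps * sensors:.4f}")
```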

In the vast majority of cases a simple RBD is sufficient to capture and understand the reliability of a system. This allows the team to focus on improving weak areas and reduce uncertainty through improving reliability estimates. An RBD does not require nor assume an exponential distribution and the math is easy enough to manage, often even in your favorite spreadsheet.

## Summary

Reliability growth starts with a model of the estimated number of failures over a time period. Testing then provides a value for that estimate. This does not require the use of MTBF, so instead of assuming a constant failure rate, focus on the failure mechanisms and use a simple RBD to build a system model. The reliability growth is the result of identifying areas for improvement and making the improvements. An RBD, in my experience, provides a great way to communicate with the team where to focus improvements.

# How to Estimate Reliability Early in a Program

In a few discussions about the perils of MTBF, individuals have asked about estimating MTBF (reliability) early in a program. They quickly referred to various parts count prediction methods as the only viable means to estimate MTBF.

One motivation to create reliability estimates is to provide feedback to the team. The reliability goal exists and the early design work is progressing, so estimating the performance of the product’s functions is natural. The mechanical engineers may use finite element analysis to estimate responses of the structure to various loads. Electrical engineers may use SPICE models for circuit analysis.

Customers expect a reliable product. If they are investing in the development of the product (military vehicle, custom production equipment, or solar power plant, for example) they may also want an early estimate of reliability performance.

Engineers and scientists estimate reliability during the concept phase as they determine the architecture, materials, and major components. The emphasis is often on creating a concept that will deliver the features in the expected environment. The primary method for reliability estimation is engineering judgement.

With the first set of designs, there is more information available on specific material, structures, and components, thus it should be possible to create an improved reliability estimate.

## Is testing the true way to estimate MTBF?

Early in a program means there are no prototypes available for testing, just a bill of materials and drawings. So, what is a reliability engineer to do?

One could argue that without prototypes or production units available for testing (exercising or aging the system to simulate use conditions) we do not really know how the system will respond to use conditions. While it is true it is difficult to know what we do not know, we often do know quite a bit about the system and the major elements and how they individually will respond to use conditions.

Even with testing, we often use engineering judgement to focus the stresses employed to age a system. We apply prior knowledge of failure mechanism models to design accelerated tests. And, we use FMEA tools to define the areas most likely to fail, thus guiding our test development.

## Creating a reliability estimate without a prototype

Engineering judgement is the starting point. Include the information from FMEA and other risk assessment methods to identify the elements of a product that are most likely to fail and thus limit the system reliability. Then there are a few options available to estimate reliability, even without a prototype.

First, it is rare to create a new product using all new materials, assembly methods, and components.

Often a new product is approximately 80% the same as previous or similar products. The new design may be a new form factor, thus mostly a structural change. It may include new electronic elements, often just one or two components, where the remaining components in the circuit are regularly used. Or, it may involve a new material while reusing known structures and circuits.

Use the field history of similar products or subsystems and engineering judgement for the new elements to create an estimate. A simple reliability block diagram may be helpful to organize the information from various sources.

For the new elements of a design, base the engineering judgement on analysis of the potential failure mechanisms, employ any existing reliability models, or use simulations to compare known similar solutions to the new solution.

Second, for the elements without existing similar solutions and without existing failure mechanism models, we would have to rely on engineering judgement or on testing at the component or test coupon level. Rather than wait for the system prototypes, early in a program it is often possible to obtain samples of the materials, structures, or components for evaluation.

The idea is to use our engineering judgement and risk analysis tools to define the most likely failure mechanisms for the elements with unknown reliability performance. Let’s say we are exploring a new surface finish technique. We estimate that exposure to solar radiation may degrade the finish. Therefore, obtain some small swatches of material, apply the surface finish, and expose them to UV radiation. While not the full product using fully developed production processes, it is a way to evaluate the concept.

Another example is a new solder joint attachment technique. Again, use your judgement and risk analysis tools to estimate the primary failure mechanisms, say thermal cycling and power cycling, then obtain test packages with the same physical structures (the IC or active elements do not have to be functional) and design appropriate tests for the suspected failure mechanisms.

## Estimates combine the available knowledge

With a little creativity we can provide a range of estimates for elements of a design that have little or no field history. We do not need to rely on a tabulated list of failure rates for dissimilar products created by a wide range of teams for diverse solutions. We can draw on the actual field performance of our team’s prior designs for the bulk of the estimate. Then fill in the remaining elements of the estimate with engineering judgement, comparative analysis, published reliability models, or coupon or test structure failure mechanism evaluations.
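One way to carry a range of estimates through to the system level is to give each element a low and a high reliability value at the time of interest and combine them in a series model. The sketch below assumes a series system with independent elements; the element names, sources, and values are hypothetical.

```python
import math

def series_bounds(elements):
    """Combine per-element (low, high) reliability estimates for a series system.

    Series reliability increases with each element's reliability, so the
    product of the lows and the product of the highs bound the system value.
    """
    low = math.prod(lo for lo, hi in elements.values())
    high = math.prod(hi for lo, hi in elements.values())
    return low, high

# Hypothetical estimates of reliability at the end of the warranty period.
elements = {
    "reused circuit board (field history)": (0.985, 0.995),
    "reused enclosure (field history)": (0.990, 0.998),
    "new surface finish (coupon UV test)": (0.900, 0.980),
    "new connector (engineering judgement)": (0.930, 0.990),
}

low, high = series_bounds(elements)
print(f"system reliability estimate: {low:.3f} to {high:.3f}")

# The widest range shows where more testing would reduce uncertainty the most.
widest = max(elements, key=lambda name: elements[name][1] - elements[name][0])
print(f"largest uncertainty: {widest}")
```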

In general, we will understand the bulk of the reliability performance and have rational estimates for the rest. It’s an estimate and the exercise will help us and the team focus on which areas may require extensive testing.

# Solving Type III problems

There are occasions when we perfectly solve the wrong problem. This is a Type III error.

Following the statistical idea of Type I and Type II errors, where a sample provides incorrect information about a population, a Type III error is asking the wrong question to start with.

Solving the wrong problem, even perfectly, is still an error.

# Who are you fooling with MTBF Predictions?

All models are wrong, some are useful. ~ George E. P. Box

If you know me, you know I do not like MTBF. Trying to predict MTBF, which I consider a worthless metric, is folly.

So, why the article on predicting MTBF?

Predicting MTBF or creating an estimate is often requested by your customer or organization. You are being specifically asked for MTBF for a new product.

You have to come up with something.

# Calculating reliability from data

In the last note, we calculated MTBF using some test data. Now let’s start with the same situation and calculate reliability instead. As before: There are occasions when we have either field or test data that includes the duration of operation and whether or not the unit failed.
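With data in that form (a duration and a failed or not-failed flag per unit), one nonparametric option is the Kaplan-Meier estimator. The sketch below is an assumption about how such a calculation could be done, not necessarily the method used in the full post, and the data are hypothetical.

```python
def kaplan_meier(observations):
    """Kaplan-Meier reliability estimate.

    observations: list of (duration, failed) pairs, where failed is True for
    a failure and False for a unit still working when observation stopped.
    Returns a list of (time, reliability) steps, one per distinct failure time.
    """
    events = sorted(observations, key=lambda o: o[0])
    at_risk = len(events)
    reliability = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        failures = sum(1 for d, f in events if d == t and f)
        leaving = sum(1 for d, f in events if d == t)  # failed or censored at t
        if failures:
            reliability *= (at_risk - failures) / at_risk
            curve.append((t, reliability))
        at_risk -= leaving
        i += leaving
    return curve

# Hypothetical test data: hours of operation and whether the unit failed.
data = [(100, True), (200, False), (300, True), (400, False)]
for t, r in kaplan_meier(data):
    print(f"R({t} h) = {r:.3f}")
```

Units that were still running are not counted as failures, yet they still contribute to the number at risk up to the time they left the test.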

# Finding and eliminating early life failures

MTBF for electronics life entitlement measurements is a meaningless term. It says nothing about the distribution of failures or the cause of failures and is only valid for a constant failure rate, which almost never occurs in the real world. It is a term that should be eliminated along with reliability predictions of electronics systems with no moving parts.

# Electronics Failure Prediction Methodology does not work

Posted 12-11-2012 by Kirk Gray,

Accelerated Reliability Solutions, L.L.C.

“When the number of factors coming into play in a phenomenological complex is too large, scientific method in most cases fails. One need only think of the weather, in which case the prediction even for a few days ahead is impossible.” ― Albert Einstein

“Prediction is very difficult, especially about the future.” – Niels Bohr*

We have always had a quest to reduce future uncertainties and know what is going to happen to us, how long we will live, and what may impact our lives. Horoscopes, Tarot …

# Where does 0.7eV come from

This post is a conversation first held on the LinkedIn group No MTBF. I’m capturing a portion of the contributions here to continue the discussion or to widen the audience. Reminds me of always assuming 95% confidence is the right value when designing a test, or assuming constant failure rate. So, let the conversation continue, starting with the original post.

# What’s All the Fuss about Bayesian Reliability Analysis?

The term Bayesian Reliability Analysis is popping up more and more frequently in the reliability and risk world.  Most veteran reliability engineers just roll their eyes at the term.  Most new reliability engineers dread the thought of having to learn something else new, just when they are getting settled in the job.  Regardless, it is a really good idea for all reliability engineers to have a basic understanding of Bayesian Reliability Analysis.

This series explains Bayesian Reliability Analysis and justifies …

# Role of parts count prediction

Great note [response to comment on Drain in the Bathtub Curve on NoMTBF LinkedIn Group]. Yes, there is a place for parts count prediction: not to determine the MTBF, but to encourage proper derating, thermal engineering, parts reduction, etc. It’s a start and, as you note, only one part of the reliability program.

# Why The Drain in the Bathtub Curve Matters

Most reliability engineers are familiar with the life cycle bathtub curve, the shape of the hazard rate or risk of failure of an electronic product over time. A typical electronics life cycle bathtub curve is shown in figure 1.

# Parts count variation

Just a short post to point to a newly added paper to the reference section. A few years ago I recalled seeing a paper that studied the difference to expect between various parts count methods and actual results.

# No Evidence of Correlation: Field failures and Traditional Reliability Engineering

Historically Reliability Engineering of Electronics has been dominated by the belief that 1) The life or percentage of complex hardware failures that occurs over time can be estimated, predicted, or modeled and 2) Reliability of electronic systems can be calculated or estimated through statistical and probabilistic methods to improve hardware reliability. The amazing thing about this is that during the many decades that reliability …