Sample Size and Duration and MTBF


If you have been a reliability engineer for a week or more, or worked with a reliability engineer for a day or more, someone has asked you about test planning. The conversation may have started with “how many samples and how long will the test take?”

You have heard the sample size question.
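One common starting point for that question is the zero-failure (success-run) calculation. The sketch below is only an illustration under stated assumptions: a pass/fail (binomial) test, every unit exercised for the equivalent of one full lifetime, and no failures allowed. The function name and the 95%/90% values are mine, not from the post.

```python
import math

def success_run_sample_size(reliability, confidence):
    """Units that must all survive to demonstrate `reliability` at `confidence`.

    Comes from requiring reliability**n <= 1 - confidence (zero failures allowed).
    """
    n = math.log(1.0 - confidence) / math.log(reliability)
    return math.ceil(n)

# Illustrative targets only: 95% reliability demonstrated with 90% confidence.
n_units = success_run_sample_size(0.95, 0.90)
print(f"Units required with zero failures: {n_units}")
```

The answer grows quickly as the reliability target approaches 1, which is usually where the "how long will the test take" part of the conversation begins.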

Continue reading “Sample Size and Duration and MTBF”

The Relationship Between Reliability Goals and Confidence

Reliability Goal and Confidence

We establish reliability goals and measure reliability performance.

They are not the same thing. Goals and measures, while related, are not the same nor serve the same purpose. Continue reading “The Relationship Between Reliability Goals and Confidence”

The Magic Math of Meeting MTBF Requirements


I recently heard from a reader of NoMTBF. She wondered about a supplier’s argument that they meet the reliability or MTBF requirements. She was right to wonder.

Estimating the reliability performance of a new design is difficult.

There are good and better practices to justify claims about future reliability performance. Likewise, there are just plain poor approaches, too. Plus there are approaches that should never be used.

The Vendor Calculation to Support Claim They Meet Reliability Objective

Let’s say we contract with a vendor to create a navigation system for our vehicle. The specification includes functional requirements. Also it includes form factor and a long list of other requirements. It also clearly states the reliability specification. Let’s say the unit should last with 95% probability over 10 years of use within our vehicle. We provide environmental and function requirements in detail.

The vendor first converts the 95% probability of success over 10 years into MTBF, claiming they are ‘more familiar’ with MTBF. They ignore the requirement for probability of success in the first month of operation. Likewise they ignore the 5 year targeted reliability or, as they would convert it, MTBF requirement.

[Note: if you were tempted to calculate the equivalent MTBF, please don’t. It’s not useful, nor relevant, and a poor practice. Suffice it to say it would be a large and meaningless number]

RED FLAG By converting the requirement into MTBF it suggests they may be making simplifying assumptions. This may permit easier use of estimation, modeling, and testing approaches.
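To make the simplifying assumption concrete without calculating an MTBF: a single MTBF implies a constant failure rate, and under a constant failure rate the 10 year requirement alone fixes the reliability at every other time, including the first month and 5 years. The sketch below compares that with Weibull curves that pass through the same 95% at 10 years point. The shape values are hypothetical, chosen only to show how much the earlier requirements can differ.

```python
def reliability_constant_rate(t_years, r_target=0.95, t_target=10.0):
    """R(t) when the hazard is constant and R(t_target) = r_target."""
    # R(t) = exp(-rate * t), so R(t) = r_target ** (t / t_target).
    return r_target ** (t_years / t_target)

def reliability_weibull(t_years, shape, r_target=0.95, t_target=10.0):
    """Weibull R(t) with the given shape, forced through R(t_target) = r_target."""
    # R(t) = exp(-(t / scale) ** shape) reduces to this once the scale is
    # eliminated using the 10 year requirement.
    return r_target ** ((t_years / t_target) ** shape)

# Hypothetical shapes: 0.5 (early-life failures dominate), 2.0 (wear-out).
for label, t in [("1 month", 1.0 / 12.0), ("5 years", 5.0), ("10 years", 10.0)]:
    constant = reliability_constant_rate(t)
    early = reliability_weibull(t, shape=0.5)
    wearout = reliability_weibull(t, shape=2.0)
    print(f"{label:>8}: constant {constant:.4f}  "
          f"shape 0.5 {early:.4f}  shape 2.0 {wearout:.4f}")
```

All three curves agree at 10 years and disagree everywhere else, which is why the first month and 5 year requirements cannot be dropped without losing information.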

The Vendor’s Approach to ‘Prove’ They Meet the MTBF Requirement

The vendor reported they met the reliability requirement using the following logic:

Of the 1,000 (more actually) components we selected 6 at random for accelerated life testing. We estimated the lower 60% confidence of the probability of surviving 10 years given the ALT results. Then converted the ALT results to MTBF for the part.

We then added the MIL-HDBK-217 failure rate estimate to the ALT result for each of the 6 parts.

RED FLAG This one has me wondering about the rationale for adding the failure rates of an ALT and a parts count prediction. It would make the failure rate higher. Maybe it was a means to add a bit of margin to cover the uncertainty? I’m not sure. Do you have any idea why someone would do this? Are they assuming the ALT did not actually measure anything relevant or any specific failure mechanisms, or that they used a benign stress? ALT details were not provided.

The Approach Gets Weird Here

Then we use a 217 parts count prediction along with the modified 6 component failure rates to estimate the system failure rate, and with a simple inversion estimated the MTBF. They then claimed the system design will meet the field reliability performance requirements.

RED FLAG MIL-HDBK-217F, in section 3.3, states:

Hence, a reliability prediction should never be assumed to represent the expected field reliability …

If you are going to use a standard, any standard, you should read it. Read to understand when and why it is useful or not useful.
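For readers who want to see what the vendor's arithmetic amounts to, here is a minimal sketch. It is not a recommended method. It assumes every part has a constant failure rate and that any single part failure fails the system (a series model). Every number below is hypothetical, since the vendor's actual values were not provided.

```python
def vendor_style_mtbf(alt_rates, handbook_rates, remaining_rate):
    """Mimic the vendor's steps: sum constant failure rates, then invert."""
    # Step 1: for each tested part, add the handbook rate to the ALT-derived rate.
    modified = [alt + hb for alt, hb in zip(alt_rates, handbook_rates)]
    # Step 2: add the parts count total for all remaining components.
    system_rate = sum(modified) + remaining_rate
    # Step 3: the "simple inversion".
    return 1.0 / system_rate

# Hypothetical failure rates in failures per hour, for illustration only.
alt_rates = [1e-7, 2e-7, 1e-7, 3e-7, 2e-7, 1e-7]
handbook_rates = [5e-7, 4e-7, 6e-7, 5e-7, 3e-7, 7e-7]
remaining_rate = 6e-6

with_handbook = vendor_style_mtbf(alt_rates, handbook_rates, remaining_rate)
alt_only = vendor_style_mtbf(alt_rates, [0.0] * 6, remaining_rate)
print(f"ALT + handbook rates: {with_handbook:,.0f} hours")
print(f"ALT rates only:       {alt_only:,.0f} hours")
```

Adding the handbook rates to the ALT rates can only lower the resulting number, and nothing in the calculation connects either figure to a failure mechanism or to field conditions.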

What Should the Vendor Have Done Instead?

There are a lot of ways to create a new design and meet reliability requirements.

  • The build, test, fix approach or reliability growth approach works well in many circumstances.
  • Using failure data from similar, actually fielded systems. It may provide a reasonable bound for an estimate of a new system. It may also limit the focus of the accelerated testing to only the novel or high-risk areas of the new design, given that much of the design is (or may be) similar to past products.
  • Using a simple reliability block diagram or fault tree analysis model to assemble the estimates, test results, and engineering stress/strength analysis (all better estimation tools than parts count, in my opinion) and calculate a system reliability estimate.
  • Using a risk of failure approach with FMEA and HALT to identify the likely failure mechanisms then characterize those mechanisms to determine their time to failure distributions. If there is one or a few dominant failure mechanisms, that work would provide a reasonable estimate of the system reliability.
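As a rough illustration of the reliability block diagram item above, the sketch below combines block reliabilities (each a probability of surviving the same duration) in series and in parallel. It assumes the blocks fail independently. The block names and values are hypothetical.

```python
from math import prod

def series(*reliabilities):
    """All blocks must survive: multiply the block reliabilities."""
    return prod(reliabilities)

def parallel(*reliabilities):
    """At least one block must survive: one minus the product of unreliabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical 10 year block reliabilities for a navigation unit.
power_supply = 0.99
receiver_pair = parallel(0.95, 0.95)  # two redundant receivers
display = 0.98

system = series(power_supply, receiver_pair, display)
print(f"System reliability over 10 years: {system:.4f}")
```

Each block value can come from whichever source is best for that block: field data, a test result, or a stress/strength analysis.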

In all cases focus on failure mechanisms and how the time to failure distribution changes given changes in stress / environment / use conditions. Monte Carlo may provide a suitable means to analyze a great mixture of data to determine an estimate. Use reliability, the probability of success over a duration.
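A minimal Monte Carlo sketch of that idea follows. It assumes two independent failure mechanisms, each described by a Weibull time to failure distribution, with the unit failing at whichever occurs first. The scale and shape values are hypothetical stand-ins for what mechanism characterization work would provide.

```python
import random

def simulate_reliability(mechanisms, mission_years, trials=20_000, seed=1):
    """Estimate the probability that no mechanism causes failure before mission_years.

    mechanisms: list of (scale_years, shape) Weibull parameters, one per mechanism.
    """
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        # The unit fails at the earliest time drawn across all mechanisms.
        first_failure = min(
            rng.weibullvariate(scale, shape) for scale, shape in mechanisms
        )
        if first_failure > mission_years:
            survived += 1
    return survived / trials

# Hypothetical mechanisms: a wear-out mechanism and an early-life mechanism.
mechanisms = [(60.0, 2.5), (400.0, 0.8)]
estimate = simulate_reliability(mechanisms, mission_years=10.0)
print(f"Estimated 10 year reliability: {estimate:.3f}")
```

This simple case also has a closed-form answer. The simulation earns its keep once the inputs become a mixture, for example when use conditions vary from customer to customer and shift the distributions.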

In short, do the work to understand the design, its weaknesses, and the time to failure behavior under different use/condition scenarios, and make justifiable assumptions only when necessary.

Summary

We engage vendors to supply custom subsystems given their expertise and ability to deliver the units we need for our vehicle. We expect them to justify that they meet reliability requirements in a rational and defensible manner. While we do not want to dictate the approach to the design or the estimate of reliability performance, we certainly have to judge the acceptability of the claims that they meet the requirements.

What do you report when a customer asks if your product will meet the reliability requirements? Add to the list of possible approaches in the comments section below.


Is Your Reliability Testing Adding Value?

Why Do Reliability Testing

Reliability testing is expensive. The results are often not conclusive.

Yet we spend billions on environmental, accelerated, growth, step stress and other types of reliability tests. We bake, shake, rattle and roll prototypes and production units alike. We examine the collected data in hopes of glimpsing the future. Continue reading “Is Your Reliability Testing Adding Value?”

Lifetime Evaluation vs. Measurement. Part 2.


Guest post by Oleg Ivanov

A result of life testing can be measurement or evaluation of the lifetime.

Measurement of the lifetime requires a lot of testing to failure. The results provide us with the life (time-to-failure) distribution of the product itself. It is long and expensive.

Evaluation of the lifetime does not require as many test samples and these tests can be without failures. It is faster and cheaper [1]. A drawback of the evaluation is that it does not give us the lifetime distribution. The evaluation checks the lower bound of reliability only, and interpretation of the results depends on the method of evaluation (the number of samples, test conditions, and the test time). Continue reading “Lifetime Evaluation vs. Measurement. Part 2.”

Time to Update Our Standards


Not our personal or moral standards, rather the set of documents we rely upon as a foundation for reliability engineering tools and techniques.

We have a wide array of standards for reporting reliability test data to calculating confidence intervals on field returns. We have standards that describe various environmental conditions and appropriate testing levels suitable to evaluate your product. We define terms, concepts, processes, and techniques.

A Missing Element

Despite the many documents and impressive titles of numbers and abbreviations or acronyms, most of the standards related to reliability engineering fail to include sufficient context and rationale concerning when and why to use or modify the standard. If a specific test is to determine the expected lifetime of solder joints, well, for which types of solder joints (shape, size, configuration, material, and process) is the standard appropriate, and when does it not apply? Make the boundaries of applicability clear.

No single test works for all situations.

For example, a wrist watch standard defining how to test for specific water resistance claims does not evaluate the effects of corrosion. The standard has the watch or similar device exposed to a set of water conditions, then evaluates whether the system is operating nearly immediately after the water exposure.

We know that water encourages corrosion, yet corrosion takes time to occur. Water alone on a circuit board is no big deal (much of the time). It’s when the water facilitates the creation of additional and unwanted current paths that there is a problem. Metal migration and rusting take time to occur.

If the standard for water resistance doesn’t evaluate corrosion, and it’s one of the ways your product fails, too bad. You can ‘pass’ the test, meet the standard, add it to your data sheet, and the customer will still experience a failure.

The same goes for many environmental testing, FMEA, life testing, field data analysis, and a range of other standards. They do not include the critical information necessary for appropriate application of the standard to your particular situation.

Connection to Value

Many, not all, standards provide a recipe to accomplish a task or evaluation. One of the values of a standard is that different teams may replicate the results of one team by repeating the steps outlined in the standard.

One of the issues with standards is that they do not include how and why to actually accomplish the set of tasks and what to do with the results. In part, we need to clearly connect a task, say testing a product across a range of temperature and humidity conditions, to the meaningful information it will provide.

Don’t run the test if the information is unnecessary or meaningless.

For example, if we expect that exposure to high temperature and humid conditions may increase the chance of product failure, we may want to know

  • how many failures will occur;
  • how the product will actually fail;
  • how the failure will initiate and progress;
  • when the failures occur under use conditions;

Or there may be any number of other reasons to use the results of the testing. Often we run a standard test with very few samples, experience no failures, and erroneously conclude all is good. Then we are surprised that failures occur anyway when the product is in use.

The standard let us down.

The standard provided only a recipe or outline for a procedure, and not the guidance and rationale on how it may or may not help us and our team resolve very real questions. Testing 3 units that all pass does not mean your solar panel will survive hot and humid conditions for 20 years with no failures. It doesn’t.

Run a test, or do the work of a process, only if it is tied to answering a question. Focus on business decisions and the questions we have to resolve in order to make better decisions (i.e., be wrong less often).
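To put a number on how little three passing units tell us, here is a minimal sketch of the lower confidence bound on reliability from a zero-failure test. It assumes a pass/fail (binomial) model and that the test represents the duration and conditions of interest. The 90% confidence level is my choice for illustration.

```python
def zero_failure_lower_bound(units_tested, confidence):
    """Lower confidence bound on reliability when all tested units pass.

    Solves reliability ** units_tested = 1 - confidence for reliability.
    """
    return (1.0 - confidence) ** (1.0 / units_tested)

# Three units, no failures, 90% confidence (illustrative values).
bound = zero_failure_lower_bound(3, 0.90)
print(f"Demonstrated reliability is at least {bound:.2f}")
```

With three units the bound falls below 50%, and even that applies only to the conditions and duration the test actually represents, not to 20 years of field exposure.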

Summary

Let’s change the way we read and use standards. You may need to add the how and why, the boundaries, and the connection to value for your situation. It’s not always easy. The people writing the standard often have sufficient experience to include guidelines to help you. When possible, contact them and ask what their thinking was and what the limitations are.

If enough of us avoid simply meeting the requirements of the standard, we will

  • Enjoy reliable product performance
  • Create value to our organization with each test or task
  • And, eventually change how standards are written

Thoughts on Testing One Sample and No Failures

 

Reliability Testing with Constraints

In some cases we have to conduct testing and are asked to not break the product. Now, that isn’t all that fun as a reliability engineer. We want to find what fails and understand it. Or, we want to confirm what we expect will fail, actually does as expected.

So, what do we do when confronted with a very small sample size (that is one issue) and are expected to conduct failure free testing (second issue)? Let’s explore each issue separately and come up with a few suggestions on how to proceed.

Thanks to Олег (@OlegV_Ivanov) via Twitter for the article suggestion. Thanks for the idea, Олег. Continue reading “Thoughts on Testing One Sample and No Failures”

Is Reliability Just Testing?


I endured a difficult conversation with a project manager yesterday. The meeting agenda included an initial discussion about the product development reliability plan. She agreed that we needed to identify risks and provide feedback to the team concerning product reliability. Continue reading “Is Reliability Just Testing?”

The Convenient Use of MTBF


Sometimes making an assumption is a good thing. You can achieve more with less. A well placed assumption saves you time, work, and worry. The right assumption may even be left unstated, it’s so good.

Have you ever assumed the failures for a system follow an exponential distribution? Did you assume tallying up the total hours and dividing by the number of failures was appropriate? Did you even check? (You don’t need to answer.) Continue reading “The Convenient Use of MTBF”
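The tally in question is short enough to write down. The sketch below uses two hypothetical fleets to show what the calculation cannot see: it returns the same number whether the hours come from a few units run for a long time or many units run briefly. That is only harmless if the failure rate really is constant.

```python
def mtbf_point_estimate(total_hours, failures):
    """The convenient tally: total operating hours divided by failure count."""
    return total_hours / failures

# Hypothetical fleets, each with 4 failures in 10,000 unit-hours.
fleet_a = mtbf_point_estimate(total_hours=10 * 1000, failures=4)   # 10 units, 1,000 h each
fleet_b = mtbf_point_estimate(total_hours=100 * 100, failures=4)   # 100 units, 100 h each

print(f"Fleet A MTBF: {fleet_a:,.0f} hours")
print(f"Fleet B MTBF: {fleet_b:,.0f} hours")
```

Fleet B never ran long enough to see wear-out, and Fleet A may have too few units to see early-life failures, yet the tally reports them as identical. Checking the assumption means looking at when the failures occurred, not just how many there were.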

Why Doesn’t Product Testing Catch Everything?


Photo: photolibrarian, West Union, Iowa, The Reliable Agency, B. Kamm, Jr., Matchbook, Farmers Casualty Company https://www.flickr.com/photos/photolibrarian/8244857538/in/gallery-fms95032-72157649635411636/

In an ideal world the design of a product or system will have perfect knowledge of all the risks and failure mechanisms. The design then is built perfectly without any errors or unexpected variation and will simply function as expected for the customer.

Wouldn’t that be nice.

The assumption that we have perfect knowledge is the kicker though, along with perfect manufacturing and materials. We often do not know enough about:

  • Customer requirements
  • Operating environment
  • Frequency of use
  • Impact of design tradeoffs
  • Material variability
  • Process variability

We do know that we do not know everything we need to create a perfect product, thus we conduct experiments.

We test. Continue reading “Why Doesn’t Product Testing Catch Everything?”

Designing an ORT

I received a question about setting up an ORT the other day. Below is my response.

There is not a hard and fast rule for how much life to take out of a product during ORT and still be able to sell the unit.

There are two different reasons to run ORT and each may take a different approach. Continue reading “Designing an ORT”

Sample size and MTBF

Samples for Testing

Normally, we life test a sample of products in order to make sure the products will last as long as expected. We assume that the sample we select will represent the total population of products that we eventually ship. It is not a perfect system, and there is some risk involved. Continue reading “Sample size and MTBF”

Why success with HALT begins long before doing HALT

HALT is a BIG change

Implementing a new reliability development paradigm in a company which is using traditional, standards-based testing can be a perilous journey. It is especially true with introducing HALT (Highly Accelerated Life Test), in which strength against stress, and not quantifying electronics lifetimes, is the new metric. Because of this significant change in test orientation, a critical factor for success begins with educating the company’s top Continue reading “Why success with HALT begins long before doing HALT”

Question use of reliability testing standards

Each of us has seen product life or component reliability claims in product literature or data sheets. We may even have received such claims stated as goals and been asked to support the claim with some form of an experiment. Standards bodies such as ANSI, BSI, ISO, IEC, and others from around the world provide standard methods for testing products. This includes product life testing in some cases. Continue reading “Question use of reliability testing standards”

Why HALT is a methodology, not equipment

Kirk Gray, Accelerated Reliability Solutions, L.L.C.

It is easy to understand why the term HALT (Highly Accelerated Life Test) is so tightly coupled to the equipment systems called “HALT chambers”. Many do not think they can do HALT processes without a “HALT Chamber”. Many know that Dr. Gregg Hobbs, who coined the term HALT and also HASS (Highly Accelerated Stress Screens), spent much of his life promoting the techniques and was also the founder of two “HALT/HASS” environmental chamber companies. Continue reading “Why HALT is a methodology, not equipment”