Reliability Testing with Constraints
In some cases we have to conduct testing and are asked not to break the product. Now, that isn’t all that fun for a reliability engineer. We want to find what fails and understand it. Or we want to confirm that what we expect to fail actually does fail as expected.
So, what do we do when confronted with a very small sample size (that is one issue) and are expected to conduct failure free testing (second issue)? Let’s explore each issue separately and come up with a few suggestions on how to proceed.
Thanks to Олег (@OlegV_Ivanov) via Twitter for the article suggestion. Thanks for the idea, Олег.
Sample Size Limitation
When doing product or reliability testing the first step is to be clear about the objective of the testing. If the reliability testing is to clearly prove a 98% reliability over 5 years with at least a 90% statistical confidence, then we have a problem. A sample size of one or two samples will not be sufficient to demonstrate the reliability performance with the statistical confidence.
On the other hand, if we want to check the functionality of the system integration, and not make claims about the population of future builds, then we’re fine. Not all testing has the same objective, and the better you understand how constraints limit what is possible via testing, the better you can craft and achieve meaningful objectives.
HALT is a great tool to discover limitations or weaknesses of a design or assembly. And it can be accomplished with just one or two samples. So, if the objective is to determine the margins, weaknesses, etc., then HALT may well be the approach made for the situation.
A good idea even if not limited by sample size is to fully model the expected failure mechanisms and the system reliability. Using the best available information to populate the model, plus experimental results on elements of the system, allows the estimate of system reliability performance.
Furthermore a solid reliability model sets up the testing well. It allows us to focus on what is likely to fail or change over time, thus allowing testing a single unit with the purpose of confirming the model. Ideally run the testing to failure to confirm the failure mechanism models. I’m thinking of the Boeing 777 wing testing as an example.
Subsystem testing is another approach when facing limited system level samples. If you know the compressor is a critical element for the reliability of the overall unit, then testing just the compressor may allow using sufficient samples for a statistically based test.
Another benefit of subsystem testing is the ability to focus on specific failure mechanisms and apply the appropriate stresses without the many other elements of the system confounding or obscuring the results. Accelerated testing of a compressor outside a system may permit operating at higher loads and temperatures than otherwise possible, for example.
Performance monitoring of the operation of one or two units, if related to degradation of performance that leads to or indicates system failure, may provide the necessary reliability information. The catch here is understanding the measures that relate to the failure mechanisms of interest.
For example, if we have a motor with a primary failure mechanism caused by brush wear, we may be able to monitor current draw as a means to track the brush wear and eventual failure. Current draw may also change as the impedance, rotating friction, or some other mechanism changes over time, hence this approach works well only when you really understand the dominant failure mechanisms.
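As a rough illustration of the monitoring idea, here is a minimal Python sketch that fits a straight line to current-draw readings and extrapolates to a failure threshold. Everything specific here is an assumption made for the example: the readings are made up, the linear trend is assumed, and the 3.0 A threshold is hypothetical. A real application needs a degradation model tied to the actual failure mechanism.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired readings")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("readings must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_to_threshold(hours, current_amps, threshold_amps):
    """Extrapolate the fitted trend to an assumed failure threshold.

    Returns the projected operating hours at the crossing, or None when the
    trend is flat or the crossing would lie before the last reading.
    """
    slope, intercept = fit_line(hours, current_amps)
    if slope == 0:
        return None
    crossing = (threshold_amps - intercept) / slope
    return crossing if crossing > hours[-1] else None


# Made-up readings for illustration only (not measured data).
hours = [0, 100, 200, 300, 400]
amps = [2.00, 2.05, 2.10, 2.15, 2.20]
projected = hours_to_threshold(hours, amps, threshold_amps=3.0)
print(projected)  # roughly 2000 hours for this invented data
```

The projection is only as good as the link between the measured quantity and the failure mechanism, which is the catch noted above.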
In most cases, statistics and samples representing the population are not feasible when testing just one or two units. So, think through what you are trying to do with the testing and what is possible. Using a sample to make a statement about the population may not be possible with any reasonable statistical confidence. Be clear about that when designing the test.
A common approach for no-failure testing (success testing is what I call it) is the use of binomial (pass/fail) testing with zero failures. The success testing sample size formula is
$$n = \frac{\ln(1 - C)}{\ln(R)}$$
where n is the sample size needed to show at least R reliability with C confidence. For example, you would need to run 22 samples over a lifetime of stress to show 90% reliability with 90% confidence.
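The sample size calculation can be sketched in a few lines of Python, assuming the standard zero-failure binomial relation n = ln(1 - C) / ln(R). The function name is a hypothetical one chosen for this example.

```python
import math


def success_run_sample_size(reliability: float, confidence: float) -> int:
    """Units that must all survive one lifetime of stress to show at least
    `reliability` with `confidence`, assuming zero failures (binomial)."""
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must be strictly between 0 and 1")
    n = math.log(1.0 - confidence) / math.log(reliability)
    return math.ceil(n)  # round up: a fraction of a unit cannot be tested


print(success_run_sample_size(0.90, 0.90))  # 22
print(success_run_sample_size(0.98, 0.90))  # 114
```

The second call uses the 98% reliability, 90% confidence goal mentioned earlier and shows why one or two samples fall far short of it.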
The combination of small sample sizes and success testing means the test can only confirm rather poor lower limits for reliability. Setting confidence below 50% implies the sample results are more likely than not to fail to represent the population, thus I recommend not going lower than 60% for confidence.
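To see how poor those lower limits are, the same zero-failure binomial relation can be solved for reliability instead of sample size: R = (1 - C)^(1/n). The Python sketch below assumes that relation, and its function name is hypothetical.

```python
def demonstrated_reliability(n_units: int, confidence: float) -> float:
    """Lower bound on reliability shown by n_units surviving one lifetime
    with zero failures, at the given confidence (binomial success test)."""
    if n_units < 1:
        raise ValueError("need at least one unit")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return (1.0 - confidence) ** (1.0 / n_units)


for n in (1, 2, 22):
    for c in (0.60, 0.90):
        print(f"n={n:2d}  C={c:.0%}  R >= {demonstrated_reliability(n, c):.3f}")
```

With two units and no failures, 90% confidence supports a reliability of only about 0.32, and dropping to 60% confidence raises that to only about 0.63.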
What can we learn from a test with no failures? Not much, is my initial response. Even with a solid set of failure mechanism models, without a failure we have no confirmation that the testing aged the product as expected, nor that the system would fail as expected.
Of course, if we do have a failure during a test designed as a success test, that is usually a surprise and contains all kinds of great information.
So, what can we reasonably expect to learn? If the testing and models are all in good shape, we gain a little confirmation that our assumptions are ok. That is about it.
If we are able to tear down the units after testing and examine the wear or aging effects, we gain model confirmation that is now supported by evidence. If we can monitor degradation paths to failure, even if there is no failure during the testing, we again learn about the system’s ability to follow the failure mechanism models (or we gain some information that we can use to update our models).
Reliability monitoring and measurements on even one sample with success testing help to create a profile of what to expect during operation. It’s a start. As we run units in production or in actual use, continuing to collect reliability data helps us eventually build and confirm the models of reliability performance.
Reliability block diagrams, Petri nets, or Markov modeling, along with subsystem testing and analysis, allow us to understand and estimate system level performance even for complex systems. Simply testing one or two units using success testing without an understanding of the failure mechanisms is, in my opinion, a waste of time.
2 Samples and No Failures
Is this really reliability testing? In some cases it is not. It is simply a basic demonstration that the prototype works as expected for the duration of the testing. Using any form of acceleration requires a fundamental understanding of the failure mechanisms and how they respond to the applied stress.
When you are asked to set up a reliability test and are limited to one unit that you should not break, your first step is to come to a clear understanding with those needing the test results. What do they expect the test results will mean? What decisions are going to be made based on the results? And does everyone understand the limitations of reliability testing in general, and specifically given the constraints of limited samples and success testing?
Make it clear what can and cannot be said based on the results. Then go out and run the test the best way you can work out. Have fun; it might break, and you and your team will certainly learn something.
17 thoughts on “Thoughts on Testing One Sample and No Failures”
Fred, Thank you for your attention to this matter.
I think you missed the method of running the test for longer than the operating time. There is Reliasoft’s Weibull++ DRT utility for this. Tests of aircraft engines over a triple lifetime use this method. We can get a quantitative estimate of reliability from qualitative tests.
Good catch Oleg, yes, it is a viable solution to run longer than one lifetime for success testing. It certainly helps with reducing the sample size requirement for success testing, yet also has limits. With just two samples it really doesn’t help much concerning the statistics. cheers, Fred
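To put rough numbers on the longer-test idea, here is a Python sketch of one common way to credit extra test time in a zero-failure test. It assumes a Weibull life distribution with a known shape parameter beta, giving n = ln(1 - C) / (L^beta * ln(R)), where L is the ratio of test duration to one lifetime. The beta value and function names are assumptions for illustration, and this is not presented as what the Weibull++ utility computes.

```python
import math


def extended_test_sample_size(reliability, confidence, life_ratio, beta):
    """Zero-failure sample size when each unit runs life_ratio lifetimes,
    assuming Weibull-distributed life with shape parameter beta."""
    if life_ratio <= 0 or beta <= 0:
        raise ValueError("life_ratio and beta must be positive")
    n = math.log(1.0 - confidence) / (life_ratio ** beta * math.log(reliability))
    return math.ceil(n)


def required_life_ratio(reliability, confidence, n_units, beta):
    """Lifetimes each of n_units must survive to meet the same goal."""
    base_n = math.log(1.0 - confidence) / math.log(reliability)
    return (base_n / n_units) ** (1.0 / beta)


# Assumed beta = 2 (wear-out behavior); 90% reliability at 90% confidence.
print(extended_test_sample_size(0.90, 0.90, life_ratio=1.0, beta=2.0))  # 22
print(extended_test_sample_size(0.90, 0.90, life_ratio=3.0, beta=2.0))  # 3
print(round(required_life_ratio(0.90, 0.90, n_units=2, beta=2.0), 2))   # 3.31
```

Under these assumptions two units would each need to survive a bit over three lifetimes without failure, and the result depends entirely on the assumed beta, which a zero-failure test cannot itself confirm.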
Thank you for the article. Another constraint that you forgot to discuss is the schedule. The reliability engineer is limited by time most of the time (e.g. samples are available late in the design process, aggressive TTM schedule, competitive market, reliability engineering as a check mark), so having the luxury to test a small sample for multiple lives is not an option.
Well said, time is certainly a factor. cheers, Fred
It’s true. I think our goal of tests is not statistics, but risk reduction.
Hmm, how much risk reduction do you achieve? Is it possible to detect defects that occur in only 10% of units using only two samples? I don’t think so. Which risk are you addressing, and to what extent? We tend to use stats to quantify this risk, yet there are other ways, I suppose.
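The detection question can be checked with a short calculation. Assuming each unit independently carries the defect with probability p, and that the test would reveal the defect whenever it is present, the chance that at least one of n units shows it is 1 - (1 - p)^n. A Python sketch with hypothetical function names:

```python
import math


def detection_probability(defect_fraction: float, n_units: int) -> float:
    """Chance that at least one of n_units contains the defect, assuming
    independent units and a test that always exposes a present defect."""
    return 1.0 - (1.0 - defect_fraction) ** n_units


def units_for_detection(defect_fraction: float, target_probability: float) -> int:
    """Smallest sample size reaching the target detection probability."""
    n = math.log(1.0 - target_probability) / math.log(1.0 - defect_fraction)
    return math.ceil(n)


print(round(detection_probability(0.10, 2), 2))  # 0.19
print(units_for_detection(0.10, 0.90))           # 22
```

So two samples give about a 19% chance of even seeing a defect present in 10% of units, and a 90% chance takes 22 units under these assumptions.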
Fred, I see a new interesting topic emerged in this discussion – Statistics of failures vs a longer test without failures.
You may be right – it is an interesting topic. Would you like to draft an article for the NoMTBF site on the topic?
Very interesting post Fred and really good question Oleg.
I would like to throw another related question:
Let’s say we have two units and we know how the mean life is related to the level of stress (so let’s assume the inverse power law is fully determined and we are in good shape to use accelerated life testing to minimize our test time for reliability demonstration).
Let’s assume that we also know that the failure rate can be treated as constant over the service life of the equipment.
If we use a Reliability Demonstration Test based on the cumulative binomial distribution, we could infer the test time per unit required to demonstrate a certain mean life with a certain confidence, even if we test only two units.
What do you think?
Hi Ricardo, you could, yet given the assumptions and use of the binomial, the lower bound of the estimated reliability would be pretty low. Running the test for multiple expected lifetimes helps, yet there is a limit there too, as you begin to run up against very long-term failure mechanisms that are unrelated to use over a normal lifetime. It’s possible, but not generally practical. cheers, Fred
Thanks Fred, but there is still something unclear for me…
For example, suppose we are trying to verify a mean life of 350 000 cycles with 90% confidence and we know the relation between mean life and stress level perfectly. Imagine we have the following options:
1.) Test one unit with zero failures: 70 000 cycles at 1.2x real stress (equivalent to 800 000 cycles at real conditions) verify the statement.
2.) Test 4 units:
– with zero failures: 18 000 cycles per unit at 1.2x real stress (200 000 cycles per unit at real conditions) verify the statement
– with one failure allowed: 35 000 (400 000) idem…
– with two failures allowed: 60 000 (700 000) idem…
The constraints are the number of units that I can buy and the limited test time.
Theoretically options 1 and 2 show the same level of confidence (90%).
Which is the best? And the most conservative? Is there another option?
I appreciate your helpful opinions.
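The figures in this comment appear consistent with a cumulative binomial demonstration combined with the constant failure rate assumption. The Python sketch below is an attempt to reproduce the equivalent cycles at use conditions under that reading. The function names are hypothetical, the stress acceleration factor is not modeled, and the per-unit reliability is found by simple bisection.

```python
import math


def prob_pass(reliability, n_units, max_failures):
    """P(at most max_failures of n_units fail) when each unit survives
    the test with probability `reliability`."""
    q = 1.0 - reliability
    return sum(
        math.comb(n_units, k) * q ** k * reliability ** (n_units - k)
        for k in range(max_failures + 1)
    )


def required_cycles_per_unit(mean_life, confidence, n_units, max_failures):
    """Use-condition cycles each unit must run so that passing the test
    demonstrates `mean_life` at `confidence` (exponential life assumed)."""
    if not 0 <= max_failures < n_units:
        raise ValueError("max_failures must be between 0 and n_units - 1")
    target = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(100):  # prob_pass rises with reliability, so bisect
        mid = (lo + hi) / 2.0
        if prob_pass(mid, n_units, max_failures) > target:
            hi = mid
        else:
            lo = mid
    reliability = (lo + hi) / 2.0
    return -mean_life * math.log(reliability)  # R = exp(-t / mean_life)


for n, f in ((1, 0), (4, 0), (4, 1), (4, 2)):
    cycles = required_cycles_per_unit(350_000, 0.90, n, f)
    print(f"units={n} failures allowed={f}: about {cycles:,.0f} cycles per unit")
```

Under these assumptions the results land near the quoted 800 000, 200 000, 400 000 and 700 000 cycles, which shows how heavily the plan leans on the constant failure rate assumption.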
I really don’t like any of these approaches. First, get failures in your testing if at all possible. Test to failure. Second, given the design, how will you confirm the constant failure rate assumption?
A binomial test design that allows some failures tends to have more power than a success-only test. This helps you distinguish sharply between the good and bad outcomes.
Yes, Fred. I fully agree with your general points. I am just pushing to decide which is the least bad option, keeping in mind that this approach is not my favorite either. But what to do if you only have 4 units and limited time to test?
From my point of view, testing the 4 units with zero failures gives you “some information” about what to expect from this equipment in future real conditions, with some confidence (under some assumptions, of course). If you find one or two failures in this test, you could certainly extract more information and increase the test time on the remaining units (to get the same confidence) to demonstrate the reliability objective set…
Then if you have more time, you could run all four units to failure, investigate the failure mechanisms, and even try to fit your data to a Weibull or lognormal distribution, even with just 4 units…
So Fred, what to do with four units (far fewer than the quantity recommended by the non-parametric binomial) to demonstrate a reliability objective at a certain time, much longer than the test time available?
I tend to work toward verifying the margins in the highest risk areas. Not being able to demonstrate reliability statistically, how about confirming what will fail then? From here we may be able to use existing physics of failure models or run specific experiments to estimate system time to failure for the failure mechanisms most likely to occur.
A failure is worth more than just an improvement in the statistical power of a test; it helps us understand exactly what is failing.
Hi Ricardo, All is not so bad as it seems.
I am convinced a good test without failures is better than a good test with failures :-)
The case of exponential lifetime distribution is the easiest: 1-CL=exp(-λtn) – we can trade off λ, t and n against each other, keeping their product the same, to provide the same CL. For example, we can use 5×λ, 2×t and 1/10×n.
This case is not very interesting in terms of results because exponential lifetime distribution has a very heavy right tail.
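The trade-off in the exponential case can be checked numerically. The Python sketch below evaluates CL = 1 - exp(-λtn) for a baseline and for the scaled combination of 5×λ, 2×t and 1/10×n. The baseline numbers are arbitrary values picked for illustration, and the function name is hypothetical.

```python
import math


def confidence_level(failure_rate, test_time, n_units):
    """Confidence from a zero-failure test under an exponential life
    distribution: CL = 1 - exp(-failure_rate * test_time * n_units)."""
    return 1.0 - math.exp(-failure_rate * test_time * n_units)


# Arbitrary baseline chosen for illustration: lambda * t * n = 2.
base = confidence_level(failure_rate=1e-5, test_time=10_000, n_units=20)

# 5 x lambda, 2 x t, 1/10 x n: the product is unchanged, so CL is too.
scaled = confidence_level(failure_rate=5e-5, test_time=20_000, n_units=2)

print(round(base, 4), round(scaled, 4))
print(math.isclose(base, scaled))
```

Only the product λtn matters here, which is why the exponential assumption makes unit count and test time look freely interchangeable.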
I agree Oleg – yet the numbers can only get us so far.
Thank you for your valuable opinions guys.