When we go to an automobile race such as the Indianapolis 500, watching those cars circle the track can get fairly boring. What is secretly unspoken is that everyone observing the race is watching for a race car to find and sometimes exceed a limit, finding a discontinuity. The limit could be how fast he enters a curve before the acceleration forces exceed the tires coefficient of friction, or how close to the racetrack wall, he can be before he contacts it and spins out of control. Using the race analogy, time trials before the race are like the design phase of electronics products, where only one race car one the track, and manufacturing like the race where many cars and consistent control of each is required to have an “accident free” race.
Of course no one wants to see a driver injured or killed at these events, but watching cars circle the track without incident is fairly boring. The same is true in testing of electronics hardware and software. Highly Accelerated Life Testing (HALT) is fairly boring until an empirical limit, a discontinuity, is discovered. Fortunately engineers are not injured or killed discovering empirical stress limits in HALT evaluations of electronics systems.
Formula One Racing – Pushing to the limits
HALT methodology really is a limit discovery tool, not a pass-fail test. Near and at the empirical, not theoretical or specified operational limits, provides some of the most useful data lies. It is the fastest way of finding weaknesses and for comparisons of electronics systems designs. Observing wide differences in operational limits between samples of the same product provides evidence of some component(s) inconsistent manufacturing processes affecting the system. If the deviation is large enough, the variation probably will affect operation of a smaller percentage of units at field use conditions. Discovery of variable empirical limits of multiple samples can be a discriminator for the quality of component and assembly process consistency. Wide deviation of operational limits between identical system samples is a good indicator of uncontrolled, possibly unknown, process variation that if wide enough will lead to failures in the intended use environment. Even if the numbers of units compared is not statistically significant, wide differences in limits are good qualitative indicators for reliability risks.
Stress testing at well below the operational limits even though it may be well beyond the end use specifications provides only very limited data on the product’s strength capability. Testing to only those “margins above spec” if not close to the empirical stress limit is just like watching a car race with a 120 mph speed limit. Some probability exists that a race car in this speed limited type of race could have a failure, and some cars would have failures and “lose” the race. Still failure would likely be rare and most of the vehicles would be tie for the win and there would be little differentiating information would be available for improving handling, durability or reliability over the competing cars. As in typical reliability testing, the cars much faster (higher stress) than most cars are driven and most accelerated reliability testing of electronic is performed at higher stresses than most systems will be exposed to in their useful life, and some percentage do fail in these milder but above spec stress conditions.
So why not test to empirical operation, and sometimes destruct, limits (i.e. HALT)? It is the quickest way to get useful data on product weaknesses. Why do so many resist testing electronics systems to empirical stress limits of voltage, temperature, vibration, shock, and other stresses that provide data on what the ultimate stress capability is? Here are just some of the reasons given in the last couple of decades:
- Product failures above specified component stress specifications are “foolish failures”
- Products in the field will never be subjected to those stress levels
- The product is too expensive to destroy the samples
To briefly answer those reasons
1. All components have margins above specification and functional margins are very dependent on its application in the design, not individual component specifications. Why assume any failure is foolish before finding it. Not testing to the operational strength of the actual product is leaving what could be valuable data (and ultimately money) on the table
2. The product may not see the instantaneous stress levels used in the tests, but the cumulative fatigue damage of lower field stresses have a high probability of failing the same weakness in the design that is found at the destruct limits.
3. How expensive is a product failure to company and its customers? Finding out in the weaknesses in a test lab is almost always less costs than lost sales and warranty costs when a latent defect or weakness reaches the customers. There is a risk to all testing and to find weaknesses at limits, you risk catastrophic damage. In digital systems it is very difficult to destroy systems below thermal empirical operating limits due to the parametric shifts causing failures in signal integrity. Maybe it is because there are many that believe finding empirical limits results in a pile of melted solder, components and plastics. Vibration on the other hand will eventually cause a hard failure, where the operational limit is also a destruct limit. In any case, many times the unit can be repaired and re-used for additional testing.
In the reliability development of a new product we are somewhat like a person in an unfamiliar dark room. We really don’t know how big the room is until we bump into a wall, and actually several walls, to define the available space in the room. In electronics testing, until we find the actual empirical limits of stress, we do not know what the actual “stress” space is that can be used to find marginal functional or material issues. The larger the stress space, the faster we can find the strength “entitlement” and use that strength to find the one or two weaknesses in an electronics product that puts overall reliability at risk.
Just like the title of a song by the rock group the Eagles, we should in testing “Take it to the Limit” to fully benefit from each sample of electronics systems we test. You will find it takes fewer units, less time and money to find the few elements in a design that really could impact field reliability.