This post captures a conversation first held in the LinkedIn group No MTBF. I’m reproducing a portion of the contributions here to continue the discussion and to widen the audience. The topic reminds me of the habit of always assuming 95% confidence is the right value when designing a test, or of assuming a constant failure rate. So, let the conversation continue, starting with the original post.
Where does “0.7eV” come from?
Most manufacturers still use an Arrhenius law (shouldn’t we rather call it the “Erroneous law”? ;-) with an activation energy of 0.7eV to extrapolate High Temperature Operating Life test data to use conditions for every kind of electronic component, from complex ICs to simple passives. It is often claimed that 0.7eV is based on “historical data”, yet I have never actually seen any paper or publication where this activation energy was really measured. The use of a constant hazard rate (lambda) was originally justified by the fact that electronic boards have many different components with different failure mechanisms. The argument was that different Weibull distributions with different activation energies yield, on average, a roughly constant hazard rate for which an apparent activation energy can be defined. Many manufacturers now seem convinced that the reverse must work too! Since a constant hazard rate with Ea=0.7eV was once claimed for electronic boards, every single component must follow the same law as well! It is just amazing how errors propagate!
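The averaging argument above can be sketched numerically. This is a minimal illustration, assuming independent mechanisms, each with a constant rate that scales by its own Arrhenius term; the three mechanisms, their equal rates at a reference temperature of 55°C, and the names `mechanism_rate`, `lumped_rate`, and `apparent_ea` are all hypothetical. The board-level rate is the sum of the individual rates, and the “apparent” activation energy is what a single Arrhenius fit through two temperatures would report.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant, eV/K
REF_C = 55.0  # hypothetical reference temperature at which rates are specified

def inv_kt(temp_c):
    """1/(kT) in 1/eV for a temperature given in degrees Celsius."""
    return 1.0 / (BOLTZMANN_EV * (temp_c + 273.15))

def mechanism_rate(rate_ref, ea_ev, temp_c):
    """Arrhenius-scaled rate of one mechanism, given its rate at REF_C."""
    return rate_ref * math.exp(ea_ev * (inv_kt(REF_C) - inv_kt(temp_c)))

def lumped_rate(mechanisms, temp_c):
    """Board-level constant hazard rate: sum over independent mechanisms."""
    return sum(mechanism_rate(r, ea, temp_c) for r, ea in mechanisms)

def apparent_ea(mechanisms, t1_c, t2_c):
    """Ea that a single-Arrhenius fit through two temperatures would report."""
    r1 = lumped_rate(mechanisms, t1_c)
    r2 = lumped_rate(mechanisms, t2_c)
    return math.log(r2 / r1) / (inv_kt(t1_c) - inv_kt(t2_c))

# Hypothetical board: (rate at REF_C in arbitrary units, Ea in eV).
board = [(1.0, 0.3), (1.0, 0.7), (1.0, 1.2)]
print(f"apparent Ea, 25-55 C:  {apparent_ea(board, 25.0, 55.0):.2f} eV")
print(f"apparent Ea, 85-125 C: {apparent_ea(board, 85.0, 125.0):.2f} eV")
```

With these made-up numbers the apparent activation energy is noticeably lower over the 25 to 55°C range than over the 85 to 125°C range, because the high-Ea mechanism dominates at high temperature. A single lumped Ea therefore depends on the temperature range it was fitted over, and it says nothing about any one component.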
Well done, young man. I agree with you 100%. I do wish other “reliability” experts had the knowledge and ability to ask the same question. Keep asking, and keep in touch.
by Dr K
Great observation and insight. I’d have to check the math, yet I seem to recall that 0.7eV roughly doubles the failure rate for each 10°C increase (though I’m not sure about this part as I think about it). Playing with a few numbers just now, I found, as expected, that it depends on the temperature range over which the 10°C steps are taken, yet, very roughly, the failure rate doubles with each 10°C. The nice round number, and the fact that it’s easy to remember, are probably where that ‘rule’ came from. In short, as you noticed, do the math and check the assumptions. Getting the right activation energy is too important to simply guess.
I would recommend the following:
- Check Dimitri Kececioglu’s “Burn-In Testing: Its Quantification and Optimization”. I referred to the text around a year ago for activation energy figures for different components.
- ASM International’s EDFAS references. Some time ago I read several references on this subject on their website.
by Vinod Pal
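The “doubles every 10°C” recollection can be checked with the standard Arrhenius acceleration factor, AF = exp[(Ea/k)(1/T_low − 1/T_high)] with temperatures in kelvin. A minimal sketch, assuming Ea = 0.7eV; the function name `acceleration_factor` and the chosen base temperatures are illustrative only.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant, eV/K

def acceleration_factor(ea_ev, t_low_c, t_high_c):
    """Arrhenius acceleration factor between two temperatures in degrees C."""
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_low_k - 1.0 / t_high_k))

# Factor for a single 10 C step at 0.7 eV, starting from several base temperatures.
for base_c in (25, 55, 85, 125):
    af = acceleration_factor(0.7, base_c, base_c + 10)
    print(f"{base_c:>3} C -> {base_c + 10:>3} C: x{af:.2f}")
```

Under these assumptions the factor is about 2.4 for a step from 25 to 35°C but only about 1.85 for a step from 85 to 95°C, which matches the observation that “doubling” holds only very roughly and depends on where the steps are taken.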
Enrico, very good observation. I have seen no evidence of any valid basis for the 0.7 eV, as I have never heard which physical mechanism the activation energy refers to. Is it oxide breakdown, electromigration, diffusion? It makes no sense when applied to the propagation of solder cracks or component package delamination, as those mechanisms are driven by thermal cycles and vibration. So much of electronics reliability prediction is smoke and mirrors; the real causes of unreliability are mistakes or overlooked design margins, errors in manufacturing, or abuse by customers. These causes are not predictable and do not follow a pattern that can be modeled. A much better use of engineering resources for making a reliable electronic system is to apply stress to rapidly expose these overlooked margins and manufacturing errors before products are built in mass quantities or shipped to the customer. The vast majority of electronics have more life than we need; we really just need to remove the unreliable elements (from the causes mentioned above) and we have a robust system that will exceed its technologically useful life.
To obtain the activation energy, the Arrhenius model is simply fit to time to failure versus temperature, and the activation energy is solved for. Specific failure mechanisms are typically looked for, except in basic material evaluations, where failure is defined as a 50% loss of tensile strength. The value of 0.7 is a common rule of thumb; remember that the lower this number, the less time compression one gets for an increase in stress. For polymers, materials suppliers often perform aging tests and publish activation energies. The 0.7 value is reasonable for highly glass-filled nylon (45%), but for unfilled nylon the activation energy is 1.0, and for low to medium levels of glass fill it is 0.9 to 0.8. For electronics, many studies have looked at various failure mechanisms, most of them chemical in nature, and documented their activation energies; as expected, there are ranges of values. High temperature does produce grain growth in solder, which reduces strength and can reduce thermal cycle life. Research comparing lead-free solders to tin-lead is ongoing; see Joe Smentana’s published work here. Some published values (eV):
- Silicon semiconductor devices: silicon oxide 1-1.05; electromigration 0.5-1.2; corrosion 0.3-0.6 (0.45 typical); intermetallic growth Al/Au 1-1.05
- FAMOS transistors: charge loss 0.8; contamination 1.4; oxide effects 0.3
- IC MOSFETs, threshold voltage shift: 1.2
- Plastic encapsulated transistors: 0.5
- MOS devices: 1.1-1.3; weak populations 0.3-0.9
- Flexible printed circuits: below 75°C 0.4; above 75°C 1.4
- Optoelectronic devices: 0.4
- Phototransistors: 1.3
- Carbon resistors: 0.6
- LEDs: 0.8
- Linear op amps: 1.6-1.8; weak populations 0.7-1.1

In general, damage models provide some technical basis for accelerated tests but never decimal-point accuracy, even though they give us precise answers.
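The fitting step described above can be sketched as a straight-line fit: taking logs of t = A·exp(Ea/kT) gives ln(t) = ln(A) + Ea·(1/kT), so Ea is the slope of ln(time to failure) against 1/(kT). This is a minimal sketch assuming one representative life (for example a median) per stress temperature and ordinary least squares; a full analysis would normally fit a life distribution such as Weibull or lognormal at each stress level. The data, the scatter factors, and the name `fit_activation_energy` are hypothetical.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant, eV/K

def fit_activation_energy(temps_c, times_h):
    """Least-squares fit of ln(t) = ln(A) + Ea/(kT); returns (Ea in eV, ln A).

    temps_c: stress temperatures in degrees C
    times_h: one representative time to failure per temperature, in hours
    """
    xs = [1.0 / (BOLTZMANN_EV * (t + 273.15)) for t in temps_c]  # 1/(kT), 1/eV
    ys = [math.log(t) for t in times_h]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    ea = sxy / sxx               # slope of ln(t) against 1/(kT)
    ln_a = y_mean - ea * x_mean  # intercept
    return ea, ln_a

# Hypothetical median lives built from Ea = 0.7 eV, then multiplied by
# made-up scatter factors to mimic test-to-test variation.
temps = [125.0, 150.0, 175.0]
scatter = [1.3, 0.8, 1.1]
lives = [1e-5 * math.exp(0.7 / (BOLTZMANN_EV * (t + 273.15))) * s
         for t, s in zip(temps, scatter)]
ea, _ = fit_activation_energy(temps, lives)
print(f"fitted Ea = {ea:.2f} eV")
```

Even modest scatter of ±30% in the lives shifts the fitted value by several hundredths of an eV here, which is one reason published values come as ranges rather than single decimal-accurate numbers.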
It is quite easy for people to get comfortable with a common number, so that it becomes a sacred cow and nobody knows where the information came from or how to apply it properly.
There is still a lot of ignorance in the world. I agree electronic components are not a major issue. The eV value is determined by simply fitting a model to a scattered set of measurements to obtain an average value. One could define a strategy to use them all, but it is just a waste of effort. These values can be used for specific issues noted in the field. For a test program, one might choose the conservative grand average. If an aging issue exists, it is often due to a weak subpopulation, hence the need to understand whether its activation energy differs (which it can), although the difference may just be a scaling factor on life with a consistent activation energy. For a test program, one considers where the most risk is; and you are correct, it is not in the components nor in aging damage. It appears that some blindly believe that Arrhenius is all you need to consider. Wrong. Aging can reduce the strength of solder joints and thus reduce solder joint thermal cycle life. This is not consistently true for lead-free solder, where the picture is still mixed and being researched. Hence, to guide the aging target and exercise due care, a conservative average activation energy is often chosen. Aging does change the strength of engineering polymers, so for these needs this is a valid use. Activation energies are also valuable after root cause failure analysis. If a particular failure mode is found that could be caused by an aging mechanism, analyses can determine the population risk for that failure, considering what we know about the failed part, where it is used, how it was used, etc., and analyzing it in light of usage and environmental variation. These values do not make products reliable, but they can help prioritize where design enhancement should focus. A mechanism with a high activation energy is more sensitive to temperature stress. There is false precision in any of these models. The weakest things fail first, and if everything fails for a non-aging reason, one will never observe an aging failure in the field.
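Why a lower activation energy is the conservative choice for a test program can be sketched by converting an aging target at use temperature into test hours at a higher temperature. A minimal sketch, assuming a single Arrhenius mechanism; the 20,000-hour target, the 55°C use and 125°C test temperatures, and the function names are hypothetical.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant, eV/K

def acceleration_factor(ea_ev, use_c, test_c):
    """Arrhenius acceleration factor from use temperature to test temperature."""
    return math.exp((ea_ev / BOLTZMANN_EV) *
                    (1.0 / (use_c + 273.15) - 1.0 / (test_c + 273.15)))

def required_test_hours(target_use_hours, ea_ev, use_c, test_c):
    """Test time at test_c that ages the part like target_use_hours at use_c."""
    return target_use_hours / acceleration_factor(ea_ev, use_c, test_c)

# Hypothetical aging target: 20,000 h at 55 C, simulated at 125 C.
for ea in (0.5, 0.7, 0.9):
    hours = required_test_hours(20_000, ea, 55.0, 125.0)
    print(f"Ea = {ea:.1f} eV -> {hours:,.0f} test hours")
```

The lower the assumed activation energy, the smaller the acceleration factor and the longer the test must run, so choosing a low average Ea errs on the side of more test time rather than less.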
Hence, if the thermal cycling mode occurs much earlier, it dominates, and one concentrates on minimizing CTE mismatch, strain relief, or other design strategies. Experience often hints at which stressors will produce failures within the design life. Poor quality can be detected using burn-in. Here we just want to be sure we do not remove too much life from the product, and this is where an Arrhenius analysis can be useful. For the most part, thermal cycling is consistently the most effective stressor for electronics. Focusing too much on Arrhenius is incorrect, as aging mechanisms are not the primary cause of field failures. However, to be true to a simulation need, it is reasonable to provide a requirement with some merit for the aging portion, in addition to the thermal cycling requirement, if aging is a major stressor in the field environment. The thermal cycling requirement is also subject to similar hand-waving, with various models, exponents, material constants, and dependencies on dwell times, rates, catalytic effects, etc. In real life one often has a mix of failure modes that occur randomly, resulting in the overall exponential failure distribution that is observed. One lumps them all together, and the Arrhenius model may fit; even though it is not true to the physics, it may still be useful for an engineering need. Attach a named formula to it, communicate confidently, and credibility goes way up with management. Considering aging, thermal fatigue, structural fatigue, corrosion, and other damage mechanisms is prudent for any reliability engineer. With all these competing modes, we want to focus on the ones with the highest risk. The models help us decide where to focus our efforts, and sometimes help us comprehend the physics, but they really only tell us within an order of magnitude when the product might fail. Analysis simply bounds our uncertainty to a perceived tolerable level.
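The burn-in concern (not removing too much life) can be sketched as a life-budget calculation: burn-in hours times the Arrhenius acceleration factor give equivalent use hours, which are then compared with the design life. This is a minimal sketch, assuming a single temperature-driven aging mechanism and linear damage accumulation; it ignores any thermal cycling damage done by the burn-in. The 48-hour burn-in at 125°C, 40°C use temperature, 10-year continuous-duty life, and the function names are hypothetical.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant, eV/K

def equivalent_use_hours(burn_in_hours, ea_ev, use_c, burn_in_c):
    """Use-condition hours of Arrhenius aging represented by a burn-in."""
    af = math.exp((ea_ev / BOLTZMANN_EV) *
                  (1.0 / (use_c + 273.15) - 1.0 / (burn_in_c + 273.15)))
    return burn_in_hours * af

def life_fraction_consumed(burn_in_hours, ea_ev, use_c, burn_in_c, design_life_hours):
    """Share of the aging life budget used before shipment (linear damage assumed)."""
    return equivalent_use_hours(burn_in_hours, ea_ev, use_c, burn_in_c) / design_life_hours

# Hypothetical: 48 h burn-in at 125 C, 40 C use, 10-year continuous-duty design life.
frac = life_fraction_consumed(48, 0.7, 40.0, 125.0, 10 * 8760)
print(f"burn-in consumes about {frac:.0%} of the aging life")
```

With these made-up numbers a two-day burn-in uses a noticeable fraction of the aging budget, and the result scales directly with the assumed activation energy, so the same order-of-magnitude caution applies here too.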