The following is a recent discussion from the sister NoMTBF LinkedIn group. It was, and may continue to be, a great discussion. Please take a look and comment on where you stand. Do you use some form of the Arrhenius reaction rate equation in your reliability engineering work? Join the discussion here with a comment, or in the LinkedIn group conversation.
Fred
Enrico Donà has started a discussion: Where does “0.7 eV” come from? “Most manufacturers are still using an Arrhenius law (shouldn’t we rather call it an “Erroneous law”? ;-) with an activation energy of 0.7 eV to extrapolate high-temperature operating life test data to use conditions for every kind of electronic component (from complex ICs to simple passives). It is often claimed that the 0.7 eV is based on “historical data”. I have never actually seen any paper or publication where this activation energy has really been measured. The use of a constant hazard rate (lambda) was originally justified by the fact that electronic boards contain many different components with different failure mechanisms. The argument was that different Weibull distributions with different activation energies yield, on average, a roughly constant hazard rate for which an apparent activation energy can be defined. Many manufacturers now seem convinced that it must work the other way around too! Since a constant hazard rate with Ea = 0.7 eV was once claimed for electronic boards, every single component must follow the same law too! It is just amazing how errors propagate!”
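For readers who want to see what is at stake in that choice of 0.7 eV, here is a small sketch of the Arrhenius acceleration factor at a few assumed activation energies. The 55C use and 125C stress temperatures are chosen purely for illustration; the point is that the factor swings from a few times to well over a thousand times depending on the Ea you pick.

```python
# A quick sketch of the extrapolation being questioned: the Arrhenius
# acceleration factor from a high-temperature test to a use condition.
# The 55 C use and 125 C stress temperatures here are illustrative only.
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K


def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """AF = exp[(Ea / k) * (1/T_use - 1/T_stress)], temperatures in kelvin."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / K_BOLTZMANN_EV) * (1.0 / t_use_k - 1.0 / t_stress_k))


for ea in (0.3, 0.7, 1.2):
    af = arrhenius_af(ea, t_use_c=55.0, t_stress_c=125.0)
    print(f"Ea = {ea:.1f} eV -> acceleration factor ~ {af:,.0f}")
```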
Enrico, very good observation. I have seen no evidence that there is any valid basis for the 0.7 eV, as I have never heard which physical mechanism the activation energy is supposed to reference. Is it oxide breakdown, electromigration, diffusion? It makes no sense when it refers to the propagation of solder cracks or component package delamination, as those mechanisms are driven by thermal cycling and vibration. So much of reliability prediction for electronics is smoke and mirrors, and the real causes of unreliability are mistakes or overlooked design margin errors, errors in manufacturing, or abuse by customers. These causes do not follow some predictable pattern that can be modeled. It is a much better use of engineering resources, in making a reliable electronics system, to use stress to rapidly find these overlooked margin issues and manufacturing errors before units are produced in mass quantities or shipped to the customer. The vast majority of electronics have more than enough life; we really just need to remove the unreliable elements (from the causes previously mentioned) and we have a robust system that will exceed its technologically useful life.
Posted by Kirk Gray
I would recommend the following:
1. Check Dimitri Kececioglu’s “Burn-In Testing: Its Quantification and Optimization”. I referred to that text about a year ago for activation energy figures for different components.
2. ASM International’s EDFAS references. I had read certain references on this subject on their website some time back.
Posted by Vinod Pal Singh
Hi Fred,
“10C higher –> 2 times faster” is the rule of thumb for chemical reaction rates. Why it has been indiscriminately applied to every kind of wear-out process remains a mystery…
Posted by Enrico Donà
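As a quick check on that rule of thumb, the sketch below back-solves the activation energy implied by doubling every 10C at a few arbitrary baseline temperatures. The implied Ea drifts from roughly 0.55 eV near room temperature to nearly 1 eV at 125C, which is part of why it is only a rule of thumb for chemical reactions and not a law for every wear-out process.

```python
# Back-solving the activation energy implied by "10 C higher -> 2x faster".
# The baseline temperatures are arbitrary; the point is that the implied Ea
# changes with temperature, so the rule of thumb is not a single physical law.
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K


def implied_ea(t_base_c, delta_c=10.0, factor=2.0):
    """Ea for which the Arrhenius rate rises by `factor` over `delta_c` degrees."""
    t1 = t_base_c + 273.15
    t2 = t1 + delta_c
    return K_BOLTZMANN_EV * math.log(factor) / (1.0 / t1 - 1.0 / t2)


for t in (25.0, 55.0, 85.0, 125.0):
    print(f"doubling per 10 C at {t:5.1f} C implies Ea ~ {implied_ea(t):.2f} eV")
```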
To obtain the activation energy, the Arrhenius model is simply fit to time to failure versus temperature, and the activation energy is solved for. Specific failure mechanisms are typically looked for, except in the case of basic material evaluations where failure is defined as a 50% loss of tensile strength. The value of 0.7 is a common rule of thumb, remembering that the lower this number, the less time compression one gets for an increase in stress. If you look at polymer materials, suppliers will often perform aging tests and publish activation energies. The 0.7 value is reasonable for highly glass-filled nylon (45%); for unfilled nylon the activation energy is 1.0, and low to medium levels of glass fill are 0.9 to 0.8.
For electronics there are many studies that have looked at various failure mechanisms, most of which are chemical in nature, and documented the activation energies; as expected, there are ranges of values. High temperature does produce grain growth in solder, which reduces strength and can reduce thermal cycle life. Research is ongoing for lead-free solders compared to tin-lead; see Joe Smentana’s published work. Some published activation energy values (eV):
Silicon semiconductor devices:
- Silicon oxide: 1-1.05
- Electromigration: 0.5-1.2
- Corrosion: 0.3-0.6, 0.45 typical
- Intermetallic growth (Al/Au): 1-1.05
FAMOS transistors:
- Charge loss: 0.8
- Contamination: 1.4
- Oxide effects: 0.3
IC MOSFETs, threshold voltage shift: 1.2
Plastic encapsulated transistors: 0.5
MOS devices: 1.1-1.3 (weak populations 0.3-0.9)
Flexible printed circuits, below 75C: 0.4
Flexible printed circuits, above 75C: 1.4
Optoelectronic devices: 0.4
Phototransistors: 1.3
Carbon resistors: 0.6
LEDs: 0.8
Linear op amps: 1.6-1.8 (weak populations 0.7-1.1)
In general, damage models provide some technical basis for accelerated tests, but never decimal-point accuracy, even though we get precise-looking answers. It is quite easy for people to get comfortable with a common number until it becomes a sacred cow and nobody knows where the information came from or its proper application.
Posted by Dustin Aldridge
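To make the fitting procedure Dustin describes concrete, here is a minimal sketch that regresses ln(time to failure) on 1/(kT) and reads the apparent activation energy off the slope. The oven temperatures and median failure times are invented for illustration; with these made-up numbers the fit happens to land near 0.7 eV.

```python
# A minimal sketch of the fitting Dustin describes: regress ln(time to failure)
# on 1/(kT); the slope is the apparent activation energy. The oven temperatures
# and median failure times below are invented purely for illustration.
import numpy as np

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K

temps_c = np.array([85.0, 105.0, 125.0])         # hypothetical test temperatures
ttf_hours = np.array([12000.0, 3500.0, 1100.0])  # hypothetical median times to failure

x = 1.0 / (K_BOLTZMANN_EV * (temps_c + 273.15))  # 1/(kT), units of 1/eV
y = np.log(ttf_hours)                            # ln(TTF) = ln(A) + Ea * 1/(kT)

slope, intercept = np.polyfit(x, y, 1)
print(f"apparent activation energy ~ {slope:.2f} eV")
```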
The origin of the 0.7 eV is from the characterization of metal interconnect failures for electromigration, and it depends on the type of metal interconnect. These are measured at wafer-level reliability (WLR), continue on packaged products, and propagate to the board level.
Posted by John Nkwuo PE
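For readers unfamiliar with the electromigration context John refers to, activation energies of this kind are typically used in Black's equation, MTTF = A * J^(-n) * exp(Ea/kT). The sketch below uses placeholder current densities, exponent, prefactor, and Ea (not data from any real interconnect qualification) just to show how temperature and current-density acceleration combine.

```python
# One place activation energies like these actually get used: Black's equation
# for electromigration, MTTF = A * J**(-n) * exp(Ea / (k*T)). The prefactor,
# current densities, exponent n, and Ea below are placeholders, not data from
# any real interconnect qualification.
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K


def black_mttf(a_const, j_current_density, n_exp, ea_ev, temp_c):
    """Black's equation for electromigration median time to failure."""
    t_k = temp_c + 273.15
    return a_const * j_current_density ** (-n_exp) * math.exp(ea_ev / (K_BOLTZMANN_EV * t_k))


# Ratio of use-condition MTTF to stress-condition MTTF (the acceleration factor),
# combining temperature acceleration with current-density acceleration.
af = black_mttf(1.0, 0.5, 2.0, 0.7, 105.0) / black_mttf(1.0, 2.0, 2.0, 0.7, 200.0)
print(f"electromigration acceleration factor ~ {af:,.0f}")
```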
There are several papers published on this; check the IRPS Symposium proceedings and JEDEC standards for electromigration.
Posted by John Nkwuo PE
Dustin and John, so how would you use these published eV values in a PWBA with hundreds or thousands of components? Have you observed component wear-out as a cause of system unreliability in the field? In the vast majority of systems, most components, when properly selected for the application, have more life than needed before the system is technologically obsolete. Intrinsic wear-out mechanisms in components are not typically why systems experience failure in 0-7 years. How does the sometimes wide range of eV values (“MOS devices 1.1-1.3, weak populations 0.3-0.9”???) help make a reliable electronic system? Why do so many companies and individuals still believe that these numbers help build a reliable electronics system? What do the root causes of verified field failures tell you about why your electronics systems fail?
Posted by Kirk Gray
I must say that I myself was reviewing this 0.7 eV value against an extensive DOE that we had done in the past to fine-tune these numbers. The number came out slightly higher, but fairly close to that. Yet I do agree that activation energy numbers need to be used carefully.
Posted by Meny Nahon
There is still a lot of ignorance in the world.
I agree that electronic components are not a major issue. The eV value is determined by simply fitting a model to a scattered set of measurements to get an average value. One could define a strategy to use them all, but it would just be a waste of effort.
These values can be used for specific, field-noted issues. For a test program, one might choose the conservative grand average. If an aging issue exists, it is often due to a weak sub-population, hence the need to understand whether there is a difference (which there can be); but it can also just be a scaling factor on life with a consistent activation energy.
For a test program one considers where the most risk is, and you are correct that it is not in the components or aging damage. It appears that some blindly believe Arrhenius is all you need to consider. Wrong. Aging can reduce the strength of solder joints and thus reduce solder joint thermal cycle life. This is not consistently true for lead-free solder, where it is a mixed bag still being researched. Hence, to guide the aging target and exercise due care, a conservative average activation energy is often chosen. Aging does change the strength of engineering polymers, so for those needs this is a valid use.
One value of activation energies comes after root cause failure analysis. If a particular failure mode appears that could be caused by an aging mechanism, analyses are possible to determine the population risk for that failure, considering what we know about the failed part, where it is used, how it was used, etc., and analyzing in light of usage and environmental variation.
These values do not help make reliable products, but they can help prioritize where design enhancement focus should occur. A mechanism with a high activation energy is more sensitive from a stress perspective. There is false precision with any of these models. The weakest things fail first, and if everything fails for a non-aging reason, one will never observe an aging failure in the field. Hence, if the thermal cycling mode occurs much earlier, it dominates, and one concentrates on minimizing CTE mismatch, adding strain relief, or other design strategies. Experience often gives a hint about which stressors will produce failures within the design life. Poor quality can be detected using burn-in. Here we just want to be sure we do not remove too much life from the product, which is where an Arrhenius analysis can be useful.
For the most part, thermal cycling is consistently the most effective stressor for electronics. Focusing too much on Arrhenius is incorrect, as aging mechanisms are not the primary cause of field failure. However, to be true to a simulation need, it is reasonable to provide a requirement with some merit for the aging portion, in addition to the thermal cycling requirement, if aging is a major stressor in the field environment. The thermal cycling requirement is also subject to similar hand-waving with various models, exponents, material constants, and dependencies on dwell times, ramp rates, catalytic effects, etc. In real life one often has a mix of failure modes that occur randomly, resulting in the overall observation of an exponential failure distribution. One lumps them all together and possibly the Arrhenius model fits; even though it is not true to the physics, it may still be useful for an engineering need. Attach a named formula to it, communicate confidently, and credibility goes way up with management.
Consideration of damage mechanisms such as aging, thermal fatigue, structural fatigue, and corrosion is prudent for any reliability engineer. With all these competing modes we want to focus on the ones with the highest risk. The models help us decide where to focus our efforts, and sometimes comprehend the physics, but they really only tell us within an order of magnitude when the product might fail. Analysis simply bounds our uncertainty to a perceived tolerable level.
Posted by Dustin Aldridge
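As one concrete example of the burn-in trade-off Dustin raises (removing defects without removing too much life), the sketch below converts a hypothetical hot burn-in into equivalent field hours under an assumed Arrhenius aging mechanism. Every input is an assumption, and the wide spread of results is exactly why the choice of Ea needs care.

```python
# A back-of-the-envelope version of the burn-in bookkeeping mentioned above:
# how much "equivalent field life" does a short, hot burn-in consume if the
# aging mechanism really is Arrhenius? Every input here is an assumption, and
# the spread of answers shows how sensitive the result is to the chosen Ea.
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K


def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Acceleration of the stress condition relative to the use condition."""
    t_use_k, t_stress_k = t_use_c + 273.15, t_stress_c + 273.15
    return math.exp((ea_ev / K_BOLTZMANN_EV) * (1.0 / t_use_k - 1.0 / t_stress_k))


burn_in_hours = 48.0             # hypothetical 48-hour burn-in
t_burn_in_c, t_use_c = 85.0, 40.0
for ea in (0.3, 0.7, 1.1):
    consumed = burn_in_hours * arrhenius_af(ea, t_use_c, t_burn_in_c)
    print(f"Ea = {ea:.1f} eV -> ~{consumed:,.0f} equivalent field hours consumed")
```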
Kirk, these are tools. In the place of experience, engineers (or shall I say analysts) take a cut at things using “formulas”; with experience they understand the limits. This is fostered when they have limited budgets or time and are told to do one or two things before we ship … in the face of management review, use of these formulae leads to confidence, as Dustin puts it … the challenges come in smaller operations with limited time and dollars.
Posted by Eric Drobny
Eric, I understand that they are considered tools, but if they are based on invalid assumptions, the answers they provide are invalid and misleading, and they can be costly through invalid solutions. Dustin references Arrhenius many times, yet it is “erroneous” for most failure mechanisms, so using Arrhenius to calculate “burn-in” life removal is invalid.
Maybe I am wrong, but I think most manufacturers (at least their managers and leadership, if not their stockholders) in competitive markets want to produce the most reliable products at the lowest costs. To do this most effectively, the priority SHOULD be to discover weaknesses as fast as possible during development and to guard against latent defects and process excursions with the most efficient stress screens in manufacturing. This means using STIMULATION with stress, not simulation (which may be used at the end of development for qualification), to find those weaknesses and eliminate their causes early. Rarely are there more than one or two elements that need to change to turn a weak product into a strong one. Sometimes just software changes in a digital system can add significant temperature margin.
As long as “reliability engineering” is focused on the back end of the bathtub curve, the wear-out phase, it is not dealing with the reality of unreliability in electronics. Just look at the causes of your own company’s field returns. What does that tell you?
Posted by Kirk Gray
I believe in balance. The question related to Arrhenius, hence the response; not a diatribe on why it is worthless, nor a passionate call for stimulation. The number does come from something; how it is applied is a choice. Eric is correct in pointing out the tool aspect: stimulation is also a tool that does not always produce the desired result when used inappropriately, or on designs that are highly robust to begin with. People do the best they can within their knowledge and capability. Arrhenius can be erroneous, but the generalization is patently incorrect. Stimulation is not a panacea either; it is a best practice that in many cases has proven to be more efficient, but there are cases where Arrhenius is appropriate and useful as well. Who makes the choice, and on what basis?
Posted by Dustin Aldridge