Just a short post to point to a paper newly added to the reference section. I recalled seeing, a few years ago, a paper that studied the difference to expect between the predictions of various parts count methods and actual results.
Jeff and colleagues did this work some time ago, and in most cases the underlying parts count methods haven’t changed too much, so I suspect the results are still very relevant.
The bottom line: expect as much as a -100% to +500% difference between the prediction and the actual result.
Historically, reliability engineering of electronics has been dominated by the belief that 1) the life of complex hardware, or the percentage of failures that occur over time, can be estimated, predicted, or modeled, and 2) the reliability of electronic systems can be calculated or estimated through statistical and probabilistic methods to improve hardware reliability. The amazing thing is that, during the many decades in which reliability engineers have been taught this and have believed it to be true, there has been little if any empirical field data from the vast majority of verified failures that shows any correlation with calculated predictions of failure rates.
The probabilistic statistical predictions, based on broad assumptions about the underlying physical causes, began with the first electronics reliability prediction guide in November 1956: the RCA release TR-1100, “Reliability Stress Analysis for Electronic Equipment”, which presented models for computing rates of component failures. This publication was followed by the “RADC Reliability Notebook” in October 1959 and by the publication of a military reliability prediction handbook known as MIL-HDBK-217.
The practice continues today with various software applications that are descendants of MIL-HDBK-217. Underlying these “reliability prediction assessment” methods and calculations is the assumption that the main driver of unreliability is components with intrinsic failure rates moderated by the absolute temperature. It has been assumed that component failure rates follow the Arrhenius equation and approximately double for every 10 °C rise in temperature.
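To make the temperature assumption concrete, here is a minimal Python sketch of the Arrhenius rate ratio between two temperatures. It is an illustration only, not taken from MIL-HDBK-217 or any prediction tool; the function names and the example temperatures are assumptions chosen for this sketch. It computes the activation energy at which the failure rate would exactly double over a 10 °C step, which shows that "doubling every 10 °C" corresponds to one particular activation energy at one particular starting temperature.

```python
import math

# Boltzmann constant in eV/K
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_acceleration(temp_low_c, temp_high_c, activation_energy_ev):
    """Ratio of failure rates at temp_high_c versus temp_low_c (Arrhenius model)."""
    t_low = temp_low_c + 273.15    # convert to kelvin
    t_high = temp_high_c + 273.15
    exponent = activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / t_low - 1.0 / t_high)
    return math.exp(exponent)


def activation_energy_for_doubling(temp_low_c, step_c=10.0):
    """Activation energy (eV) at which the rate exactly doubles over step_c."""
    t_low = temp_low_c + 273.15
    t_high = t_low + step_c
    return math.log(2.0) * BOLTZMANN_EV_PER_K / (1.0 / t_low - 1.0 / t_high)


if __name__ == "__main__":
    # Hypothetical starting temperatures, chosen only for illustration
    for start_c in (25.0, 50.0, 85.0):
        ea = activation_energy_for_doubling(start_c)
        factor = arrhenius_acceleration(start_c, start_c + 10.0, ea)
        print(f"{start_c:5.1f} C to {start_c + 10.0:5.1f} C: "
              f"Ea = {ea:.2f} eV gives a rate ratio of {factor:.2f}")
```

Under these assumptions, the activation energy needed for an exact doubling rises with the starting temperature, so a single "doubles every 10 °C" rule can only be an approximation over a limited temperature range.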
MIL-HDBK-217 was removed as a military reference document in 1996 and has not been updated since; it is still referenced unofficially by military contractors and is still believed to have some validity, even without any supporting evidence.
Much of the slow change in the industry is due to the fact that electronics reliability engineering has a fundamental “knowledge distribution” problem: real field failure data, and the root causes of those failures, can never be shared with the larger reliability engineering community. Reliability data is some of the most confidential and sensitive data a manufacturer has and, short of a court order, will never be published. Without this real data and information being disseminated and shared, one can expect little change in the beliefs of the vast majority of the electronics reliability engineering community.
Even though the probabilistic prediction approach to reliability has been practiced and applied for decades, any engineer who has seen the root causes of verified field failures will observe that almost all failures that occur before the electronic system is technologically obsolete are caused by 1) errors in manufacturing, 2) overlooked design margins, or 3) accidental overstress or abuse by the customer. The timing of these failures, which are often driven by multiple events or stresses, is random and inconsistent. Therefore there is no basis for applying statistical or probabilistic predictive methods. Most users of predictions have observed the lack of correlation between estimated and actual failure rates.
It is long past time for electronics design and manufacturing organizations to abandon these invalid and misleading approaches, acknowledge that reliability cannot be estimated from assumptions and calculations, and start using “stress to limits” to find latent failure mechanisms before a product is released to market. It is true that you cannot derive a time to failure for most systems this way, but then no test can provide an actual field “life” estimate for a complex electronic system, nor do we need one. There is more life than needed in most electronics for most applications.
Fortunately, there is an alternative. A much more pragmatic and effective approach is to put most engineering and testing resources toward discovering overlooked design margins or a weakest link early in the design process (HALT), and then to use that strength and durability to quickly screen for errors during manufacturing (HASS). HALT and HASS have little to do with a specific type of chamber or chamber capabilities. They represent a fundamental change in the frame of reference for reliability development, moving from time metrics to stress/limit metrics. Many have already adopted this new frame of reference. Because they have found these methods much more efficient and cost effective for developing robust electronic systems, the methods give them a competitive advantage. They are not about to let the world or their competitors know how successful these methods are.
With the kind permission of Wayne Nelson and Robert Abernethy we are posting an article on the analysis of repair data. As you may know, the assumptions made when applying simple time-to-failure analysis to repairable systems may produce misleading results. Using the analysis method outlined by Wayne is one way to avoid those costly mistakes.
Here are the opening elements of the work by Wayne, followed by a link to the full paper.
Appendix M: Repair Data Analysis of Abernethy, R.B. (2006), The New Weibull Handbook, 5th ed., available from Dr. R.B. Abernethy, weibull@worldnet.att.net, 536 Oyster Road, North Palm Beach, FL 33408. May 5, 2006
AN APPLICATION OF GRAPHICAL ANALYSIS OF REPAIR DATA
Wayne Nelson, consultant
WNconsult@aol.com, 739 Huntingdon Drive, Schenectady, NY 12309, USA
SUMMARY. This expository article presents a simple and informative non-parametric plot of repair data on a sample of systems. The plot is illustrated with transmission repair data from cars on a preproduction road test.
Purpose. This article presents a simple and informative plot for analyzing data on numbers or costs of repeated repairs of a sample of systems. The plotting method provides a non-parametric graphical estimate of the population mean cumulative number or cost of repairs per system versus age. This estimate can be used to:
1. Evaluate whether the population repair (or cost) rate increases or decreases with age (this is useful for system retirement and burn-in decisions),
2. Compare two samples from different designs, production periods, maintenance policies, environments, operating conditions, etc.,
3. Predict future numbers and costs of repairs,
4. Reveal unexpected information and insight, an important advantage of plots.
Overview. Section 2 describes typical repair data. Section 3 defines the basic population model and its mean cumulative function (MCF) for the number or cost of repairs. Section 4 shows how to calculate and plot a sample estimate of the MCF from data from systems with a mix of ages. Section 5 explains how to use and interpret such plots.
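As a rough illustration of the kind of calculation the overview describes, here is a minimal Python sketch of a non-parametric MCF estimate for the number of repairs. It is not reproduced from the paper: the function name, the data layout, and the example ages are hypothetical, and the full paper should be consulted for the method as the author presents it. The sketch assumes each system is observed from age zero up to its own end-of-observation age, and that at each repair age the MCF increases by one divided by the number of systems still under observation at that age.

```python
def mcf_estimate(systems):
    """Return a list of (age, mcf) points for the number of repairs.

    systems: list of (repair_ages, end_age) tuples, where repair_ages is a
    list of ages at which that system was repaired and end_age is the age
    up to which that system was observed.
    """
    # Pool every repair age across all systems and process them in order
    repair_ages = sorted(age for repairs, _ in systems for age in repairs)
    mcf = 0.0
    curve = []
    for age in repair_ages:
        # Systems still under observation at this age
        at_risk = sum(1 for _, end_age in systems if end_age >= age)
        mcf += 1.0 / at_risk
        curve.append((age, mcf))
    return curve


if __name__ == "__main__":
    # Hypothetical data: four systems with a mix of observation ages
    sample = [
        ([5000, 12000], 20000),
        ([], 15000),
        ([8000], 10000),
        ([3000, 9000, 18000], 25000),
    ]
    for age, value in mcf_estimate(sample):
        print(f"age {age:6d}: MCF = {value:.3f}")
```

Plotting the resulting points against age gives the step-like curve whose slope indicates whether the repair rate is increasing or decreasing. For costs instead of counts, the increment at each repair would be the repair cost divided by the number at risk, under the same assumptions.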
Dr. Wayne Nelson is a leading expert on analysis of reliability and accelerated test data. He consults and gives training courses for companies and professional societies. For 24 years he consulted across the General Electric Co. and received the Dushman Award of GE Corp. R&D for developments and applications of product reliability data analysis. He was elected a Fellow of the Amer. Statistical Assoc. (1973), the Amer. Soc. for Quality (1983), and the Institute of Electrical and Electronics Engineers (1988) for his innovative developments. He was awarded the 2003 Shewhart Medal and the 2010 Shainin Medal of ASQ and the 2005 Lifetime Achievement Award of IEEE for outstanding developments of reliability methodology and contributions to reliability education. He authored three highly regarded books, Applied Life Data Analysis (Wiley 1982, 2004), Accelerated Testing (Wiley 1990, 2004), and Recurrent Events Data Analysis (SIAM 2003), as well as two ASQ booklets and 130 journal articles. He can be contacted via WNconsult@aol.com.
Dr. Robert B. Abernethy, www.bobabernethy.com