Finding and eliminating early life failures
MTBF for electronics life entitlement measurements is a meaningless term. It says nothing about the distribution of failures or the cause of failures and is only valid for a constant failure rate, which almost never occurs in the real world. It is a term that should be eliminated along with reliability predictions of electronics systems with no moving parts.
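To make the point about distributions concrete, here is a minimal sketch comparing two hypothetical populations that share exactly the same MTBF. One has a constant failure rate (exponential). The other has a declining hazard rate, modeled here as a Weibull distribution with shape 0.5. The MTBF value, the shape parameter, and the function names are all assumptions chosen for illustration, not field data.

```python
import math

def exponential_cdf(t, mean):
    """Fraction failed by time t under a constant failure rate."""
    return 1.0 - math.exp(-t / mean)

def weibull_cdf(t, shape, scale):
    """Fraction failed by time t under a Weibull distribution."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def weibull_scale_for_mean(mean, shape):
    """Weibull scale that gives the requested mean: mean = scale * Gamma(1 + 1/shape)."""
    return mean / math.gamma(1.0 + 1.0 / shape)

MTBF_HOURS = 100_000.0   # hypothetical value, identical for both populations
EARLY_SHAPE = 0.5        # assumed shape < 1, i.e. a declining hazard rate
t_early = 0.1 * MTBF_HOURS

scale = weibull_scale_for_mean(MTBF_HOURS, EARLY_SHAPE)
frac_constant = exponential_cdf(t_early, MTBF_HOURS)
frac_declining = weibull_cdf(t_early, EARLY_SHAPE, scale)

print(f"Failed by {t_early:.0f} h, constant rate:  {frac_constant:.1%}")
print(f"Failed by {t_early:.0f} h, declining rate: {frac_declining:.1%}")
```

Under these assumptions, about 9.5% of the constant-rate population has failed by one tenth of the MTBF, against about 36% of the declining-rate population, even though both would be quoted with the same MTBF.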
Another term widely used in reliability engineering is a bit of a misnomer and should also be eliminated: “infant mortality”. The term is typically used to describe early life failures in an electronics system during the declining hazard rate period, which may extend until its technological obsolescence.
In my experience the term is used dismissively, as if such failures were “expected” or acceptable as an intrinsic yet generic cause of failures within the first weeks or months of a new product introduction. Some traditional reliability engineers I have met also consider it a “quality department” problem, not to be confused with reliability engineering.
The vast majority of human infant mortality occurs in poorer third world countries, and the main cause is dehydration from diarrhea, which is preventable. Many other factors contribute to the rate of infant deaths, such as limited access to health services, the education of the mother, and access to clean drinking water.
Human infant mortality is defined as the number of deaths in the first year of life. The contributing causes of human infant deaths and of electronics failures are, of course, completely different. Human infant mortality stems from the fact that at birth a child may go through a complicated delivery and does not have a fully developed immune system, so it has less resistance to infections. The lack of health care facilities or skilled health workers is a contributing factor.
An electronic component or system is not weaker when fabricated; instead it has its highest inherent strength when turned on for the first time. Unlike humans, electronics are “adult” when first produced and decline in strength (fatigue life) from that point on. This is why we can subject new systems to high levels of environmental stress to remove latent defects (the HASS process) without taking significant life from them.
So why use the dismissive term “infant mortality” to describe latent defects in electronics as if they are expected? The time period that we would classify as “infant mortality” in electronics is arbitrary. It could be the first 30 days or the first 18 months or longer. Since the vast majority of latent (hidden) defects that are found early come from mistakes and errors in either design or manufacturing, and are therefore not controlled, they can have a wide distribution of times to failure. Often the same mechanism whose weakest manifestations fail within 30 to 90 days continues to produce failures, at a declining rate, throughout a product's usable life.
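The arbitrariness of any cutoff can be illustrated with a small sketch of a declining hazard rate. It assumes, purely for illustration, that times to failure from a latent defect follow a Weibull distribution with shape 0.5; the shape, the scale, and the checkpoint days are hypothetical, not measured values.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

SHAPE = 0.5            # assumed: shape < 1 gives a declining hazard rate
SCALE_DAYS = 20_000.0  # assumed characteristic life, in days

# Candidate "infant mortality" cutoffs: 30 days, 90 days, 18 months, 5 years
checkpoints_days = [30, 90, 540, 1825]
hazards = [weibull_hazard(d, SHAPE, SCALE_DAYS) for d in checkpoints_days]

for day, h in zip(checkpoints_days, hazards):
    print(f"day {day:>4}: hazard = {h:.3e} failures per unit-day")
```

In this model the hazard keeps falling at every checkpoint and never levels off into a flat region, so nothing in the curve itself marks where an “infant mortality” period would end.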
Failures of electronics systems in the first days or months after manufacture are not due to known intrinsic wear-out mechanisms. We can only model those failure mechanisms that have an intrinsic and repeatable physics of failure.
Traditional reliability engineering has been focused on making predictions of the life entitlement of electronics systems using cookbooks of FIT rates to derive a system MTBF or MTTF. This is in spite of the fact that there is little or no evidence that these predictions correlate empirically with the actual causes of most electronics failures. Traditional reliability engineering, it seems, has not been very focused on early discovery of the causes of early life failures during the declining hazard rate period after market release. Semantics are important and carry implications. The term “infant mortality” contributes to dismissing the significance of early life failures to the overall reliability of a system. Yet that is where the vast majority of costs lie, for the customer and for any electronics systems manufacturer.
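For readers who have not seen the cookbook method, the arithmetic is roughly as follows. This is a minimal sketch of a parts-count style calculation; the part names and FIT values are hypothetical and are not taken from any handbook. One FIT is one failure per billion device-hours.

```python
HOURS_PER_BILLION = 1e9  # 1 FIT = 1 failure per 1e9 device-hours

# Hypothetical bill of materials with made-up FIT values per line item
parts_fit = {
    "microcontroller": 50.0,
    "dc_dc_converter": 120.0,
    "ceramic_capacitors_x40": 40 * 0.5,
    "connectors_x4": 4 * 5.0,
}

# The method assumes every part fails at a constant rate and that any single
# part failure fails the system, so the rates simply add.
total_fit = sum(parts_fit.values())
predicted_mtbf_hours = HOURS_PER_BILLION / total_fit

print(f"Total FIT: {total_fit:.0f}")
print(f"Predicted MTBF: {predicted_mtbf_hours:,.0f} hours")
```

The result is a single number with the constant failure rate assumption built in. Nothing in the calculation represents a design error or a manufacturing mistake, which is where early life failures come from.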
Because electronics are not “infants” and are not weaker when first “born”, we can be aggressive in our treatment of them before they leave the “birth room”. Unlike newborns, new electronics can be put through a stress test and, if they fail, diagnosed to discover an assignable cause, which we can then correct to prevent further failures. Through HALT and HASS we can find the root causes of latent defect failures and remove them from the production population. Doing so eliminates the most costly period of defects and failures, which, because of the potentially wide time-to-failure distributions, can extend until the product is replaced due to technological obsolescence. I believe the term “infant mortality”, when applied to electronics, carries the connotation that early failures are expected, inherent, unavoidable, and due to nature. It should be used for human life cycles, not electronics life cycles.