Extend Your FMEA Process with Mechanisms
One of the issues I’ve had with failure modes and effects analysis is the focus on failure modes.
The symptoms that the customer or end user will experience are important. If a customer detects that a product has failed, that is a failure. The FMEA process does help us to identify and focus on the important elements of a design that improve product reliability. That is all good.
The issue is that the FMEA process doesn’t go far enough to really help the team focus on what action to take when addressing a failure mode. The process does include a discussion of the causes of the failure mode. The causes are often the team members’ educated opinions on what is likely to cause the failure mode. Often the description of a cause is a failed part, faulty code, or faulty assembly.
Generally the discussion of causes is vague.
Failure Mechanisms versus Failure Modes
Failure modes are best described as what the customer experiences (no power, loss of function, etc.). Failure mechanisms are the root physical or chemical anomalies that lead to the existence of the failure mode. While we want to remove failure modes, we have to solve, remove, or mitigate failure mechanisms along the way.
The traditional FMEA process, in my experience, often provides vague classes of causes, hints at potential failure mechanisms, or avoids specifying mechanisms entirely. The action items from the FMEA study then include investigations to find and understand the actual failure mechanisms (at best) or attempts to address vague classes of mechanisms with broad sweeps of monitoring, testing, or design changes.
Focusing the discussion of causes at the level of failure mechanisms instead enhances the discussion. Instead of talking about the cause as a component failure, the conversation changes to what happens such that the component fails. Instead of a vague average failure rate, it becomes a discussion about the design or process errors or variation that lead to the component’s demise.
The hard part of this approach is the sheer number of ways (root causes) that an item may fail. Consider a simple component solder joint. The potential root causes include:
- Dendrite growth
- Shear fracture
- Flex cracking
- Pad lifting
- Gold embrittlement
And many other potential issues. Even these brief descriptions may have underlying causes, and those underlying causes are the elements that require attention in order to solve the problem.
Fault Tree Analysis (FTA) and FMEA
Detailing all possible root causes of each failure mode would be tedious and, I would suggest, unnecessary. One approach I’ve seen starts with the common approach to FMEA, where we explore the classes or basic expected types of root causes that lead to the listed failure mode. Then, for the lines in the FMEA study that percolate up to the items requiring attention, we conduct a detailed FTA that fleshes out the range and relative frequency of occurrence of the many different underlying failure mechanisms that lead to a specific failure mode.
If the primary cause of a failure mode is a faulty component, then what are the specific mechanisms that lead to a component being faulty? FTA is the right tool here. Used in conjunction with the highest risks identified in the FMEA, it permits the team to understand and then solve or mitigate the right elements in the design or process. Being specific with actions that make a difference is the key.
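To make the idea of range and relative frequency concrete, here is a minimal sketch of the arithmetic behind a single OR gate in a fault tree, using the solder joint mechanisms listed earlier. It assumes the mechanisms occur independently and that any one of them alone produces the failure mode. The probabilities are made-up placeholders, not field data, and the names or_gate and rank_mechanisms are hypothetical helpers invented for this illustration.

```python
from math import prod

# Hypothetical probability of each mechanism occurring over the service life.
# These values are illustrative placeholders only, not measured data.
mechanism_probability = {
    "Dendrite growth": 0.002,
    "Shear fracture": 0.010,
    "Flex cracking": 0.005,
    "Pad lifting": 0.001,
    "Gold embrittlement": 0.002,
}


def or_gate(probabilities):
    """Probability that at least one of several independent events occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def rank_mechanisms(probabilities_by_name):
    """Return (name, probability, share) tuples, largest contributor first.

    The share is each probability divided by the sum of all probabilities,
    a simple measure of relative frequency among the listed mechanisms.
    """
    total = sum(probabilities_by_name.values())
    ranked = sorted(
        probabilities_by_name.items(), key=lambda item: item[1], reverse=True
    )
    return [(name, p, p / total) for name, p in ranked]


# Top event: the solder joint fails by any of the listed mechanisms.
top_event = or_gate(mechanism_probability.values())
ranking = rank_mechanisms(mechanism_probability)

if __name__ == "__main__":
    print(f"P(solder joint failure) = {top_event:.4f}")
    for name, p, share in ranking:
        print(f"{name:20s} {p:.3f} {share:6.1%}")
```

With real estimates in place of the placeholders, the ranking points the team at the few mechanisms that dominate the failure mode, which is where specific actions are most likely to make a difference. A real tree would also have AND gates and deeper levels of underlying causes, which this sketch leaves out.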
With your work to identify and resolve risks to reliability performance, how do you ensure the solutions are actually solving the right problem? What works for you in your organization? How do you extend your FMEA work into effective action?