Failure Happens – It Is What Happens Next That Matters
One of the benefits of working in reliability engineering is that failure happens.
Everything made, manufactured, or assembled will fail at some point. Our desire is to have items last long enough, and that keeps us working. Since failures happen, our work includes dealing with the failure.
Not My Fault
Years ago, while preparing samples for life testing at my bench, I heard an ‘eep’ or a startled sound from a fellow engineer. It was quickly followed by an electrical pop and a plume of smoke.
Something on the circuit board she was exploring had failed. With a pop and smoke. She didn’t move.
At this point, my initial amusement turned to concern for her safety. She was fine, just startled, as the failure was unexpected. She quickly claimed it wasn’t her fault.
It was her design, she selected and assembled the parts, and she was testing the circuit. Yet, it wasn’t her fault. She did not expect a failure to occur (a blown capacitor – which we later discovered was exposed to far too much voltage), thus it was not her fault.
We hear similar responses from suppliers of components. It must have been something in your design or environment that caused the failure, as the failure described shouldn’t happen. It’s not expected.
Well, guess what, it did happen. Now let’s sort out what happened and not immediately assign blame for whose fault it is.
The ‘not my fault’ response to a failure is not helpful. Some failures are the result of a simple error and are quickly remedied. Others are complex and difficult to unravel. The quicker we focus on solving the mystery of the cause of the failure, the quicker we can move on to making improvements.
With possibly too many ‘not my fault’ responses, laws now require the manufacturers of products to stand behind them. If a failure occurs, sometimes only within specific conditions, the customer may ask for a remedy from the supplier.
If failures did not happen there would be no such thing as a warranty.
A warranty is actually a legal obligation, yet it has turned into a marketing tool. A long warranty implies the product is reliable, and by offering one the manufacturer is stating that it is taking on the risk of failure itself.
A repair or replacement is generally not adequate recompense for a failure, yet it provides some restitution. In most cases the item doesn’t fail, and the warranty provides only peace of mind.
The warranty business has become an industry in and of itself. Selling, servicing, and honoring warranties is something that others outside your organization can handle. The downside is the loss of feedback about failure details, which you need in order to effect improvements. A manufacturer shouldn’t hide behind its warranty policy, nor ignore the warranty claim details. A claim is one way a customer can voice their expectations concerning product reliability. You should listen.
My favorite outsourced repair service story involved a misguided payment structure.
If you pay a repairman based on the value of the components replaced, they will likely always replace the most expensive components. If the repair is accomplished by reseating a loose connector, nothing is replaced, and the repairman is not compensated for the diagnostic work and effective repair. If he instead immediately replaces the main circuit board, reseating most of the connectors in the process, the repair is fast, effective, and he is handsomely rewarded.
See the problem?
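To make the incentive concrete, here is a minimal sketch with made-up figures. The part price, the commission percentage, and the function names are all assumptions for illustration; none of them come from an actual repair contract.

```python
# Hypothetical figures, for illustration only.
PART_PRICES = {"reseat_connector": 0, "main_board": 400}  # assumed part values
COMMISSION_PERCENT = 10  # assumed: pay is 10% of the value of replaced parts


def technician_pay(actions):
    """Pay under a scheme that rewards only the value of parts replaced."""
    return sum(PART_PRICES[a] for a in actions) * COMMISSION_PERCENT / 100


def manufacturer_cost(actions):
    """Parts consumed plus the technician's pay for one repair."""
    return sum(PART_PRICES[a] for a in actions) + technician_pay(actions)


careful_repair = ["reseat_connector"]  # diagnose, reseat the loose connector
board_swap = ["main_board"]            # replace the board without diagnosing

for name, actions in [("careful repair", careful_repair), ("board swap", board_swap)]:
    print(name, "pay:", technician_pay(actions), "cost:", manufacturer_cost(actions))
```

Under these assumed numbers the careful repair pays the technician nothing, while the board swap pays well and costs the manufacturer a good board. Both repairs restore the unit, so the pay scheme alone decides which one gets done.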
When a failure occurs, it may be natural to offer a repair service as the remedy. It should be quick (not a two-week wait, as with my local cable company to restore a fallen line) and efficient for all parties involved. For the owner of the equipment, we want the functionality restored as quickly and as cost-effectively as possible. For the manufacturer of the equipment, we want cost-effectiveness, plus the knowledge concerning the failure.
Does your repair service provide for the needs of both parties as well as the repair technician?
Sometimes when a failure occurs, nothing happens. We might not even notice that the failure occurred. Other times the product simply goes ‘cold’ or a function is lost. Nothing adverse, no pop or smoke, occurs.
We call this failing safe. It’s more complicated than my simple explanation, yet it is the desired response to a failure. The product itself should not create more damage, cause harm, or place someone in peril. It should fail safely and preferably quietly.
If the ignition key falls from the ignition switch, which may be considered a failure to retain the key within the switch, the driver should not lose control of the vehicle. This is in part a safety feature, yet it is also a common expectation that the failure of a system should not create other problems.
Failure containment is related.
How does your product fail? Safely?
For some failures, such as the degradation of lubricants, we perform maintenance. When the brake pads or tire tread wear to a marginally safe level, we replace the brake pad or tire. If we can anticipate the failure pattern, we perform preventive maintenance.
Creating a maintainable piece of equipment is one response to failures. It allows creating complex equipment with failure-prone elements. Through maintenance, we are able to restore the system to operation or avoid unexpected downtime. If failures didn’t occur, we wouldn’t need maintenance.
We have some control over the nature of the maintenance activities. For some types of failures, we can only execute corrective maintenance. For others, we can use preventive methods. The idea is to anticipate and avoid the widest range of failures through effective maintenance practices that remain cost-effective.
Adding maintenance practices in response to system failures is not the duty of the owner of the equipment alone. It is a design function to anticipate the system failures that may occur and to devise the appropriate maintenance plan to keep unwanted failures from occurring. The two parties have to work together to make this work well.
When I buy a product, I know that some proportion of products like the one I just purchased will fail prematurely. I just do not want mine to fail. My expectation is that the one I select at the store is a good one. It won’t let me down or leave me stranded or injured. That is my expectation.
When a failure does occur and I value the functionality the product provides, I will want to restore the unit via repair or replacement, sometimes via a service contract, warranty, or repair center. To a large degree, my expectation is that after a failure all will go well.
As the manufacturer of products, when a failure occurs, your expectations may include learning from the failure to make improvements. Or they should.
We know we cannot anticipate or avoid every failure that may occur. The expectation on both sides is to make robust and dependable products that provide value for all involved. When that approach fails, we fail.
In response to a failure, it’s how the product, customer, and manufacturer respond that matters. A simple failure can turn into a disaster for all involved. Or the failure can provide insights leading to breakthrough innovations and new opportunities.
It’s how we respond that matters.
How do you respond to failures?