How Many Assumptions Are Too Many Concerning Reliability?
When I buy a product, say a laptop, I am making an educated guess that Apple has done the due diligence to create a laptop that will work as long as I expect it to last. The trouble is I don’t know how long I want it to last, which creates some uncertainty for the folks at Apple. How long should a product last to meet customer expectations when customers are not sure themselves?
They probably made a few assumptions. They most likely used past behavior around upgrades, replacements, and product returns. They used the best available information to create a target reliability that would most likely meet my laptop durability expectation. So far, Apple products have served me well, so whatever they are doing, it’s working.
For our own systems and products we make assumptions concerning reliability. Some we may not even be aware of, nor of the impact they have if the assumption is not valid. We have to make assumptions, yet being conscious of them is a great start to creating reliable products.
Broad bold assumptions
Let’s assume the design is good and will perform as expected for our customers. Great, that works well, as it means we do not need to set goals, consider risks, evaluate user performance or expectations, conduct any testing, or risk delaying the product launch with reliability issues. All is good, if the assumption is true.
It might not be.
One of my favorite less bold, yet no less broad, assumptions is that the product is in the flat part of the bathtub curve. I’ve seen this assumed, implying that the failures that occur are random in nature and have an equal chance of occurring in every interval of time. We make this assumption based on the effort to minimize supply chain and factory issues that may lead to early failures. We also assume the design team has considered every wear out related mechanism, with a full understanding of customer use and environmental stresses, thus creating a product that will fail due to wear out only well after customers have any expectation of use.
The trouble is we often do not fully resolve all supply chain, assembly, and transportation sources of failure, nor do we fully understand all potential wear out type failure mechanisms. It’s a great assumption though, as it assumes everyone is doing their work perfectly and the only issues remaining are those pesky random failures.
The assumption of a constant failure rate is also wonderful as it simplifies product testing: we can tally hours of testing from multiple prototypes as if it were one unit running much longer. Since the chance of failure per hour is the same over time, running 10 units for 1,000 hours provides insight into the design’s performance over 10,000 hours, or so the story goes.
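To see how much that story leans on the constant failure rate, here is a minimal sketch. It assumes a Weibull life model, which reduces to the exponential (constant failure rate) case when the shape parameter equals 1. The function names, the 10,000 hour scale, and the shape of 3 are illustrative choices of mine, not data from any real product.

```python
import math

def weibull_reliability(hours, shape, scale):
    """Probability a single unit survives to `hours` under a Weibull model.
    shape == 1 is the constant failure rate (exponential) case."""
    return math.exp(-((hours / scale) ** shape))

def prob_all_survive(n_units, hours, shape, scale):
    """Probability that n independent units all survive to `hours`."""
    return weibull_reliability(hours, shape, scale) ** n_units

SCALE = 10_000.0  # hypothetical characteristic life, in hours

for shape in (1.0, 3.0):  # 1.0 = random failures, 3.0 = assumed wear out
    pooled = prob_all_survive(10, 1_000, shape, SCALE)
    single = weibull_reliability(10_000, shape, SCALE)
    print(f"shape={shape}: 10 units x 1,000 h pass = {pooled:.3f}, "
          f"1 unit x 10,000 h pass = {single:.3f}")
```

With a shape of 1 the two probabilities match, which is exactly why tallying unit hours works under a constant failure rate. With the assumed wear out shape of 3, ten units pass the 1,000 hour test about 99% of the time while a single unit reaches 10,000 hours only about 37% of the time, so the short test says very little about the longer life.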
Checking assumptions
This starts by being aware of the assumptions. Then assess the risk to reliability performance if the assumption is not true. Sure, some assumptions are not too bad; some are devastating if invalid. If we assume most customers operate their laptop at an ambient temperature of 25°C and most actually run in 22°C environments, that will not make too much of a difference. However, if we design and test assuming the product will operate with only random failures occurring, we may grossly underestimate the field failure rate.
List the assumptions. Evaluate the magnitude of the risk if each assumption is not valid. Carefully check the critical assumptions. If assuming a constant failure rate, track times to failure and check whether the exponential distribution actually fits.
It takes work to check assumptions. Tough. Do it and save yourself and your organization the consequences of poorly made assumptions. Wishing something to be true does not make it so, and Mother Nature has a way of proving us wrong.
We make assumptions, and that is necessary in our day to day work. Make your assumptions explicit and check the ones that are important. ‘What if’ is not to be ignored.
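One way to run that last check, sketched here under some assumptions: estimate the Weibull shape parameter from the recorded times to failure using a probability plot regression with median ranks (Benard's approximation). A shape near 1 is consistent with a constant failure rate; a shape well below 1 suggests early life failures, and well above 1 suggests wear out. The function name and the failure times are hypothetical, and the sketch assumes complete data (every unit ran to failure, no censoring) with at least two distinct positive times.

```python
import math

def weibull_shape_estimate(failure_times):
    """Least squares slope of a Weibull probability plot.
    Assumes complete data: every unit failed, no censored units."""
    times = sorted(failure_times)
    n = len(times)
    xs, ys = [], []
    for i, t in enumerate(times, start=1):
        median_rank = (i - 0.3) / (n + 0.4)  # Benard's approximation
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - median_rank)))
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx  # slope of the plot = estimated shape parameter

# Hypothetical times to failure in hours, for illustration only.
observed_hours = [410, 730, 980, 1150, 1320, 1500, 1710, 1950]
print(f"estimated Weibull shape: {weibull_shape_estimate(observed_hours):.2f}")
```

A small sample gives a noisy estimate, so treat the result as a prompt for a closer look rather than a verdict. With censored data or a need for confidence bounds, a proper maximum likelihood fit in a statistics package is the better tool.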