The calculation of MTBF results in a larger number if we make a series of MTBF assumptions. We just need more operating hours in the numerator and fewer failures in the denominator.
While we really want to understand the reliability performance of field units, we often make a series of small assumptions that impact the accuracy of MTBF estimates.
Here are just a few of the MTBF assumptions that I’ve seen, in some cases nearly all of them within one team. Reliability data holds useful information if we gather and treat it well.
Assumptions Around Use
Since we only know the shipment date to a customer and when they call, let’s say the unit operated full time from shipment till they called.
We suspect it takes time to actually transport a unit from our factory to the customer. It may even take time to install units and place them into service. Yet that is unknown unless we ask some customers for information. Wouldn’t want to bother customers.
Also, let’s consider every unit is in use. No spares or stored units. Likewise not sitting in a warehouse or store shelf.
We do learn about some, let’s assume all failures, through a call or product return from the customer. Thus, we can only assume all other units are still in service, operating full time.
This set of assumptions tends to increase the number of hours we are counting as operating hours. That pads or helps MTBF (or reliability for that matter) look better than it actually is.
Spend the time to understand the time from shipment till the start of service. This may be a distribution from nearly immediate to months or years till placed into service. You should know this information.
Spend the time to understand the typical operating hours per day. Some items do work 24/7 while others only on occasion. Maybe you have different classes of customers that use your product in very different manners. Again you should know this information.
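To get a feel for how much these use assumptions can matter, here is a minimal sketch comparing a "shipped and running 24/7" estimate with one that accounts for service delay, idle units, and daily operating hours. Every number below is hypothetical and chosen only for illustration. The calculation is also simplified: it ignores, for example, that a failed unit stops accumulating hours.

```python
def mtbf(operating_hours, failures):
    """MTBF as total operating hours divided by the failure count."""
    if failures <= 0:
        raise ValueError("need at least one failure to estimate MTBF")
    return operating_hours / failures

# Hypothetical fleet, for illustration only
units_shipped = 1000
days_since_shipment = 365
failures = 20

# Naive assumption: every unit runs 24 hours a day from the shipment date
naive_hours = units_shipped * days_since_shipment * 24

# Adjusted assumptions (all hypothetical values)
units_in_service = 900   # the rest are spares or still on a shelf
delay_days = 30          # average days from shipment to start of service
hours_per_day = 8        # typical daily operating hours

adjusted_hours = units_in_service * (days_since_shipment - delay_days) * hours_per_day

naive_mtbf = mtbf(naive_hours, failures)
adjusted_mtbf = mtbf(adjusted_hours, failures)

print(f"Naive MTBF:    {naive_mtbf:,.0f} hours")     # 438,000 hours
print(f"Adjusted MTBF: {adjusted_mtbf:,.0f} hours")  # 120,600 hours
```

With these made-up inputs the failure count is identical, yet the naive estimate is more than three times the adjusted one, purely because of the hours we chose to count.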
Only Real Failures Count
Have you noticed that sometimes a customer will call to complain that the product isn’t working, and when you receive the product back from them, it works just fine? Funny (strange), isn’t it? About 25% of product returns (varies by industry and specific product, of course) have no trouble found, or no fault found. Many of these products are then cleaned up and shipped out to other customers.
Sometimes the customer suggests a product is a failure when it is the wrong color (my ex-wife did this once). Or, it’s a failure if it doesn’t solve the problem they thought it should. Or, it’s a failure if they no longer need the product. Sometimes they call a product a failure when it doesn’t function as expected.
When analyzing returned products, we want to know what to do differently or better to avoid future product failures. The units with something missing or broken are great: we can get to a root cause right in the lab and implement design or process changes to fix it.
When the analysis finds nothing wrong, do we question our analysis to make sure we are evaluating the product as the customer did? Rarely. The customer wanted the unit to operate outside on a cold day, while in the nice, warm, clean lab it starts just fine. Did we just miss an opportunity to improve product reliability? Probably.
If it is an ordering mistake, the wrong color, doesn’t do what we thought it would do, or doesn’t operate as we expected (where is the on/off switch…), are those failures? Did they return the product? If so, the customer called it a failure.
By not counting software bugs, ordering or use errors, or any group of claims or reasons for a product return, we incur the cost of the return and the potential permanent loss of a customer. It’s a failure that requires more than hardware related changes.
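As a rough illustration of how the choice of "what counts" moves the number, the sketch below tallies a hypothetical return log and computes MTBF twice: once counting only confirmed hardware faults, and once counting every return the customer called a failure. The operating hours, categories, and counts are invented for the example (the no fault found share is set to 25% to match the figure mentioned above).

```python
from collections import Counter

operating_hours = 1_000_000  # hypothetical fleet operating hours

# Hypothetical return log: one lab finding per returned unit
findings = (
    ["hardware fault"] * 8
    + ["no fault found"] * 4
    + ["software bug"] * 2
    + ["ordering error"] * 2
)

counts = Counter(findings)
hardware_only = counts["hardware fault"]
all_returns = sum(counts.values())

# Same operating hours, two different definitions of "failure"
mtbf_hardware_only = operating_hours / hardware_only
mtbf_all_returns = operating_hours / all_returns

for reason, n in counts.most_common():
    print(f"{reason}: {n}")
print(f"MTBF, hardware faults only: {mtbf_hardware_only:,.0f} hours")  # 125,000 hours
print(f"MTBF, all returns:          {mtbf_all_returns:,.0f} hours")    # 62,500 hours
```

In this made-up case, dropping everything except hardware faults doubles the reported MTBF while the customer experience stays exactly the same.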
Let’s Smooth to View Trends
Smoothing data is a nice way to say averaging. While MTBF is technically an average, an average of averages ‘smooths’ a monthly or weekly reported MTBF value just a bit more.
Let’s say we ship products weekly and group the products made in a specific week into a cohort for our analysis. Let’s say the MTBF values are estimated for each week’s production, assuming the week’s products all go into service at the same time.
Each week we add another batch to the products in the field, and we age all other batches one week. Pretty soon that is a lot of weekly MTBF values. With normal variation, let alone active changes to components, design, or processes, the week to week variability of MTBF may cloud or obscure the trends represented in the data.
Let’s assume a rolling average of 3 or 6 months of data will help us spot trends. Also, let’s assume there is no meaningful ramp or decline of production at the start, seasonally, or at the end of production.
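The mechanics of a rolling average over weekly MTBF values can be sketched as follows. The weekly cohorts are hypothetical, the window is three weeks rather than months to keep the example short, and the pooled calculation (total hours divided by total failures over the same window) is included only as one possible point of comparison, not as the method any particular team uses.

```python
# Hypothetical weekly cohorts: (operating hours, failures)
weeks = [
    (12_000, 2),
    (24_000, 2),
    (36_000, 2),
    (72_000, 12),  # production ramps up and failures jump
]

# One MTBF value per weekly cohort
weekly_mtbf = [hours / failures for hours, failures in weeks]

def rolling_mean(values, window):
    """Plain average of the last `window` values: an average of averages."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def pooled_mtbf(cohorts, window):
    """Total hours over total failures across the same rolling window."""
    result = []
    for i in range(window - 1, len(cohorts)):
        chunk = cohorts[i - window + 1 : i + 1]
        total_hours = sum(hours for hours, _ in chunk)
        total_failures = sum(failures for _, failures in chunk)
        result.append(total_hours / total_failures)
    return result

smoothed = rolling_mean(weekly_mtbf, window=3)
pooled = pooled_mtbf(weeks, window=3)

print("Weekly MTBF: ", weekly_mtbf)  # [6000.0, 12000.0, 18000.0, 6000.0]
print("Rolling mean:", smoothed)     # [12000.0, 12000.0]
print("Pooled MTBF: ", pooled)       # [12000.0, 8250.0]
```

With these invented numbers the rolling mean stays flat at 12,000 hours across both windows, while the pooled figure for the second window is 8,250 hours. The two agree only while every week carries the same number of failures.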
Do you see any problems with this approach? I’ll leave this one open for your comments on why smoothing may be an issue.
Our ability to gather and analyze field data often plays a central role in our ability to understand how well a product is performing in the hands of customers. If we make a series of unfortunate assumptions during the gathering, interpreting, or presenting of the data, we are likely to obscure the very information we seek to understand.
Add your comment on why smoothing as described above is a potential problem. Plus, add some of the other unfortunate MTBF assumptions you have seen (and hopefully exposed and corrected!)