Illuminating MTBF’s Lack of Information

Here’s a simple illustration of how MTBF oversimplifies data, concealing essential information.

By convention, we tend to use MTBF for repairable data. That is fine.

You may also be aware of my dislike for MTBF, for many different reasons. If you find yourself urging your organization, customer, industry, or anyone else to stop using MTBF, you may want to use this simple example to illustrate the ‘value’ of MTBF.

Three Data Sets of Collected Time of Repair Data

Let’s say we have three assets on the shop floor that have been running for 1,000 hours each. Each has experienced 10 failures requiring repair. Each repair typically takes less than an hour, so repair time is short compared to run time, which keeps the analysis simple.

Machine 1 experienced the failures at the following hours of operation:

112, 198, 301, 425, 509, 615, 692, 820, 907, 989

The times are in hours since the equipment was installed. The first failure occurred 112 hours after installation. The second occurred 198 hours after installation, and so on.

Machine 2 experienced the failures at the following hours of operation:

112, 293, 480, 560, 702, 760, 813, 849, 898, 920

Machine 3 experienced the failures at the following hours of operation:

112, 142, 191, 230, 280, 350, 424, 563, 710, 879

Given this data, what would you typically do to glean a better understanding of your equipment?

Just reviewing the data, you can detect the differences between the three machines. Given the differences, you may adjust your maintenance program, or work to determine why the differences exist.
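One way to make those differences concrete is to compute the hours between successive failures. The following is a minimal Python sketch, not part of the original analysis. The names `failure_times` and `gaps_between_failures` are illustrative, and comparing the first five gaps with the last five is an arbitrary split chosen only for this example.

```python
# Failure times in hours since installation, taken from the three data sets above.
failure_times = {
    "Machine 1": [112, 198, 301, 425, 509, 615, 692, 820, 907, 989],
    "Machine 2": [112, 293, 480, 560, 702, 760, 813, 849, 898, 920],
    "Machine 3": [112, 142, 191, 230, 280, 350, 424, 563, 710, 879],
}


def gaps_between_failures(times):
    """Return hours from installation to the first failure, then between each later pair."""
    previous = 0
    gaps = []
    for t in times:
        gaps.append(t - previous)
        previous = t
    return gaps


for name, times in failure_times.items():
    gaps = gaps_between_failures(times)
    early = sum(gaps[:5]) / 5  # average gap over the first five failures (assumed split)
    late = sum(gaps[5:]) / 5   # average gap over the last five failures (assumed split)
    print(f"{name}: early avg = {early:.1f} h, late avg = {late:.1f} h, gaps = {gaps}")
```

With this split, Machine 1 averages about 102 hours between failures early on and 96 hours later, Machine 2 drops from about 140 hours to about 44 hours, and Machine 3 rises from 56 hours to about 120 hours.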

A Simple Dot Plot View

One way to view the data is with a one-dimensional plot. The dot plot provides the location of each failure along the timeline. Here is machine 1’s dot plot:

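[Figure: dot plot of Machine 1 failure times (mcf-1-dotplot)]

Plus the plots for machines 2 and 3:

[Figure: dot plot of Machine 2 failure times (mcf-2-dotplot)]

[Figure: dot plot of Machine 3 failure times (mcf-3-dotplot)]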

This provides a little more visibility than the table of numbers. Machine 1 seems to have evenly spaced failures. Machine 2 has more failures as the equipment ages (like my car did when I was in high school). And Machine 3 seems to be running longer between failures as it ages.

These simple plots take less work to read than a table of numbers. They further illustrate the differences in the datasets.
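If you do not have a plotting tool handy, a rough text version of the same dot plots can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `text_dot_plot`, the 50-character width, and the `*` and `-` symbols are my choices here, not how the plots above were made.

```python
# Failure times in hours since installation, taken from the three data sets above.
failure_times = {
    "Machine 1": [112, 198, 301, 425, 509, 615, 692, 820, 907, 989],
    "Machine 2": [112, 293, 480, 560, 702, 760, 813, 849, 898, 920],
    "Machine 3": [112, 142, 191, 230, 280, 350, 424, 563, 710, 879],
}


def text_dot_plot(times, span=1000, width=50):
    """Return a one-line timeline: '*' where a failure falls, '-' elsewhere."""
    cells = ["-"] * width
    for t in times:
        # Map the hour of failure onto one of `width` equal-sized bins.
        index = min(int(t / span * width), width - 1)
        cells[index] = "*"
    return "".join(cells)


for name, times in failure_times.items():
    print(f"{name} |{text_dot_plot(times)}| 0 to 1,000 hours")
```

Each character covers 20 hours of operation at this width, which happens to be fine enough that no two failures in these data sets share a cell. A narrower width would merge nearby failures into one mark.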

What If You Calculate MTBF?

Considering MTBF is so popular and widely used, you may feel compelled to calculate MTBF for these three examples.

It’s easy to do. Each machine has run for 1,000 hours and enjoyed 10 failures, thus all three machines have a 100-hour MTBF.

  • Machine 1 has a 100-hour MTBF
  • Machine 2 has a 100-hour MTBF
  • Machine 3 has a 100-hour MTBF
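The arithmetic is simply total operating hours divided by the number of failures. As a minimal Python sketch (the variable names are illustrative, and the 1,000 operating hours per machine comes from the example setup):

```python
# Failure times in hours since installation, taken from the three data sets above.
failure_times = {
    "Machine 1": [112, 198, 301, 425, 509, 615, 692, 820, 907, 989],
    "Machine 2": [112, 293, 480, 560, 702, 760, 813, 849, 898, 920],
    "Machine 3": [112, 142, 191, 230, 280, 350, 424, 563, 710, 879],
}

operating_hours = 1000  # each machine has run for 1,000 hours

# MTBF = total operating hours / number of failures.
# Note that the individual failure times never enter the calculation.
mtbf = {name: operating_hours / len(times) for name, times in failure_times.items()}

for name, value in mtbf.items():
    print(f"{name}: MTBF = {value:.0f} hours")
```

Only the count of failures is used, so any ten failure times within the 1,000 hours produce the same result.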

The use of MTBF suggests there is no difference. We have reduced the information available for consideration. Using MTBF, we would treat the three machines exactly the same.

The MTBF values limit the value of the data and prevent us from identifying differences, taking appropriate action, or understanding what is happening.

I suggest that is not a good metric.

So, stop using it.

Please feel free to use this example, and maybe change it to fit your industry or situation. Help those around you understand their data.

Please let me know of any examples you use to make the point that MTBF is not helping. Let’s collect and post what helps us get the message across.

About Fred Schenkelberg

I am an experienced reliability engineering and management consultant with my firm FMS Reliability. My passion is working with teams to create cost-effective reliability programs that solve problems, create durable and reliable products, increase customer satisfaction, and reduce warranty costs.

6 thoughts on “Illuminating MTBF’s Lack of Information”

  1. Since MTBF is intended for the constant failure rate period, one could argue that at least two of those data sets would not have warranted characterization by MTBF in the first place.

    1. Sure, one could do that… I suppose. Yet, without plotting and potentially fitting the data to a curve (line), would we really be able to tell if it was showing a constant failure rate?

      Keep in mind there really isn’t such a thing as a constant failure rate period. There are many types of failure mechanisms, some with increasing or decreasing failure rates. The closest to a ‘flat part of the curve’ we see is when the changes are small enough to not matter much for the decisions the data is supporting. Assuming a constant failure rate doesn’t change the actual failure rate. Also, keep in mind that it is rare for a system to remain with little change in failure rate for very long.

      Cheers,

      Fred
      PS: been meaning to reply to this comment for a few days – pending the site recovering from an attack which took us offline for 3 days.

    1. They could be, yet these were just set up as examples. I have seen similar behavior when different teams install equipment, one with faulty instructions. One group fails early, the other wears out nicely. Cheers, Fred
