How to Calculate MTBF

Considerations When You Calculate MTBF

It is deceptively easy to calculate MTBF given a count of failures and an estimate of operating hours. Just tally up the total hours the various systems operate and divide by the number of failures. Easy.

This simple calculation is the unbiased estimator of theta (MTBF), which is the inverse of the parameter lambda of the exponential distribution. We use theta to represent 1 / lambda.
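As a minimal sketch of that tally-and-divide step, the Python below computes the point estimate of theta and the matching lambda. The function name `mtbf`, the per-system hours, and the failure count are all made up for illustration; they are not data from any real fleet.

```python
def mtbf(total_operating_hours: float, failure_count: int) -> float:
    """Point estimate of theta (MTBF): total operating hours / failures."""
    if failure_count <= 0:
        # The plain ratio is undefined without at least one failure.
        raise ValueError("need at least one failure for this estimate")
    return total_operating_hours / failure_count


# Hypothetical tally: operating hours logged by each of five systems.
hours_per_system = [1200.0, 950.0, 1430.0, 800.0, 1620.0]
failures = 4  # hypothetical count of failures across those systems

theta_hat = mtbf(sum(hours_per_system), failures)  # 6000 / 4 = 1500 hours
lambda_hat = 1.0 / theta_hat                       # failures per hour
```

The arithmetic is trivial, which is the point: everything difficult lives in deciding what goes into the two inputs.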

What could go wrong with such a simple calculation?

What is a failure?

Let’s start with what we count or do not count as a failure. This directly changes the resulting MTBF value. If we only count confirmed hardware failures, and do not count intermittent, unreproducible, or software failures, are we undercounting what the customer experiences as a failure?

Over what duration do we count the failures? Should we focus only on the first month of operation, the first year, the warranty or service contract period or the entire operating life of the system? How do you calculate MTBF?

Some organizations only count failures they expect to occur. The unexpected ones are ‘special’ causes and require further study before they officially count as failures.

Another organization only counted failures that completely shut down the system. A partial loss of functionality, a degradation of capability, or the failure of a redundant element did not count as a system failure.

In my opinion if the customer calls it a failure, it’s a failure. If a failure, by any definition, costs your organization time and money to address, acknowledge, resolve or repair, it’s a failure.

What is operating time?

This one is tricky. If the system does include the appropriate sensors and tracking mechanisms (hour meter) and a way to gather that operating time of units both failed and still operating, then we have a pretty good way to track total operating hours. Some situations and systems make this easy.

Most do not.

Let’s say we ship 100 systems a month for 10 months. At the end of ten months the first shipments have accumulated 10 months of operating time. IF….

… They are all placed into service immediately

… They are all operated full time for the full 10 months

… They each have every failure reported, including downtime

In general, we do have to make a few assumptions to determine the operating time for shipped systems. We tend to be conservative and err on the side that would make the MTBF value a little smaller than if we had the full set of carefully tracked data. Or do we?

  • Some organizations count from the date/time of shipment, ignoring shipping and installation time.
  • Some organizations assume all systems are installed and operated 24/7.
  • Some organizations assume no news is good news and the systems with no information are still operating.

And a few organizations assume systems run indefinitely: unless notified that a system is decommissioned, they assume it is still running full tilt, even if it is 20 years old. That is, there is no retirement or replacement policy.
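To see how much these assumptions move the denominator, here is a rough Python sketch of the 100-systems-a-month example. The function `fleet_operating_hours`, the 730 hours per month figure (8,760 / 12), the one-month commissioning delay, and the 50% duty cycle are all assumptions chosen for illustration, not values from any real fleet.

```python
def fleet_operating_hours(units_per_month, months_shipped,
                          delay_months=0.0, duty_cycle=1.0,
                          hours_per_month=730.0):
    """Estimate total fleet operating hours at the end of the shipping period.

    delay_months: assumed time between shipment and being placed in service.
    duty_cycle:   assumed fraction of calendar time each unit actually runs.
    """
    total = 0.0
    for shipment in range(months_shipped):
        # The first shipment has been out the full period, the last one month.
        months_in_field = months_shipped - shipment
        months_running = max(months_in_field - delay_months, 0.0)
        total += units_per_month * months_running * duty_cycle * hours_per_month
    return total


# Counting from ship date and assuming 24/7 operation:
optimistic = fleet_operating_hours(100, 10)
# Same shipments, assuming a one-month install delay and a 50% duty cycle:
adjusted = fleet_operating_hours(100, 10, delay_months=1.0, duty_cycle=0.5)
```

Under these made-up numbers the ship-date, 24/7 tally is 4,015,000 hours while the adjusted tally is 1,642,500 hours. With the same failure count, the first set of assumptions reports an MTBF more than twice as large, so counting from shipment and assuming continuous operation pushes the value up, not down.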

How about when you calculate MTBF?

By convention, when there are no failures we assume that in the next instant there will be one failure. This avoids dividing by zero, which causes fits for calculators, spreadsheets, and mathematicians.
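A minimal sketch of that convention in Python follows. The function name and the example hours are hypothetical; the only logic is substituting one failure when none have been recorded.

```python
def mtbf_assuming_one_failure(total_operating_hours: float,
                              failure_count: int) -> float:
    """MTBF with the zero-failure convention applied.

    With no recorded failures, assume one failure occurs in the next
    instant, so the divisor is never zero.
    """
    effective_failures = max(failure_count, 1)
    return total_operating_hours / effective_failures


no_failures_yet = mtbf_assuming_one_failure(5000.0, 0)  # treated as 5000 / 1
two_failures = mtbf_assuming_one_failure(5000.0, 2)     # plain 5000 / 2
```

Note that under this convention the reported value with zero failures is simply the accumulated operating time, so it grows with every hour logged until the first failure is recorded.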

Another issue is how often the calculations are made. Do we gather data hourly, daily, weekly, monthly, annually? Some use a rolling set of data; for example, only units shipped in the last year count for both operating time and failures. This result will ignore or discount the longer-term wear-out failures, as the bulk of the units are young.

Some organizations do the calculations weekly in order to detect trends. If there are trends, you probably should not be using MTBF. If it’s changing, if there are early-life or wear-out failure mechanisms, you should not be using MTBF.

Even though you can calculate MTBF easily, working through the complexities of getting it right still does not provide a useful metric. Instead, focus on getting better data, including time-to-failure information, so you can explore and report the data with other tools and methods. Treat the data appropriately and make better decisions.

Sure, better data will improve your ability to calculate the MTBF value. If you’d like to be like some organizations, that is fine.

How have you seen MTBF calculated poorly? Share your thoughts and stories in the comments below.

About Fred Schenkelberg

I am an experienced reliability engineering and management consultant with my firm FMS Reliability. My passion is working with teams to create cost-effective reliability programs that solve problems, create durable and reliable products, increase customer satisfaction, and reduce warranty costs.

6 thoughts on “How to Calculate MTBF”

  1. We calculate our MTBF / MTTF using only the confirmed failures, since those most often represent the ones most likely to be design controllable. That said, we and our customer also monitor the MTBUR (mean time between unscheduled removals), which reflects the pain the end user feels, including installation and handling damage, the NFF count, and other causes that may need attention but aren’t strictly the functional failures. Thanks to more and better data collection going on in the commercial aero industry, better fleet utilization data is available, so we are using Weibull analysis to calculate MTTF, since the beta is a better indicator of where to go looking for cause or causes, and the shape of the plotted data can also hold some clues.

    1. Hi Kevin,
      thanks for the note. So, if a failure is not confirmed it is assumed out of the design’s ability to control? Doesn’t that leave a large area for intermittent failures to lurk, which often are directly able to be designed out if desired?

      And, if using Weibull analysis, why bother to calculate MTTF? Why strip your data of information by going from a Weibull CDF, for example, or % surviving so many air miles, to MTTF? It seems a waste of good data.

      Cheers,

      Fred

      1. We’re on the same path, I just didn’t put the full story into the first message. I can’t count failures if I don’t know what failed. In many cases, NFFs get additional scrutiny like a run thru production ESS, a hot soak or a vibe test to try and get the failure to recur. We also recognize that sometimes troubleshooting at the next level is a shotgun approach and good parts are removed, so NFF is not always an intermittent or lurking condition; it’s an expected outcome when the part gets back to us.
        As far as why I calculate an MTTF from Weibull, it’s because it’s what was expected or requested, not because I like it any more than you. That’s why I also report a time to 1% failures along with it, because that’s really what they wanted to know, they just didn’t know to ask for it.
        Cheers,
        Kevin

  2. How do I find the MTBF value for an hour meter (Part no. 20018)? Otherwise, please give me an equivalent formula to find the MTBF for an hour meter.

    1. First off, do not use MTBF to describe the reliability of the part. You can ask the vendor, yet better to ask for reliability information instead. You can calculate MTBF by dividing the total time by the number of failures… which is typically not very useful.

      Cheers,

      Fred
