Category Archives: MTBF

Mean Time Between Failures or MTBF is a common metric for reliability and is often misused or misunderstood.

A World of Constant Failure Rates

What if all failures occurred truly randomly?

The math would be easier.

The exponential distribution would be the only time to failure distribution. We wouldn’t need Weibull or other complex multi parameter models. Knowing the failure rate for an hour would be all we would need to know, over any time frame.
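As a sketch of why a single number would suffice in this imaginary world: with a constant failure rate, written here as λ (my notation, not the post's), the reliability over any duration t follows directly, and MTBF is simply the inverse of λ.

```latex
R(t) = e^{-\lambda t}, \qquad F(t) = 1 - R(t) = 1 - e^{-\lambda t}, \qquad \text{MTBF} = \frac{1}{\lambda}
```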

Sample size and test planning would be simpler. Just run the samples at hand long enough to accumulate enough hours to provide a reasonable estimate for the failure rate.
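A minimal sketch of that estimate, assuming a constant failure rate. The function name and the test data are hypothetical and only illustrate the arithmetic: total accumulated hours in the denominator, observed failures in the numerator.

```python
def estimate_failure_rate(unit_hours, failures):
    """Point estimate of a constant failure rate, in failures per operating hour.

    unit_hours: hours accumulated by each sample (a failed unit contributes
                its hours up to the failure).
    failures:   number of failures observed across all samples.
    """
    total_hours = sum(unit_hours)
    if total_hours <= 0:
        raise ValueError("need positive accumulated operating hours")
    return failures / total_hours


# Hypothetical test: eight samples, two of which failed before 500 hours.
hours = [500, 500, 320, 500, 410, 500, 500, 500]
rate = estimate_failure_rate(hours, failures=2)
print(f"estimated failure rate: {rate:.6f} per hour")
```

Only the total matters here, not which unit accumulated the hours, which is exactly the simplification the constant failure rate assumption buys.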

Would the Design Process Change?

Yes, I suppose it would. The effects of early life and wear out would not exist. Once a product is placed into service, the chance to fail in the first hour would be the same as in any other hour of its operation. It would fail eventually, and the chance of failing before a year would depend solely on the chance of failure per hour.

A higher failure rate would suggest a lower chance of surviving very long. Yet a product could still fail in the first hour of use, and if it had survived for one million hours, its chance to fail in the next hour would still be the same.
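This is the memoryless property of the exponential distribution. A short derivation, using my own notation (T for the time to failure, λ for the constant failure rate, s for the age already survived, t for the additional time of interest):

```latex
P(T > s + t \mid T > s) = \frac{P(T > s + t)}{P(T > s)} = \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(T > t)
```

The age s cancels, so a unit that has survived one million hours faces the same chance of failing in the next hour as a new one.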

Would Warranty Make Sense?

Since by design we cannot create a product with a low initial failure rate, we would focus only on the overall failure rate, that is, the chance of failing over any hour. The first hour would be convenient and easy to test, yet still meaningful. Any single failure in a customer’s hands could occur at any time and would not alone suggest the failure rate has changed.

Maybe a warranty would make sense based on customer satisfaction. We could estimate the number of failures over a time period and set aside funds for warranty expenses. I suppose it would place a burden on the design team to create products with a lower failure rate per hour. Maybe warranty would still make sense.
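A rough sketch of that warranty estimate under the constant failure rate assumption. The function name, the one-claim-per-unit simplification, and all of the numbers are hypothetical.

```python
import math


def expected_warranty_cost(units_shipped, failure_rate_per_hour,
                           warranty_hours, cost_per_claim):
    """Expected first failures within the warranty period and the funds to set aside.

    Assumes a constant failure rate and at most one claim per unit.
    """
    prob_fail = 1.0 - math.exp(-failure_rate_per_hour * warranty_hours)
    expected_failures = units_shipped * prob_fail
    return expected_failures, expected_failures * cost_per_claim


# Hypothetical product: 10,000 units, 1e-5 failures per hour,
# one year of continuous use (8,760 hours), 40 per claim.
failures, reserve = expected_warranty_cost(10_000, 1e-5, 8_760, 40.0)
print(f"expected claims: {failures:.0f}, reserve: {reserve:,.0f}")
```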

How About Maintenance?

If there are no wear out mechanisms (this is a make believe world), changing the oil in your car would not make any economic sense. With the existing oil, the chance of the engine seizing is the same as with new oil. The lubricant doesn’t break down. Seals do not leak. Metal on metal movement doesn’t cause damaging heat or abrasion.

You may have to replace a car tire due to a nail puncture, yet an accident due to worn tire tread would be no more likely than with new tires. We wouldn’t need to monitor tire tread or brake pad wear. Wear wouldn’t occur.

If a motor is running now and we know the failure rate, we can calculate the chance of it running for the rest of the shift, even when the motor is as old as the building.
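A small sketch of that calculation, assuming a constant failure rate. The function name and the numbers are hypothetical; the age argument is accepted only to show that it has no effect on the answer.

```python
import math


def chance_of_surviving(hours_remaining, failure_rate_per_hour, age_hours=0.0):
    """Probability that a unit running now keeps running for hours_remaining.

    Computed as R(age + remaining) / R(age); with a constant failure rate
    the age cancels out of the ratio.
    """
    survive_to_end = math.exp(-failure_rate_per_hour * (age_hours + hours_remaining))
    survive_to_now = math.exp(-failure_rate_per_hour * age_hours)
    return survive_to_end / survive_to_now


rate = 1e-4                    # hypothetical failures per hour
rest_of_shift = 6              # hours left in the shift
new_motor = chance_of_surviving(rest_of_shift, rate)
old_motor = chance_of_surviving(rest_of_shift, rate, age_hours=40 * 8_760)
print(f"new motor: {new_motor:.6f}, 40-year-old motor: {old_motor:.6f}")
```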

The concepts of reliability centered maintenance, predictive maintenance, or even preventative maintenance would not make sense. There would be no advantage to swapping a part for a new one, as the chance to fail would remain the same.

Physics of Failure and Prognostic Health Management – would they make sense?

Understanding failure mechanisms so we could reduce the chance of failure would remain important. Yet when the failures do not

  • Accumulate damage
  • Drift
  • Wear
  • Abrade
  • Diffuse
  • Degrade
  • Etc.

Then much of the predictive power of PoF and PHM would not be relevant. We wouldn’t need sensors to monitor conditions that lead to failure, as no specific failure would show a sign or indication of failure before it occurred. Nothing would indicate it was about to fail, as that would imply its chance of failure has changed.

No more tune-ups or inspections. We would pursue repairs when a failure occurs, not before.

A world of random failures, or a world of failures each of which occurs at a constant rate, would be quite different than our world. So, why do we so often make this assumption?

What Does ‘Lifetime’ as a Metric Mean

We talk about lifetimes of plants and animals. Also, you may talk about the lifetime of a product or system.

I expect to have safe and trouble free use of my car over its lifetime. Once in a while I find a warranty that says it is guaranteed over my lifetime (for as long as I own the blender, for example).

When Your Supplier Converts Reliability to MTBF

Oh, the trouble that will occur. The mistakes, mishaps and errors and most certainly the inability of the supplier to provide a reliability solution.

If you provide the supplier with a straightforward and complete reliability goal, and they convert it to a single number as an MTBF value, what really could go wrong? Also, why would the supplier degrade the requirement to an MTBF value?

What is MTBF?

The acronym MTBF is commonly known in our field as Mean Time Between Failure.

It is also associated with repairable systems in most text books.

It is also denoted as the theta parameter for an exponential distribution.

It is referenced as a metric for reliability, too. Oh, and it is the inverse of the failure rate.

And it is misunderstood and misused by many. I digress, as there is plenty already written on the perils of MTBF.

What is MTBF? And where and how should it be used, if at all?

Holiday Break and a Few Notes

Thank you

First off I want to say thanks to you the readers of the NoMTBF blog. The notes of thanks, of encouragement, and support all propel me to write to you each week.

I especially like the stories of success helping someone ‘get it’ concerning the common misunderstandings of MTBF. I have to think your work and actions are making a difference across the field of reliability engineering. We’re making progress.

5 Things You Can Do Today to Avoid Using MTBF

Take Action Today to Improve How Your Organization Talks About Reliability

You know the perils of MTBF use: the widespread misunderstanding and misuse. You know how MTBF treats your data poorly.

You also know everyone around you uses MTBF. Your industry uses MTBF. And no one likes change, least of all about metrics concerning reliability.

As I said to a friend this morning, “The madness has to stop.”

And you feel the same way. So, what are you going to do about it? Here are five things you can do today.

  1. Use the data to calculate reliability (probability of success) over a duration of interest along with calculating MTBF, then share the results.

  2. Encourage five of your colleagues to check out and subscribe to this site, www.nomtbf.com.

  3. Ask a vendor how they determined the MTBF value they are presenting on the data sheet. What evidence supports that claim and what assumptions are included (often unstated)?

  4. The next time you hear someone mention MTBF, ask them what they mean. Then ask what percentage of items should survive a year. If the two answers are not consistent, you have found a learning opportunity.

  5. Write a blog post for the www.nomtbf.com site. What have you done to encourage better understanding of reliability concepts in your world? Share your hints, tips, stories, and advice here.
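For the first and fourth items, a minimal sketch of the consistency check: convert a quoted MTBF into the fraction expected to survive one year, assuming the exponential distribution that usually sits (often unstated) behind an MTBF value. The function name and the example MTBF values are hypothetical.

```python
import math

HOURS_PER_YEAR = 8_760  # 24 hours x 365 days of continuous operation


def reliability_from_mtbf(mtbf_hours, duration_hours):
    """Probability of surviving duration_hours, assuming an exponential distribution."""
    return math.exp(-duration_hours / mtbf_hours)


for mtbf in (8_760, 50_000, 100_000):  # hypothetical data-sheet values
    r = reliability_from_mtbf(mtbf, HOURS_PER_YEAR)
    print(f"MTBF {mtbf:>7,} h -> {r:.1%} expected to survive one year")
```

When the MTBF equals the duration of interest, only about 36.8% (e^-1) of units are expected to survive that long, which is often a surprise to someone who reads MTBF as a failure-free period.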

Pick one for today and do as many as you can. What would you add to this list? What kind of responses are you receiving when you speak out about the perils of MTBF?

Keep up the effort. Together we are making progress. Thanks for the support.

How to Translate Customer Expectations About Reliability

As a customer when I purchase a new car, a toaster, or a pump for my production line, I expect it to work. To Just Work. As a reliability professional, I also have the language to specify what I mean by, ‘just work’.

Customers that are not reliability engineers do not accurately specify what they mean by ‘it should just work’. So, we have to do a little extra to help translate what they want into specifications that we (the manufacturer of the item) can create and deliver.

Popular Reliability Measures and Their Problems

MTBF

Mean time between failure or mean time before failure is very common. The common definition describes MTBF as a reliability measure that is calculated by tallying operating hours and dividing by the number of failures. Intuitively this is the average time until a failure occurs. Mathematically it is the inverse of the failure rate. It is generally used for repairable systems.
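A small illustration of that calculation, using two made-up repairable systems (the names and failure times are hypothetical). Both tally the same operating hours and the same number of failures, so they report the same MTBF even though the failures arrive in very different patterns.

```python
def mtbf(total_operating_hours, failure_count):
    """Tally of operating hours divided by the number of failures."""
    if failure_count == 0:
        raise ValueError("MTBF is undefined with zero failures")
    return total_operating_hours / failure_count


# Two hypothetical systems, each observed for 10,000 operating hours.
# Values are the cumulative operating hours at which each failure occurred.
observed_hours = 10_000
early_failures = [50, 120, 200, 310]          # all failures shortly after start
late_failures = [8_900, 9_300, 9_600, 9_900]  # all failures near the end

for name, failures in (("early", early_failures), ("late", late_failures)):
    print(f"{name}: MTBF = {mtbf(observed_hours, len(failures)):.0f} hours")
```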

What is Reliability?

It’s not MTBF. It’s not just the period of time the product does not fail. It’s not just a probability.

It’s a bit more. Reliability is that it ‘just works’.

HP calculators are reliable. They work and keep on working. Apparently Lexus makes reliable cars. (According to the current car rankings by Consumer Reports, 2015). My coffee maker is reliable.

The dictionary on my Mac says reliable is:

[Screenshot of the dictionary entry for “reliable”]

And, according to O’Connor and Kleyner in Practical Reliability Engineering, 5th ed., reliability is:

The probability that an item will perform a required function without failure under stated conditions for a stated period of time.

This is a definition we can use as engineers. It has four parts:

  • Function
  • Environment
  • Probability
  • Duration

And we certainly can define and measure each well.
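One way to keep all four parts together when writing a goal down is a small record type. This is only a sketch; the class name, field names, and the example pump are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReliabilityGoal:
    """A reliability statement carrying all four elements of the definition."""
    function: str          # what the item must do
    environment: str       # the stated conditions of use
    probability: float     # chance of performing without failure, between 0 and 1
    duration_hours: float  # the stated period of time

    def __post_init__(self):
        if not 0.0 < self.probability < 1.0:
            raise ValueError("probability must be between 0 and 1")
        if self.duration_hours <= 0:
            raise ValueError("duration must be positive")

    def statement(self):
        return (f"{self.probability:.0%} probability of {self.function} "
                f"in {self.environment} for {self.duration_hours:,.0f} hours")


goal = ReliabilityGoal(
    function="delivering 20 liters per minute",
    environment="an indoor factory at 10 to 40 degrees C",
    probability=0.95,
    duration_hours=17_520,
)
print(goal.statement())
```

A bare MTBF value fills in none of the function, environment, or duration fields, which is one way to see what it leaves out.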

BTW: MTBF addresses only the probability element (and is actually stated as an inverse failure rate), thus it does not fully define reliability.

Consistent, trustworthy? Yes, a reliable product or system should possess these essential qualities, too.

Reliability conjures many images and thoughts. The examples you envision are different than mine. That is fine. The concept remains the same. When an item is reliable, it just works. I like to add that it just keeps on working.

When setting goals, estimating, predicting, or measuring reliability, use all four elements of the definition laid out by O’Connor and Kleyner. Be clear and complete. Keep it simple and make it reliable.

What comes to mind when you think of reliability? Leave a comment and share what you consider reliable.

The MTBF Conspiracy Theory

When my son was young he asked a lot of questions that were difficult to answer. For example:

  • Why is the sky blue?
  • Why do I have to go to school?
  • What is a conspiracy theory?

The first two were expected, yet the third set me back a little. How do you explain conspiracy theory to a 5th grader? The dictionary type definitions just seemed to confuse everyone. So, I made up a conspiracy theory.

I said, “Did you know, North Dakota, is not really a state?”

For those that haven’t heard of North Dakota, which on many maps is in the north central part of the US, that just reinforces the theory that it doesn’t exist.

My son, having recently memorized all fifty US states and their capital cities in school, said I was wrong, and he knew the state was real because he still recalled the name of its capital city.

“Prove it,” was all I said in response.

“Well, it’s on the map of the country as a state.” My reply included how maps change and are arbitrary. Anyone could have drawn the map, and how do we know it is accurate? Maybe the good folks in South Dakota paid the map maker to draw in the fictitious state of North Dakota.

“It’s listed in Wikipedia!” My reply was about how anyone can create a posting on the site, so what is the proof it’s actually true? “Have you ever seen a car with ND plates or met someone from there?” He hadn’t.

My son knew I was only demonstrating the idea of a conspiracy theory. We had fun with it for years.

I was glad he never asked me,

“Why do people use MTBF?”

Just as with the blue sky, a shrug and a smile just wasn’t a good enough answer. There has to be a rational reason people use MTBF.

After writing about perils of MTBF use for a few years, my current theory is it has to be a conspiracy.

The MTBF conspiracy theory revealed

Here’s what I think happened.

A bright engineer was tasked with estimating the reliability of a nuclear submarine’s electronics. He was given about a month to achieve this task, which is not enough time to conduct any testing. So, he gathered all the component failure rate data, tallied it up and reported the expected failure rate. {Parts count prediction}

The marketing department noticed the failure rate value and the word failure. The admission that the submarine might fail didn’t help to sell submarines, so they flipped the failure rate over, creating the average time between failures, or mean time between failures, MTBF.

The lower the failure rate the higher the MTBF went. Up was good. Failure is bad. {That’s how I think marketing folks think – sorry}

The engineers understood failure rates, and the math to create MTBF was pretty simple. So whatever, ’tis the same thing. Then management got involved.

The management team only wanted to read and talk about MTBF {again the word ‘failure’ is bad thinking}. They set MTBF goals, they expected glowing reports of increasing MTBF values, and so on.

Then something really bad happened.

The US Military created a standard. And a company used a computer to automate the standard’s estimate of MTBF. Others did too. Now there was profit to be made by estimating MTBF, not reliability. So, they sold MTBF estimations. After all, that is what the management team wants, MTBF.

The military standard spawned many industry standards. The standards became parts of purchase contracts. MTBF flourished.

“What is your MTBF?” became an acceptable way to ask about reliability performance.

The murky bit of the theory involves why very few stood up to say, “Let’s not use MTBF, it is not very useful. Let’s use the probability of success over a duration (reliability) instead.” You may have said these very words or words to the same effect. And you felt the resistance.

  • We always use MTBF.
  • Everyone in our industry uses MTBF.
  • The vendor only provides MTBF values.

My theory is we all know better {maybe not the marketing folks – sorry} and we just do not feel able to overcome the resistance to change. We know we could do much better with better metrics, yet the backlash is unrelenting.

Just as that first engineer figured out a quick way to come up with a failure rate estimate, we too face the necessity to use MTBF. We do not have the time or energy to change our company or industry to stop using MTBF. So, we just do it.

It’s easy.

I don’t know if the spread of MTBF use is organized by a secret group or not. I suspect not. Yet the ease of use and avoidance of the word failure (or anything that smells like we would have to do statistics) conspired to trap us into using MTBF.

That’s my theory. If you know of any critical bits of information to support this theory, let me know. If we expose the conspiracy for what it is, it may just fade away. We then may get back to work doing reliability engineering and creating reliable products.

A not what you think MTBF course in Exeter UK

Just a quick note. A good friend forwarded me a course listing with the note, “This is not the traditional MTBF course, despite the title.” or something like that.

So, I looked into the course offered by The MIRCE Akademy, titled Mean Time Between Failures — MTBF: Scientific method for the accurate predictions of Mean Time Between Failures.

Looks interesting.

So, this may be a good one to take. You can find more information at the course page.

If you have taken or plan on taking the course, please leave a comment here to let others know how it went and what you learned.

The Convenient Use of MTBF

Sometimes making an assumption is a good thing. You can achieve more with less. A well placed assumption saves you time, work, and worry. The right assumption may even be left unstated, it’s so good.

Have you ever assumed the failures for a system follow an exponential distribution? Did you assume tallying up the total hours and dividing by the number of failures was appropriate? Did you even check? (You don’t need to answer.)

Just Because the Customer Requests MTBF

Is that justification to use MTBF?

No.

It’s not. In this case the customer is probably not asking for MTBF. What they most likely want to know is something meaningful about the expected reliability performance of the item in question. They want to know if what they will or did purchase will last as long as they expect.

The Constant Failure Rate Myth

Have you said or have you heard someone say,

  • “Let’s assume it’s in the flat part of the curve”
  • “Assuming constant failure rate…”
  • “We can use the exponential distribution because we are in the useful life period.”

Or something similar? Did you cringe? Well, you should have.

MTBF is a Statistic, Not the Only One

We often face just a sample of life data with the request of estimating the reliability of the system. Or, we have a touch of test results and want to know if the product is reliable enough, yet. Or, we gather repair times to grapple with spares stocking.

We need to know the reliability. We need to know the number.

MTBF (or close cousin MTTF) is just that number. It is easy to calculate. A higher number means the system is more reliable. And, the metrics are in the units of time, often hours, which is easy to understand (and misunderstand).

In early chapters of reliability engineering books, or in introductions to reliability, we learn about the exponential distribution and the population parameter, theta. We also learn about the sample statistic which provides an unbiased estimate for the population parameter. In both cases, MTBF, or the mean time between failure, is the one value we have to master.

Other Statistics

Reliability is pretty easy using just one statistic. One calculation, one number, and we’re done.

Then a couple of things start to happen.

First, we notice that the actual time to failure behavior is not predicted by, nor does it follow, the expected pattern when using just MTBF and the exponential distribution. The average time to fail changes as the system ages. We find that we run out of spares based on calculations using MTBF, as the parts fail more and more often.

Second, we learn just a little more. We turn the page in the book or attend another webinar. We hear about another distribution commonly used in reliability engineering. The Weibull distribution. But, wait, hold on there. The Weibull has two and sometimes three parameters. I’ll need to learn about plotting, censored data, regression analysis, goodness of fit, confidence intervals, and a bunch of statistical methods.

Life was good with just one statistic.

We didn’t sign up to be reliability statisticians.

Well, too bad.

Actually, when using even just the one statistic, MTBF, we also should have been

  • Checking assumptions
  • Fitting the data to the exponential distribution function
  • Evaluating the goodness of fit
  • Calculating confidence bounds
  • And, using those other statistical methods
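As one hedged example of checking the fit, the sketch below estimates the exponential mean from complete (uncensored) failure times and measures the largest gap between the fitted and the empirical distribution, the Kolmogorov-Smirnov distance. The function name and the data are hypothetical. Because the mean is estimated from the same data, the standard Kolmogorov-Smirnov critical values do not apply directly, so treat the distance as a screening number rather than a formal test.

```python
import math


def exponential_fit_check(times_to_failure):
    """Return (estimated mean, Kolmogorov-Smirnov distance) for an exponential fit.

    Assumes complete data: every unit ran to failure, with no censoring.
    """
    data = sorted(times_to_failure)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one failure time")
    mean = sum(data) / n  # maximum likelihood estimate of theta
    ks_distance = 0.0
    for i, t in enumerate(data, start=1):
        fitted = 1.0 - math.exp(-t / mean)  # fitted exponential CDF at t
        # The empirical CDF steps from (i - 1) / n to i / n at t.
        ks_distance = max(ks_distance, i / n - fitted, fitted - (i - 1) / n)
    return mean, ks_distance


# Hypothetical wear-out data: failures cluster tightly around 1,000 hours.
clustered = [950, 980, 1_000, 1_020, 1_050]
mean, distance = exponential_fit_check(clustered)
print(f"mean = {mean:.0f} h, KS distance = {distance:.2f}")
```

A large distance, as with the clustered data here, suggests the single number hides a time-to-failure pattern the exponential distribution cannot describe.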

In order to understand and use our sparse and expensive datasets, we need to use the tools found in the statistics textbooks.

Yes, the Weibull distribution has two or three parameters, thus we need to evaluate how well our statistics describe the data in a more rigorous way. And, we learn so much more. For example, we can model and predict a system with decreasing or increasing failure rates over time. We can estimate the number of required spares next year with a bit more accuracy than using just MTBF.
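A brief sketch of how the two-parameter Weibull captures those changing failure rates. The shape and scale values are hypothetical; a shape below 1 gives a decreasing failure rate, a shape of exactly 1 reduces to the exponential case, and a shape above 1 gives an increasing failure rate.

```python
import math


def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_reliability(t, shape, scale):
    """Probability of surviving to time t: R(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


scale = 1_000.0  # hypothetical characteristic life, in hours
for shape in (0.5, 1.0, 3.0):
    early = weibull_hazard(100, shape, scale)
    late = weibull_hazard(900, shape, scale)
    print(f"shape {shape}: hazard at 100 h = {early:.5f}, at 900 h = {late:.5f}, "
          f"R(500 h) = {weibull_reliability(500, shape, scale):.3f}")
```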

There are more benefits. Have you advanced past the basic introduction and embraced the use of reliability statistics? How’s it going and what challenges are you facing?