All posts by Fred Schenkelberg

About Fred Schenkelberg

I am an experienced reliability engineering and management consultant with my firm FMS Reliability. My passion is working with teams to create cost-effective reliability programs that solve problems, create durable and reliable products, increase customer satisfaction, and reduce warranty costs.

How to Translate Customer Expectations About Reliability

As a customer, when I purchase a new car, a toaster, or a pump for my production line, I expect it to work. To Just Work. As a reliability professional, I also have the language to specify what I mean by ‘just work’.

Customers who are not reliability engineers do not accurately specify what they mean by ‘it should just work’. So, we have to do a little extra to help translate what they want into specifications that we (the manufacturer of the item) can create and deliver.

Is Reliability Just Testing?

I endured a difficult conversation with a project manager yesterday. The meeting agenda included an initial discussion about the product development reliability plan. She agreed that we needed to identify risks and provide feedback to the team concerning product reliability.

Popular Reliability Measures and Their Problems

MTBF

Mean time between failures, or mean time before failure, is a very common measure. The common definition describes MTBF as a reliability measure that is calculated by tallying operating hours and dividing by the number of failures. Intuitively, this is the average time until a failure occurs. Mathematically, it is the inverse of the failure rate. It is generally used for repairable systems.

What is Reliability?

It’s not MTBF. It’s not just the period of time the product does not fail. It’s not just a probability.

It’s a bit more. Reliability means it ‘just works’.

HP calculators are reliable. They work and keep on working. Apparently Lexus makes reliable cars (according to the current car rankings by Consumer Reports, 2015). My coffee maker is reliable.

The dictionary on my Mac says reliable is:

[Screenshot of the dictionary entry for ‘reliable’]

And, according to O’Connor and Kleyner in Practical Reliability Engineering, 5th ed., reliability is:

The probability that an item will perform a required function without failure under stated conditions for a stated period of time.

This is a definition we can use as engineers. It has four parts:

  • Function
  • Environment
  • Probability
  • Duration

And we certainly can define and measure each well.

BTW: MTBF addresses only the probability element (and actually states it as an inverse failure rate), so it does not fully define reliability.

Consistent, trustworthy? Yes, a reliable product or system should possess these essential qualities, too.
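
To see why a bare MTBF says nothing until a duration is attached, here is a minimal sketch that converts an MTBF into a probability of surviving a stated time. It assumes a constant failure rate (exponential distribution), which is an assumption for illustration only, and the function name and the 50,000 hour MTBF are hypothetical.

```python
import math

def reliability_from_mtbf(mtbf_hours: float, duration_hours: float) -> float:
    """Probability of surviving duration_hours, ASSUMING a constant failure rate."""
    if mtbf_hours <= 0 or duration_hours < 0:
        raise ValueError("MTBF must be positive and duration non-negative")
    return math.exp(-duration_hours / mtbf_hours)

mtbf = 50_000.0  # hypothetical MTBF, hours
for duration in (8_760.0, 50_000.0):  # one year of continuous use, then one MTBF
    r = reliability_from_mtbf(mtbf, duration)
    print(f"{duration:>8.0f} h: R = {r:.3f}")
```

Under this assumption the same MTBF gives roughly an 84% chance of surviving one year and only about a 37% chance of surviving a duration equal to the MTBF itself. The duration, function, and environment still have to be stated separately.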

Reliability conjures many images and thoughts. The examples you envision are different than mine. That is fine. The concept remains the same. When an item is reliable, it just works. I like to add that it just keeps on working.

When setting goals, estimating, predicting, or measuring reliability, use all four elements of the definition laid out by O’Connor and Kleyner. Be clear and complete. Keep it simple and make it reliable.

What comes to mind when you think of reliability? Leave a comment and share what you consider reliable.

The MTBF Conspiracy Theory

When my son was young he asked a lot of questions that were difficult to answer. For example:

  • Why is the sky blue?
  • Why do I have to go to school?
  • What is a conspiracy theory?

The first two were expected, yet the third set me back a little. How do you explain conspiracy theory to a 5th grader? The dictionary type definitions just seemed to confuse everyone. So, I made up a conspiracy theory.

I said, “Did you know North Dakota is not really a state?”

For those that haven’t heard of North Dakota, which on many maps is in the north central part of the US, that just reinforces the theory that it doesn’t exist.

My son, having recently memorized all fifty US states and their capital cities in school, said I was wrong, and that he knew the state was real because he still recalled its capital city.

“Prove it,” was all I said in response.

“Well, it’s on the map of the country as a state.” My reply included how maps change and are arbitrary. Anyone could have drawn the map, and how do we know it is accurate? Maybe the good folks in South Dakota paid the map maker to draw in the fictitious state of North Dakota.

“It’s listed in Wikipedia!” My reply was about how anyone can create a posting on the site, so what is the proof it’s actually true? “Have you ever seen a car with ND plates or met someone from there?” He hadn’t.

My son knew I was only demonstrating the idea of a conspiracy theory. We had fun with it for years.

I was glad he never asked me,

“Why do people use MTBF?”

Just as with the blue sky, a shrug and a smile wouldn’t be a good enough answer. There has to be a rational reason people use MTBF.

After writing about the perils of MTBF use for a few years, my current theory is it has to be a conspiracy.

The MTBF conspiracy theory revealed

Here’s what I think happened.

A bright engineer was tasked with estimating the reliability of a nuclear submarine’s electronics. He was given about a month to achieve this task, which is not enough time to conduct any testing. So, he gathered all the component failure rate data, tallied it up and reported the expected failure rate. {Parts count prediction}

The marketing department noticed the failure rate value and the word failure. The admission that the submarine might fail didn’t help to sell submarines, so they flipped the failure rate over, creating the average time between failures, or mean time between failures, MTBF.

The lower the failure rate the higher the MTBF went. Up was good. Failure is bad. {That’s how I think marketing folks think – sorry}

The engineers understood failure rates, and the math to create MTBF was pretty simple. So whatever, ’tis the same thing. Then management got involved.

The management team only wanted to read and talk about MTBF {again, the ‘failure is bad’ thinking}. They set MTBF goals, they expected glowing reports of increasing MTBF values, and so on.

Then something really bad happened.

The US Military created a standard. And, a company used a computer to automate the standard’s estimate of MTBF. Others did too. Now there was profit to be made by estimating MTBF, not reliability. So, they sold MTBF estimations. After all, that is what the management team wants, MTBF.

The military standard spawned many industry standards. The standards became part of purchase contracts. MTBF flourished.

“What is your MTBF?” became an acceptable way to ask about reliability performance.

The murky bit of the theory involves why very few stood up to say, “Let’s not use MTBF, it is not very useful. Let’s use the probability of success over a duration (reliability) instead.” You may have said these very words or words to the same effect. And you felt the resistance.

  • We always use MTBF.
  • Everyone in our industry uses MTBF.
  • The vendor only provides MTBF values.

My theory is we all know better {maybe not the marketing folks – sorry}, and we just do not feel able to overcome the resistance to change. We know we could do much better with better metrics, yet the backlash is unrelenting.

Just as that first engineer figured out a quick way to come up with a failure rate estimate, we too face the necessity to use MTBF. We do not have the time or energy to change our company or industry to stop using MTBF. So, we just do it.

It’s easy.

I don’t know if the spread of MTBF use is organized by a secret group or not. I suspect not. Yet the ease of use and avoidance of the word failure (or anything that smells like we would have to do statistics) conspired to trap us into using MTBF.

That’s my theory. If you know of any critical bits of information to support this theory, let me know. If we expose the conspiracy for what it is, it may just fade away. We then may get back to work doing reliability engineering and creating reliable products.

Does a Certification Make You a Professional Reliability Engineer?

No, it doesn’t.

It’s just a piece of paper that conveys that you mastered some body of knowledge. You most likely also committed to abide by a code of ethics. Plus you may have committed to continuing education to maintain the certification.

Having a certification means you know the terms, definitions, techniques, and concepts concerning reliability engineering. That’s all.

Does it mean you are a professional? No.

Being Professional

The dictionary describes professional as being associated or involved with a profession. You are professional by working or studying the profession of reliability engineering. Yet, we commonly consider a professional as being more than just a person with a job title.

A professional, in my mind, exemplifies the essence of a noble, caring, capable engineer. One that works for the greater good. Someone who strives to make the world a better place. (Insert pedestal here.)

This is the nature of the engineering codes of ethics that professional societies draft and encourage members to live by. The following are just examples of the many similar codes that exist:

American Society for Quality Code of Ethics

http://asq.org/about-asq/who-we-are/ethics.html

National Society of Professional Engineers Code of Ethics

http://www.nspe.org/resources/ethics/code-ethics

Institute of Electrical and Electronics Engineers Code of Ethics

http://www.ieee.org/about/corporate/governance/p7-8.html

There are many others and they are all similar. Be honest, forthright and fair in your work.

You probably already adhere to these various codes of ethics. You do not have to pay membership dues to demonstrate you are ethical. It’s how you work, behave and conduct your life.

You are a professional reliability engineer by the way you solve problems, continue to learn, assist others willingly, and exemplify how the reliability engineering profession makes the world a better place.

Certifications are Good, too.

There are different types of certifications and many organizations offer certificates. For reliability engineering there are three professional societies that I know about that offer certifications.

American Society for Quality Certified Reliability Engineer

http://asq.org/cert/reliability-engineer

Society for Maintenance and Reliability Professionals Certified Maintenance & Reliability Professional

http://www.smrp.org/i4a/pages/index.cfm?pageid=3578

Association for Maintenance Professionals Certified Reliability Leader

http://www.maintenance.org/pages/crl

Some engineers have all three certifications. Some only one. Many professional engineers do not have any certification. It’s a personal decision. You can strive to work as a professional with or without securing one or more of the certifications offered by professional societies.

I should mention there are many other certifications offered in our industry. Conferences, software companies, and consulting & training organizations offer certifications. These, like the ones offered by professional societies, are not licenses (state license or charter). The various certifications simply mean the person met some level of experience, course work, or demonstrated body of work, or passed a test.

It doesn’t mean they are a professional.

If you are pursuing a certification, why? Please add a comment on what certification means to you and your career.

A not what you think MTBF course in Exeter UK

Just a quick note. A good friend forwarded me a course listing with the note, “This is not the traditional MTBF course, despite the title.” or something like that.

So, looked into the course offered by The MIRCE Akademy, titled Mean Time Between Failures — MTBF: Scientific method for the accurate predictions of Mean Time Between Failures.

Looks interesting.

So, this may be a good one to take. You can find more information at the course page.

If you have taken or plan on taking the course, please leave a comment here to let others know how it went and what you learned.

 

The Convenient Use of MTBF

Sometimes making an assumption is a good thing. You can achieve more with less. A well placed assumption saves you time, work, and worry. The right assumption may even be left unstated, it’s so good.

Have you ever assumed the failures for a system follow an exponential distribution? Did you assume tallying up the total hours and dividing by the number of failures was appropriate? Did you even check? (You don’t need to answer.)

Announcing Reliability.fm a podcast network

A break from the normal format.

Reliability.fm

A Reliability Engineering Podcast Network

Reliability.fm is a podcast network focused on reliability engineering topics.

We are starting with three shows: Speaking of Reliability and Dare to Know, described below, along with recorded Accendo Reliability webinars.

If you would like to create your own reliability-focused podcast, let’s talk about getting you started. Contact Fred Schenkelberg at fms@accendoreliability.com


 

Speaking of Reliability: a podcast of good friends sitting down with you to talk about reliability engineering.

A new reliability engineering focused podcast now available on iTunes. Give it a listen and please leave a rating and review.

The intent is to publish two times a week. The conversational format is inspired by a chance lunch with two young engineers new to reliability engineering. The questions they asked and our conversation helped them get started and improve their programs. So, let us know your questions.


 

Dare to Know: Interviews with Quality and Reliability Thought Leaders with Host Tim Rodgers.

Meet the people that shape our profession. Authors, bloggers, consultants, scholars, business leaders. Learn about their insights and motivations.

Now available on iTunes. Please leave a rating and review. Help others find this new podcast.

 


Enjoy the shows and contact us with any ideas or thought leaders you want us to engage for a future show.

 

Just Because the Customer Requests MTBF

Is that justification to use MTBF?

No.

It’s not. In this case the customer is probably not asking for MTBF; what they most likely want to know is something meaningful about the expected reliability performance of the item in question. They want to know if what they will purchase, or did purchase, will last as long as they expect.

The Constant Failure Rate Myth

Have you said or have you heard someone say,

  • “Let’s assume it’s in the flat part of the curve”
  • “Assuming constant failure rate…”
  • “We can use the exponential distribution because we are in the useful life period.”

Or something similar? Did you cringe? Well, you should have.

MTBF is a Statistic, Not the Only One

We often face just a sample of life data with the request to estimate the reliability of the system. Or, we have a touch of test results and want to know if the product is reliable enough yet. Or, we gather repair times to grapple with spares stocking.

We need to know the reliability. We need to know the number.

MTBF (or close cousin MTTF) is just that number. It is easy to calculate. A higher number means the system is more reliable. And, the metrics are in the units of time, often hours, which is easy to understand (and misunderstand).

In early chapters of reliability engineering books, or in introductions to reliability, we learn about the exponential distribution and the population parameter, theta. We also learn about the sample statistic which provides an unbiased estimate for the population parameter. In both cases, MTBF, or the mean time between failures, is the one value we have to master.

Other Statistics

Reliability is pretty easy using just one statistic. One calculation, one number, and we’re done.

Then a couple of things start to happen.

First, we notice that the actual time to failure behavior is not predicted by, nor does it follow, the expected pattern when using just MTBF and the exponential distribution. The average time to fail changes as the system ages. We find that we run out of spares based on calculations using MTBF as the parts fail more and more often.

Second, we learn just a little more. We turn the page in the book or attend another webinar. We hear about another distribution commonly used in reliability engineering. The Weibull distribution. But, wait, hold on there. The Weibull has two and sometimes three parameters. I’ll need to learn about plotting, censored data, regression analysis, goodness of fit, confidence intervals, and a bunch of statistical methods.

Life was good with just one statistic.

We didn’t sign up to be reliability statisticians.

Well, too bad.

Actually, when using even just the one statistic, MTBF, we also should have been

  • Checking assumptions
  • Fitting the data to the exponential distribution function
  • Evaluating the goodness of fit
  • Calculating confidence bounds
  • And, using those other statistical methods

In order to understand and use our sparse and expensive datasets, we need to use the tools found in the statistics textbooks.

Yes, the Weibull distribution has two or three parameters, thus we need to evaluate how well our statistics describe the data in a more rigorous way. And, we learn so much more. For example, we can model and predict a system with decreasing or increasing failure rates over time. We can estimate the number of required spares next year with a bit more accuracy than using just MTBF.
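
As a rough illustration of how a Weibull model captures decreasing or increasing failure rates, here is a minimal sketch of the two-parameter Weibull hazard function. The shape (beta) and scale (eta) values below are hypothetical and chosen only to show the three cases; they are not fitted to any data in this post.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta, and eta must all be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)

eta = 1_000.0  # hypothetical characteristic life, hours
for beta in (0.5, 1.0, 3.0):  # hypothetical shape values
    early = weibull_hazard(100.0, beta, eta)
    late = weibull_hazard(900.0, beta, eta)
    if late < early:
        trend = "decreasing"
    elif late > early:
        trend = "increasing"
    else:
        trend = "constant"
    print(f"beta = {beta}: failure rate is {trend} ({early:.5f} -> {late:.5f} per hour)")
```

A shape value of 1 reduces to the exponential case with a constant failure rate, which is the only case a single MTBF value can describe.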

There are more benefits. Have you advanced past the basic introduction and embraced the use of reliability statistics? How’s it going and what challenges are you facing?

Plot the Data

Just, please, plot the data.

Say you have gathered some time to failure data. You have the breakdown dates for a piece of equipment. You review your car maintenance records and note the dates of repairs. You may have some data from field returns. You have a group of numbers and you need to make some sense of it.

Take the average

That seems like a great first step. Let’s just summarize the data in some fashion. So, let’s say I have the number of hours each fan motor ran before failure. I can tally up the hours, TT, and divide by the number of failures, r. This is the mean time to failure.

\displaystyle \theta =\frac{TT}{r}

Or, if the data was on my car and I have the days between failures, I can also tally up the time, TT, and divide by the number of repairs, r. Same formula, and we call the result the mean time between failures.

And I have a number. Say it’s 34,860 hours MTBF. What does that mean (no pun intended), other than that on average my car operated for about 34k hours between failures? Sometimes more, sometimes less.
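
Here is a minimal sketch of that tally-and-divide calculation, with two made-up sets of times between failures. The function name and both datasets are hypothetical; they are chosen so that the averages match while the patterns do not.

```python
def mean_time_between_failures(times_between_failures):
    """Total time TT divided by the number of failures r."""
    total_time = sum(times_between_failures)  # TT
    failures = len(times_between_failures)    # r
    if failures == 0:
        raise ValueError("need at least one failure to calculate a mean")
    return total_time / failures

steady = [100, 100, 100, 100, 100]    # hypothetical hours between repairs
wearing_out = [180, 140, 100, 60, 20]  # hypothetical: repairs come faster and faster

print(mean_time_between_failures(steady))
print(mean_time_between_failures(wearing_out))
```

Both lists produce the same average, yet the second one describes equipment that is clearly getting worse. The single number cannot tell them apart.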

Any pattern? Is my car getting better with age, or worse?

A Histogram

In school we used to use histograms to display the data. Let’s try that. Here’s an example plot.

 

[Figure: histogram of service and repair times, in minutes]

In this case the plot is of service and repair times (most likely similar to the times the garage has my car for an oil change and tune up). Right away we see more than just a number. The values range from about 50 up to about 350 with most of the data on the lower side. Just a couple of service times take over 250 minutes.

Using just an average doesn’t provide very much information compared to a histogram.
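
If no plotting tool is handy, even a text histogram does the job. This is a minimal sketch using only the Python standard library; the service times are hypothetical values picked to resemble the shape described here, not the data behind the figure.

```python
from collections import Counter

def bin_counts(values, bin_width):
    """Count how many values fall into each equal-width bin, keyed by bin start."""
    counts = Counter((int(v) // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical service and repair times, in minutes
service_minutes = [55, 60, 72, 80, 85, 95, 110, 120, 130, 150, 175, 210, 260, 340]

bins = bin_counts(service_minutes, 50)
for start, count in bins.items():
    print(f"{start:>3}-{start + 49:<3} | {'#' * count}")
```

The printed bars show the long tail to the right at a glance, which the average alone hides.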

Mean Cumulative Function Plot

Over time, count the number of failures. If the repair time is short compared to operating time, then this simple plot may reveal interesting patterns that a histogram cannot.

Here is a piece of equipment, and each dot represents a call for service. The x-axis is time and the vertical axis is the count of service calls. While it’s not clear what happened shortly after about 3,000 hours, it may be worth learning more about what was going on then.

[Figure: MCF plot for M90-P4]

 

Even the first three or four points after 3,000 hours would have signaled that something different is happening here.

MCF plots show when something is getting worse (more frequent repairs) by curving upward, or getting better (longer spans between repairs) by flattening out. Again, a lot more information than with just a number.
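
For a single piece of equipment, the points on such a plot are just the running count of service calls against operating time. Here is a minimal sketch with hypothetical service call times (a full MCF averages this count across a population of systems, which this sketch does not attempt).

```python
def cumulative_failure_counts(event_hours):
    """Return (operating hours, cumulative count) points for one system."""
    ordered = sorted(event_hours)
    return [(hours, count) for count, hours in enumerate(ordered, start=1)]

def mean_gap(event_hours):
    """Average spacing between consecutive events."""
    ordered = sorted(event_hours)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)

# Hypothetical operating hours at each call for service
service_call_hours = [400, 1100, 1900, 2600, 3100, 3250, 3400, 3500, 3580]

points = cumulative_failure_counts(service_call_hours)
for hours, count in points:
    print(f"{hours:>5} h | {'*' * count}")

early = mean_gap(service_call_hours[:5])  # calls up to about 3,000 hours
late = mean_gap(service_call_hours[4:])   # calls after that
print(f"mean gap early: {early:.0f} h, late: {late:.0f} h")
```

The shrinking gap between calls is what shows up as the upward curve on the plot.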

Plot the Fitted Distribution

Let’s say we really want to assume the data is from an exponential distribution. We can happily calculate the MTBF value and continue with the day. Or, we can plot the data and the fitted exponential distribution.

Let’s say we have about five failure times based on customer returns out of the 100 units placed into service. We can calculate the MTBF value including the time the remaining 95 units operated, which is about 172,572 hours MTBF. And, we can plot the data, too.
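
The calculation that includes the units still running can be sketched as below. The failure times and the survivor operating time are hypothetical, so the result will not match the value quoted above; the point is only that the hours from the 95 survivors go into the numerator while only the 5 failures go into the denominator.

```python
def mtbf_with_survivors(failure_hours, survivor_hours):
    """Total time on all units (failed and still running) divided by failures."""
    total_time = sum(failure_hours) + sum(survivor_hours)
    failures = len(failure_hours)
    if failures == 0:
        raise ValueError("need at least one failure to calculate MTBF")
    return total_time / failures

failure_hours = [1_200, 2_500, 3_100, 3_600, 4_000]  # hypothetical returns
survivor_hours = [5_000] * 95                         # hypothetical: 95 units still running

print(mtbf_with_survivors(failure_hours, survivor_hours))
print(sum(failure_hours) / len(failure_hours))  # average of the failures alone
```

With these made-up numbers the MTBF comes out far larger than any observed failure time, which is one more reason to look at the plot and not just the number.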

Here’s an example. What do you notice, even with a fuzzy plot image?

[Figure: failure data with the assumed exponential fit]

 

The line intersects the point where the F(t) is 0.63 or about the 63rd percentile of the distribution, and the time is at the point we calculated as the MTBF value (off to the right of the plot area).
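
The 63rd percentile is not a coincidence. As a short worked equation, assuming the exponential cumulative distribution function with mean \theta:

\displaystyle F(t)=1-e^{-t/\theta }

\displaystyle F(\theta )=1-e^{-1}\approx 0.632

So under the exponential assumption, roughly 63% of units are expected to have failed by the time equal to the MTBF, whatever the MTBF value happens to be.
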
Like me, you may notice the line doesn’t seem to describe the data very well. The data seems to have a different pattern than that described by the exponential distribution. Let’s add a Weibull distribution fitted to the same data, including the units that have not failed.

 

[Figure: Weibull versus exponential fit]

The Weibull fit at least appears to represent the pattern of the failures. The slope is much steeper than the exponential fit. The Weibull tells a different story. A story that represents the story within the data.

Again, just plot the data. Let the data show you what it has to say. What does your data say today?

Speaking of Reliability — a new podcast series

Coming Soon

Speaking of Reliability

A new podcast show featuring discussions with reliability experts about a wide range of reliability engineering topics is in the process of development. We’ve recorded a few episodes and are editing them now.

I expect to launch the podcast in the next week or two.

The show is in large part based on the questions received over the past few years from you. You being reliability-minded folks that would like to solve problems, improve reliability performance, and advance your career.

The Fear of Reliability Statistics

Eva the Weaver Soon deniable

When a report contains a large, complex formula, maybe a derivation, do you just skip over it? Does a phrase like ‘95% confidence of 98% reliability over 2 years’ not help your understanding of the result?

Hypothesis testing, confidence intervals, point estimates, parameters, independent identically distributed, random sample, orthogonal array, …

Did you just shiver a bit?