At first MTBF seems like a commonly used and useful measure of reliability. Trained as a statistician and understanding the use of the expected value that MTBF represented, I thought, ‘cool, this is useful’.
Then the discussions about reliability using MTBF started, with engineers, technical sales folks, and other professionals. With them came the awareness that not everyone, and at times it seems very few, truly understood MTBF and how to properly use the measure.
Continue reading First Impressions
Consider the Decision Making First
Reliability activities serve one purpose: to support better decision making.
That is all they do. Reliability work may reveal design weaknesses, which we can decide to address. Reliability work may estimate the longevity of a device, allowing decisions when compared to objectives for reliability.
Creating a report that no one reads is not the purpose of reliability. Running a test or analysis to simply ‘do reliability’ is not helpful to anyone. Anything with MTBF involved … well, you know how I feel about that. Continue reading Consider the Decision Making First
What is Wrong With Success Testing?
Three prototypes survive the gauntlet of stresses and none fail. That is great news, or is it? No failure testing is what I call success testing.
We often want to create a design that is successful, and therefore to enjoy successful testing results, i.e., no failures means we are successful, right?
Another aspect of success testing is that, in pass/fail type testing, we can minimize the sample size by planning for all prototypes to pass the test. If we plan on running the test until we have a failure or two, we need more samples. While that improves the statistics of the results, we have to spend more to achieve them. We nearly always have limited resources for testing.
Let’s take a closer look at success testing and some of the issues you should consider before planning your next success test. Continue reading What is Wrong With Success Testing?
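To put a rough number on the sample size point, here is a minimal sketch of the standard zero-failure (success run) relationship, n = ln(1 − C) / ln(R), which assumes a simple pass/fail binomial model. The function names are my own for illustration, not from any particular tool.

```python
import math

def success_run_sample_size(reliability, confidence):
    """Units that must all pass (zero failures) to demonstrate
    `reliability` at `confidence`, assuming a binomial pass/fail model."""
    if not (0 < reliability < 1 and 0 < confidence < 1):
        raise ValueError("reliability and confidence must be between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

def demonstrated_reliability(n_passed, confidence):
    """Lower bound on reliability shown by n_passed units with zero failures."""
    if n_passed < 1 or not (0 < confidence < 1):
        raise ValueError("need at least one unit and 0 < confidence < 1")
    return (1 - confidence) ** (1 / n_passed)

# Demonstrating 90% reliability at 90% confidence takes 22 passing units.
print(success_run_sample_size(0.90, 0.90))

# Three passing prototypes demonstrate only about 46% reliability
# at 90% confidence.
print(round(demonstrated_reliability(3, 0.90), 3))
```

Under these assumptions, three surviving prototypes say far less than the 'no failures' headline suggests.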
An Elusive Product Life Time Definition
The following note and question appeared in my email the other day. I had given the definition of reliability quite a bit of thought, yet had not really thought much about a definition of ‘product life time’.
So after answering Najib’s question I thought it may make a good conversation starter here. Give it a quick read, and add how you would answer the questions Najib poses. Continue reading Defining a Product Life Time
MTBF is Not a Duration
Despite standing for the time between failures, MTBF does not represent a duration. Despite having units of hours (months, cycles, etc.), it is not a duration-related metric.
This little misunderstanding seems to cause major problems. Continue reading MTBF is Not a Duration
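A quick way to see this is a minimal sketch that assumes the constant failure rate (exponential) model that usually travels with MTBF. Under that assumption the chance of a unit surviving to the MTBF value is exp(−1), or roughly 37%. The function name is hypothetical.

```python
import math

def survival_exponential(t_hours, mtbf_hours):
    """Probability of surviving to t_hours, assuming an exponential
    time-to-failure distribution with the given MTBF."""
    if mtbf_hours <= 0 or t_hours < 0:
        raise ValueError("mtbf_hours must be positive and t_hours non-negative")
    return math.exp(-t_hours / mtbf_hours)

# Chance of a unit lasting as long as its MTBF value: about 0.368.
print(round(survival_exponential(1000, 1000), 3))
```

So even under its own favorite assumption, an MTBF of 1,000 hours does not mean units last 1,000 hours. Most do not.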
The Fear of Reliability
MTBF is a symptom of a bigger problem. It is possibly a lack of interest in reliability. Which I doubt is the case. Or it is a bit of fear of reliability.
Many shy away from the statistics involved. Some simply do not want to know the currently unknown. It could be the fear of potential bad news that the design isn’t reliable enough. Some do not care to know about problems that will require solving.
Whatever the source of the uneasiness, you may know one or more coworkers who would rather not deal with reliability in any direct manner. Continue reading The Fear of Reliability
What Does Being In The Flat Part of the Curve Mean?
To me it means very little, as it rarely occurs. Products fail for a wide range of reasons and each failure follows its own path to failure.
As you may understand, some failures tend to occur early, some later. Some we call early life failures, out-of-box failures, etc. Some we deem end of life or wear out failures. There are a few that are truly random in nature, such as a drop or accident causing an overstress fracture, for example. Continue reading Being In The Flat Part of the Curve
A Series of Unfortunate MTBF Assumptions
The calculation of MTBF results in a larger number if we make a series of MTBF assumptions. We just need more time in the operating hours and fewer failures in the count of failures.
While we really want to understand the reliability performance of field units, we often make a series of small assumptions that impact the accuracy of MTBF estimates.
Here are just a few of these MTBF assumptions that I’ve seen, in some cases nearly all of them with one team. Reliability data has useful information if we gather and treat it well. Continue reading A Series of Unfortunate MTBF Assumptions
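As a small illustration of how those assumptions move the number, here is a sketch of the basic point estimate: total operating hours divided by the count of failures. Every figure below is hypothetical, made up only to show the arithmetic.

```python
def mtbf(total_operating_hours, failure_count):
    """Point estimate of MTBF: operating hours divided by failures."""
    if failure_count <= 0:
        raise ValueError("need at least one failure for this estimate")
    return total_operating_hours / failure_count

units = 100   # hypothetical fleet size
days = 365    # hypothetical observation period

# Optimistic assumptions: every unit runs 24 hours a day,
# and only 5 of the field returns are counted as failures.
optimistic = mtbf(units * 24 * days, 5)

# More careful accounting: units run 8 hours a day,
# and all 10 field returns are counted.
careful = mtbf(units * 8 * days, 10)

print(optimistic)  # 175200.0
print(careful)     # 29200.0
```

Same fleet, same year, and two small assumptions make the result six times larger.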
It is Time to Update the Reliability Metric Book with Your Help
Let’s think of this as a crowdsourced project. The first version of this book is a compilation of NoMTBF.com articles. It lays out why we do not want to use MTBF and what to do instead (to some extent).
With your input of success stories, how to make progress using better metrics, and input of examples, stories, case studies, etc. the next version of the book will be much better and much more practical. Continue reading Time to Update the Reliability Metric Book
We Need to Try Harder to Avoid MTBF
Just back from the Reliability and Maintainability Symposium and not happy. While there are signs, a proudly worn button, regular mentions of progress and support, we still talk about reliability using MTBF too often. We need to avoid MTBF actively, no, I mean aggressively.
Let’s get the message out there concerning the folly of using MTBF as a surrogate to discuss reliability. We need to work relentlessly to avoid MTBF on all occasions.
Teaching reliability statistics does not require the teaching of MTBF.
Describing product reliability performance does not benefit from using MTBF.
Creating reliability predictions that create MTBF values doesn’t make sense in most if not all cases. Continue reading We Need to Try Harder to Avoid MTBF
3 Ways to Expose MTBF Problems
MTBF use and thinking is still rampant. It affects how our peers and colleagues approach solving problems.
There is a full range of problems that come from using MTBF, yet how do you spot the signs of MTBF thinking even when MTBF is not mentioned? Let’s explore three approaches that you can use to ferret out MTBF thinking and move your organization toward making informed decisions concerning reliability. Continue reading 3 Ways to Expose MTBF Problems
The Army Memo to Stop Using Mil HDBK 217
Over 20 years ago the Assistant Secretary of the Army directed the Army to not use Mil HDBK 217 in a request for proposals, even for guidance. Exceptions, by waiver only.
217 is still around and routinely called out. That is a lot of waivers.
Why are 217 and other parts count database prediction packages still in use? Let’s explore the memo a bit more, plus ponder what is maintaining the popularity of 217 and its ilk. Continue reading The Army Memo to Stop Using Mil HDBK 217
Why do we use ReliaSoft instead of JMP to Identify the Time to Failure?
This is a question someone posted to Quora and the system prompted me to answer it, which I did.
This question is part of the general question about which software tools to use for specific situations. First, my response to the question. Continue reading Why do we use Weibull++ over JMP?
Futility of Using MTBF to Design an ALT
Let’s say we want to characterize the reliability performance of a vendor’s device. We’re considering including the device within our system, if and only if, it will survive 5 years reasonably well.
The vendor’s data sheet lists an MTBF value of 200,000 hours. A call to the vendor and search of their site doesn’t reveal any additional reliability information. MTBF is all we have.
We don’t trust it. Which is wise.
Now we want to run an ALT to estimate a time to failure distribution for the device. The intent is to use an acceleration model to accelerate the testing and a time to failure model to adjust to our various expected use conditions.
Given the device, a small interface module with a few buttons, electronics, a display and enclosure, and the data sheet with MTBF, how can we design a meaningful ALT? Continue reading Futility of Using MTBF to Design an ALT
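To see how little the data sheet value tells us, here is a minimal sketch. It assumes continuous operation (8,760 hours per year, so 43,800 hours over 5 years) and treats the 200,000 hour MTBF as the mean of a Weibull distribution. The three shape values are arbitrary picks for illustration, not anything the vendor provided.

```python
import math

def reliability_weibull_given_mean(t_hours, mean_hours, beta):
    """Weibull reliability at t_hours for a distribution whose mean is
    mean_hours and whose shape is beta. The scale (eta) follows from
    mean = eta * Gamma(1 + 1/beta). beta = 1 is the exponential case."""
    eta = mean_hours / math.gamma(1 + 1 / beta)
    return math.exp(-((t_hours / eta) ** beta))

five_years = 5 * 8760      # assumes the device runs around the clock
data_sheet_mtbf = 200_000  # hours, from the vendor's data sheet

for beta in (0.5, 1.0, 2.0):  # hypothetical shapes, same mean
    r = reliability_weibull_given_mean(five_years, data_sheet_mtbf, beta)
    print(f"beta = {beta}: R(5 years) = {r:.3f}")
```

With the same 200,000 hour mean, the 5 year reliability comes out near 0.52, 0.80, or 0.96 depending on the assumed shape. The MTBF value alone cannot tell us which case we are in, nor which failure mechanism to accelerate.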
Two Ways to Think and Talk about Reliability
Neither includes using MTBF, btw.
And, I’m not thinking about the common language definition either.
Plus, I may have this all wrong. Here is the way I think about the reliability of something. More than ‘it should just work’ and different than ‘one can count on it to start’. When I ask someone how reliable a product is, this is what I mean.
By explaining my basic understanding we can compare notes. It is possible, quite possible, that I will learn something. As you may as well. Let’s see. Continue reading Two Ways to Think and Talk about Reliability
The Damage Done by Drenick’s Theorem
Have you ever wondered why we use the assumption of a constant failure rate? Or considered why we assume our system is ‘in the flat part of the curve [bathtub curve]’?
Where did this silliness first arise?
In part, I lay blame on Mil Hdbk 217 and parts count prediction practices. Yet, there is a theoretical support for the notion that for large, complex systems the overall system time to failure will approach an exponential distribution.
Thanks go to Wally Tubell Jr., a professor of systems engineering and test. He recently sent me his analysis of Drenick’s theorem and its connection to the notion of a flat section of a bathtub curve.
Wally did a little research and found the theorem lacking for practical use. I agree and will explain below. Continue reading The Damage Done by Drenick’s Theorem