Please don’t remove MTBF, part 2
This note is the second part of my response to a forum entry by HL concerning two arguments he is attempting to refute. Of course, my arguments for the eradication of MTBF may stir up some resistance. My plea to use a better approach may challenge the status quo or ruffle a few feathers. So be it. That is expected.
Change has to happen, and airing contrary opinions and arguments is part of the process. If you see a flaw in my rationale (some may say I’m not rational, which is different), then please let me and others know. Let’s sort out the best path forward to increase our professional ability to talk about and measure reliability.
“MTBF is often misused”
The basic argument that MTBF is often misused is what led me to start this campaign to eradicate the use of MTBF. It was one too many vendors claiming an MTBF and meaning a failure-free period of time. It was one too many colleagues exasperated at trying to explain MTBF to a design engineer. It was one too many million-dollar mistakes made due to the use of MTBF. I had had enough. Something had to be done.
Imagine you’re about to start a technical talk to peers at a reliability conference, and a good friend and colleague asks, “What works to quickly explain what MTBF is and is not?” Imagine that you know a few people in the audience who have shared similar frustrations with you. Imagine that you then ask, out of curiosity, how many of the 150 in attendance have had to explain MTBF to someone because of a misunderstanding of what it is or is not. What would you do when every single person in the room raises their hand?
In that situation I ditched my prepared presentation and instead led a discussion on what could go wrong when someone uses MTBF unwittingly and without knowledge of the underlying assumptions. That hour went by very quickly and was probably the best presentation I’ve made at a conference. It is from that experience that I began my campaign to eradicate the use of MTBF from our profession.
A short time later I took the notes gathered during that discussion and wrote the Perils essay outlining the many ways MTBF is often mistaken for being something it is not.
The HL argument
HL begins the refutation of my argument by agreeing that MTBF is often misused. That is a great opening statement, and one debaters employ to set up a contrary point. HL then claims that MTBF is what it is and that the misunderstandings do not arise from the nature of the metric; rather, it is the people who lack knowledge of the proper use of MTBF.
I do blame the metric. In the Perils article I break down the words in its name, their common meanings, and how the name itself leads to much of the MTBF confusion. I’ll not repeat that here, other than to ask you: does the mean of a distribution have to be at the 50th percentile? (It doesn’t, and for the exponential distribution it isn’t.)
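To make the exponential case concrete, here is a minimal sketch, assuming a constant failure rate equal to 1/MTBF and a hypothetical MTBF of 1,000 hours chosen only for illustration. It shows where the MTBF actually sits on the cumulative failure curve and where the median life falls.

```python
import math

def exponential_cdf(t: float, mtbf: float) -> float:
    """Fraction of units failed by time t, assuming a constant failure rate of 1/mtbf."""
    return 1.0 - math.exp(-t / mtbf)

def exponential_median(mtbf: float) -> float:
    """Time by which half the units have failed: solving F(t) = 0.5 gives t = mtbf * ln 2."""
    return mtbf * math.log(2.0)

mtbf = 1000.0  # hypothetical MTBF in hours, for illustration only
failed_by_mtbf = exponential_cdf(mtbf, mtbf)
print(f"Fraction failed by the MTBF:    {failed_by_mtbf:.3f}")      # 0.632
print(f"Fraction surviving to the MTBF: {1 - failed_by_mtbf:.3f}")  # 0.368
print(f"Median life: {exponential_median(mtbf):.1f} h")             # 693.1
```

Whatever MTBF value you plug in, about 63% of units have failed by the time the MTBF is reached, and only about 37% survive to it, which is hardly the failure-free period some vendors imply.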
The mean is just the mean, and when commonly understood it is useful. Yet most professionals in and around our world have enjoyed a class or two in probability and statistics, and many hope never to repeat that experience. Unfortunately, what most recall from those classes is rooted in the study of the normal distribution. I argue that this widespread education creates the widespread belief that the mean, or average, is the 50th percentile, which for the life distributions used to model life data is simply not true.
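One way to see how far the mean can drift from the 50th percentile is a quick sketch using the two-parameter Weibull distribution as the life model (my choice for illustration; the shape values below are arbitrary). Because the mean is η·Γ(1 + 1/β), the percentile at which the mean falls depends only on the shape β, not on the scale η.

```python
import math

def weibull_percentile_of_mean(beta: float) -> float:
    """Return F(mean) for a two-parameter Weibull with shape beta.

    The mean is eta * Gamma(1 + 1/beta), so
    F(mean) = 1 - exp(-Gamma(1 + 1/beta) ** beta), independent of the scale eta.
    """
    return 1.0 - math.exp(-math.gamma(1.0 + 1.0 / beta) ** beta)

# Shape values chosen only for illustration.
for beta in (0.5, 1.0, 2.0, 3.6):
    pct = 100.0 * weibull_percentile_of_mean(beta)
    print(f"beta = {beta}: the mean falls at the {pct:.1f}th percentile")
```

With β = 0.5 the mean sits near the 75.7th percentile, with β = 1 (the exponential case) near the 63.2nd, and with β = 2 near the 54.4th. Only when the shape is around 3.6, where the Weibull is roughly symmetric, does the mean land close to the 50th percentile that the normal distribution taught us to expect.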
Any metric we use should be obvious, understood, meaningful and accurate enough for the task or decision. MTBF fails on all counts, over and over.
HL also points out that we misuse the mean in other summaries of data, for example taking the mean of the 400 richest people on the Forbes list and then using it in a mortgage calculation; by my reasoning, he says, we should therefore protest Forbes’s use of mean values. He further suggests that the average is simply a use of statistics, and since many people do not understand statistics, we should campaign to eliminate statistics (I just heard a few rallying cheers for that idea). He also suggests that much of our science is based on inaccurate measures, so by the same logic science is flawed and should be stopped.
What a Luddite.
We are taught three measures of the central tendency of a dataset: mean, median, and mode. The mode is not commonly used, as it is often not meaningful. It is still there, though, and we can calculate the mode for a dataset, although it will rarely be useful when we face a decision based on the data.
Statistics is a way to discuss, measure, and understand the variation that occurs. It lets us sample a population and make informed decisions. With just a little understanding of basic statistics and the occasional assistance of a professional statistician (or a reliability engineer, since we should have working professional knowledge of reliability statistics), we can make better use of data and better decisions.
In science we continue to learn and ask questions. We use measures to describe various dimensions. Reliability, the probability of survival over a period of time, is one of these measures. MTBF is not the only way to describe it, and it is one of the most information-poor, insufficient, inaccurate, and misunderstood measures of reliability that exists. Within reliability engineering we have many other ways to describe reliability that are easy to use, easy to calculate, and more accurate and informative than MTBF. The Weibull distribution has been around for only about 60 years and has been trivial to calculate for about 30. Let’s all stop using MTBF simply because it is easy, simple, or common, and avoid the mistakes, errors, and losses that come from not communicating clearly.
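To show how little an MTBF tells you about the probability of survival, here is a minimal sketch assuming Weibull life models for three hypothetical populations that share the same mean life of 1,000 hours, evaluated at a hypothetical 100-hour mission. The reliability function is R(t) = exp(-(t/η)^β), and each scale η is chosen so that the mean η·Γ(1 + 1/β) equals the shared MTBF.

```python
import math

def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Probability of surviving to time t: R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))

def eta_for_mean(mean: float, beta: float) -> float:
    """Scale parameter that gives a Weibull with the requested mean life."""
    return mean / math.gamma(1.0 + 1.0 / beta)

MTBF = 1000.0    # hypothetical mean life shared by all three populations, hours
MISSION = 100.0  # hypothetical mission time, hours

# Shape values chosen for illustration: early failures, constant rate, wear-out.
for beta in (0.5, 1.0, 3.0):
    eta = eta_for_mean(MTBF, beta)
    r = weibull_reliability(MISSION, beta, eta)
    print(f"beta = {beta}: eta = {eta:.0f} h, R({MISSION:.0f} h) = {r:.3f}")
```

All three populations would be reported with the same 1,000-hour MTBF, yet their chances of surviving the 100-hour mission come out at roughly 0.64, 0.90, and 0.999. Stating reliability at a time of interest, or the Weibull parameters themselves, carries information the single MTBF number throws away.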
Were you in that conference room when I asked about your experience with MTBF? What are your issues with MTBF and how it is used? What is the worst misunderstanding of MTBF you have experienced?