REVIEW: Analyzing Repairable System Failures Data
Recently, Ziad let me know he published an article titled Analyzing Repairable System Failures Data in the April-May 2017 issue of Uptime magazine (subscription required). He suggested I’d be interested in the article since it provides a way to analyze repairable system data without using MTBF. He was right.
The article is a short description and tutorial on using the mean cumulative function (MCF) and its plot. While the article recommends staying away from MTBF, that message could be a bit stronger. The article does provide a very nice worked-out example illustrating the use of a mean cumulative plot.
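For readers who have not seen the calculation, here is a minimal sketch of the nonparametric MCF estimate in Python. It is my own illustration, not code from Ziad's article: the function name `mcf_estimate` and the three-system data set are made up, and the sketch assumes every system is observed from age zero up to its own end age.

```python
def mcf_estimate(repair_ages_by_system, end_ages):
    """Return (age, MCF) points for a fleet of repairable systems.

    repair_ages_by_system: one list of repair ages per system (hypothetical data)
    end_ages: the age at which observation of each system stopped
    """
    # Pool every repair age across the fleet and walk through them in order.
    all_ages = sorted(age for ages in repair_ages_by_system for age in ages)
    points = []
    cumulative = 0.0
    for age in all_ages:
        # Only systems still under observation at this age count as at risk.
        at_risk = sum(1 for end in end_ages if end >= age)
        # Each repair adds one repair's worth, averaged over the systems at risk.
        cumulative += 1.0 / at_risk
        points.append((age, cumulative))
    return points


# Hypothetical fleet: system A repaired at 100 and 400 hours (observed to 500),
# system B repaired at 250 hours (observed to 300), system C never repaired
# (observed to 600).
fleet_repairs = [[100, 400], [250], []]
fleet_end_ages = [500, 300, 600]
mcf_points = mcf_estimate(fleet_repairs, fleet_end_ages)

for age, value in mcf_points:
    print(f"age {age:>4}: mean cumulative repairs per system = {value:.3f}")
```

Plotting those points as a step function gives the mean cumulative plot. A curve that bends upward suggests the repair rate is increasing, one that flattens suggests it is decreasing, and a roughly straight line suggests a steady rate.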
Breakdown of Failure Data Analysis Approaches
Figure 1 in the article shows a range of different analysis approaches for non-repairable and repairable items. It illustrates the range of tools suitable for the type of data you have under examination. The non-parametric approaches make fewer assumptions and avoid trying to fit a specific distribution to the data. The results tend to reflect the actual data without distortion, or at worst to be a bit conservative.
I first learned about MCF from David Trindade, and the basic discussion of plotting and interpretation took only a few minutes. It really is pretty straightforward. The book by Tobias and Trindade includes a comprehensive explanation of MCF: Tobias, Paul A., and David C. Trindade. Applied Reliability. Boca Raton, FL: CRC/Taylor & Francis, 2012.
Wayne Nelson has also written about MCF extensively, both in an article provided to the NoMTBF site, Graphical Analysis of Repair Data, and in his recent book, Recurrent Events Data Analysis for Product Repairs, Disease Recurrences, and Other Applications (ASA-SIAM Series on Statistics and Applied Probability), where he discusses recurrent event data, which is what data from a repairable system is from a statistician’s point of view.
For parametric analysis of repairable systems, there has been work in recent years to deal with imperfect renewal processes. The team at Reliasoft published one such paper: Guo, Huairui R, Liao Haito, Zhao Wenbiao, and Adamantios Mettas. “A New Stochastic Model for Systems Under General Repairs.” IEEE Transactions on Reliability 56, no. 1 (2007): 40-49. The paper discusses a way to account for a repair that restores a system by some fraction, rather than assuming ‘bad as old’ or ‘good as new’ conditions after a repair.
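To make the idea of a partial repair concrete, here is a small Python sketch of the Kijima Type I virtual age calculation, a common way to express general repairs. I am using it only as an illustration, and it is not necessarily the model proposed in the paper above. The function name `virtual_ages` and the restoration factor `q` are my own labels: `q = 0` corresponds to good as new, `q = 1` to bad as old, and values in between to a partial restoration.

```python
def virtual_ages(times_between_failures, q):
    """Virtual age of the system just after each repair (Kijima Type I).

    times_between_failures: operating time accumulated before each failure
    q: assumed restoration factor, 0 (good as new) through 1 (bad as old)
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ages = []
    age = 0.0
    for interval in times_between_failures:
        # Each repair removes only part of the age added since the last repair.
        age += q * interval
        ages.append(age)
    return ages


# Hypothetical times between failures, in hours.
intervals = [100, 50, 25]
print(virtual_ages(intervals, 0.0))  # good as new: age resets every time
print(virtual_ages(intervals, 1.0))  # bad as old: age keeps accumulating
print(virtual_ages(intervals, 0.5))  # repair removes half of the added age
```

The two classic assumptions are just the end points of this scale, which is why a model with an adjustable restoration factor can describe systems that neither one fits well.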
Dealing with Distribution Assumptions
Right at the start of the article, Ziad discusses the downside of parametric modeling and the common (rampant) disregard of the underlying assumptions. With any model, if the underlying assumptions are not valid, the results from the model are not valid.
The most common approach for repairable data analysis is to simply calculate the MTBF. Simple. Yet in the vast majority of situations the results are less than helpful when trying to understand the reliability performance of your system. Furthermore, many simply assume a constant failure rate, which rarely is true.
I agree with Ziad: part of the analysis is to check assumptions. Do it, and you too will find that the assumed constant failure rate is the source of your poor decisions based on MTBF-based analysis. Neither MTBF nor the exponential distribution is able to reflect the changing nature of failure rates over time.
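One quick way to check the constant rate assumption for a single system is the Laplace trend test. This is my suggestion rather than something taken from Ziad's article, and the sketch below assumes the system was observed from time zero to a fixed end time. The function name and the failure times are hypothetical.

```python
import math


def laplace_trend_statistic(failure_times, observation_end):
    """Laplace trend statistic for a system observed from 0 to observation_end.

    Under a constant rate the statistic is approximately standard normal.
    Large positive values suggest failures are bunching up late (rate rising),
    large negative values suggest they are bunching up early (rate falling).
    """
    n = len(failure_times)
    if n == 0:
        raise ValueError("need at least one failure time")
    mean_failure_time = sum(failure_times) / n
    # With a constant rate, failure times average about half the window.
    midpoint = observation_end / 2.0
    scale = observation_end * math.sqrt(1.0 / (12.0 * n))
    return (mean_failure_time - midpoint) / scale


# Hypothetical failure times in hours, each system observed for 1000 hours.
evenly_spread = [100, 200, 300, 400, 500, 600, 700, 800, 900]
bunched_late = [600, 700, 800, 900, 950]

print(laplace_trend_statistic(evenly_spread, 1000))
print(laplace_trend_statistic(bunched_late, 1000))
```

A value beyond roughly plus or minus 1.96 is evidence at about the 5% level that the rate is not constant. The second data set lands above that threshold, so summarizing it with a single MTBF number would hide the fact that the system is getting worse.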
Later in the article Ziad mentions that ‘When the recurrence rate is a constant…’ the use of MTBF is OK. I disagree with this admission. While it is technically true, MTBF has so many issues with its use and understanding that, even when statistically valid, it is to be avoided.
Check your assumptions. Always.
Multiple Ways to Use MCF Plotting
The article wraps up with an example and discussion of an assortment of ways to use the MCF approach. I especially like connecting the costs of failures to the analysis as a means to prioritize improvement work.
Ziad lists a number of other ways to use MCF in the paragraph titled MCF Extensions. I may have to explore some of those suggestions. Very clever.
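As a rough sketch of how costs can be tied in, the same averaging that produces an MCF can accumulate repair cost instead of repair counts. The Python below is my own illustration with made-up data and a hypothetical function name, and it is not taken from the article. It assumes each system is observed from age zero up to its own end age.

```python
def mean_cumulative_cost(repairs_by_system, end_ages):
    """Return (age, mean cumulative cost per system) points.

    repairs_by_system: one list of (age, cost) repairs per system (hypothetical)
    end_ages: the age at which observation of each system stopped
    """
    events = sorted(
        (age, cost) for repairs in repairs_by_system for age, cost in repairs
    )
    curve = []
    total = 0.0
    for age, cost in events:
        # Spread each repair's cost over the systems still being observed.
        at_risk = sum(1 for end in end_ages if end >= age)
        total += cost / at_risk
        curve.append((age, total))
    return curve


# Hypothetical data: system A has a 500 repair at 100 hours and a 200 repair
# at 300 hours (observed to 400); system B has a 1000 repair at 200 hours
# (observed to 250).
cost_curve = mean_cumulative_cost(
    [[(100, 500.0), (300, 200.0)], [(200, 1000.0)]],
    [400, 250],
)
print(cost_curve)
```

Running this separately for each failure mode or subsystem and comparing the curves is one way to see where the money is going, which is the kind of prioritization the article points toward.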
Check it out and shift your analysis of repairable data to MCF. Ziad’s article is a nice primer on just how easy and informative such an analysis can be for you and your team.
One more time, here’s a link to Ziad’s article in Uptime magazine (subscription required).