Maintenance Statistics without MTBF
How does your equipment fail? How do you plan for spares? Do you use your existing failure data to help refine your maintenance planning?
Given the title of the article, these questions are reasonable. As a plant reliability or maintenance engineer, do you also rely on gut feel to refine your estimates? If you rely on MTBF or similar metrics, you most likely do not trust the data to provide useful answers.
It’s expensive to run out of spares, and it’s expensive to carry excess inventory. It’s also expensive to incur unplanned downtime.
Data and MTBF
Facilities and major systems have failures, from minor, easy-to-remedy issues to complex and expensive ones. Tracking downtime and its source has become routine and, in many cases, automated. We have databases full of time-to-failure information.
In too many cases this rich and informative time-to-failure data is converted to an average. Downing events tracked per hour or shift become the average time between failures. Defects per 1,000 units become a measure of average production between failures. We count failures, then divide time or volume by the count to get an average.
We track the average from day to day and shift to shift. Or we use the averages to compare product lines or equipment brands. The measuring and tracking already occur, though primarily not for use by the maintenance team. Uptime and throughput are important business measures, and they reflect on maintenance practices.
The MTBF values provide a measure of failure rate. Generally the average is taken over some relatively short period, such as a shift or production run. Rarely do we track the rate of change of failure rates except anecdotally. Yet knowing whether a system is improving or not with various maintenance actions is important. MTBF will not provide that information.
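To make the point concrete, here is a minimal sketch with made-up numbers. The two failure histories, the 1,000-hour observation window, and the helper names mtbf and gaps are all hypothetical, chosen only for illustration. Both systems have the same failure count over the same period, so they share one MTBF value, yet one is improving and the other is getting worse.

```python
# Hypothetical failure times (operating hours) for two systems,
# each observed for the same 1,000 hours. Illustrative data only.
OBSERVATION_HOURS = 1000.0
improving = [50, 110, 180, 270, 400, 620]    # gaps between failures grow
worsening = [380, 600, 730, 820, 890, 950]   # gaps between failures shrink


def mtbf(failure_times, observed_hours):
    """Observed time divided by the number of failures."""
    return observed_hours / len(failure_times)


def gaps(failure_times):
    """Time between successive failures, counting from time zero."""
    previous = [0.0] + list(failure_times[:-1])
    return [t - p for t, p in zip(failure_times, previous)]


for name, times in (("improving", improving), ("worsening", worsening)):
    print(name, "MTBF:", round(mtbf(times, OBSERVATION_HOURS), 1))
    print(name, "gaps:", gaps(times))
```

The single average is identical for both systems. Only the sequence of gaps reveals which direction each system is heading.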
System Mean Cumulative Function and Plotting
One way to get a picture of the rate of change of failures in a repairable system is with a simple plot of the data. The mean cumulative function plot shows time (or cycles, runs, etc.) on one axis and the cumulative number of failures on the other.
See Wayne’s paper in the article Graphical Analysis of Repair Data for details.
In my experience, just plotting the data for a piece of equipment or a fleet of similar items provides significant insight. The plots allow quick identification of systems operating with more or fewer failures over time. The plots show whether a system’s repairs are effective or not. And the plots are understandable by managers.
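As a rough sketch of how the points for such a plot might be computed, the following uses the common nonparametric approach: at each failure, the mean cumulative count steps up by one divided by the number of systems still under observation. The fleet data, the pump names, and the function name mean_cumulative_function are hypothetical, and this is a simplified illustration, not a substitute for the method described in the referenced paper.

```python
def mean_cumulative_function(histories):
    """Return (time, mcf) points for a fleet of repairable systems.

    histories maps a system name to (failure_times, end_of_observation).
    At each failure the estimate rises by 1 / (systems still observed).
    """
    events = sorted(
        (t, name)
        for name, (times, _end) in histories.items()
        for t in times
    )
    points = []
    mcf = 0.0
    for t, _name in events:
        # Count systems whose observation window still covers time t.
        at_risk = sum(1 for _times, end in histories.values() if end >= t)
        mcf += 1.0 / at_risk
        points.append((t, mcf))
    return points


# Hypothetical fleet: failure times in operating hours, then hours observed.
fleet = {
    "pump_A": ([120, 340, 610], 800),
    "pump_B": ([200, 700], 800),
    "pump_C": ([90, 150, 260, 330], 400),
}

for time, value in mean_cumulative_function(fleet):
    print(f"{time:>5} h  mean cumulative failures = {value:.2f}")
```

Plotted as a step chart, a curve that bends upward suggests failures are arriving faster over time, a curve that flattens suggests improvement, and a roughly straight line suggests a steady rate.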
Weibull Plotting for Non-Repairable Items
Some items in a system are not repairable or are consumed. The failure mechanisms may appear as decreasing or increasing failure rates (rarely do individual items fail in a truly random fashion). So, again, a plot of the time-to-failure data provides insight into the changing nature of failure arrivals.
In general, I fit the time-to-failure data for non-repairable items to a Weibull distribution. See the article Censored Data and CDF Plotting Points for a brief discussion of censored data and how to create the cumulative distribution function plot for a Weibull distribution.
Again, the plot provides a means to detect adverse changes in the failure rate, to compare time-to-failure behavior across stress or use rates, and to judge whether the repair policy is effective in minimizing downing events.
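For readers who want to try this, here is a minimal sketch of one common way to build a Weibull plot and read off the parameters. It assumes complete data (every item ran to failure, no censoring), uses Benard's median rank approximation for the plotting positions, and fits a straight line by least squares. The sample times and the function name weibull_plot_fit are hypothetical. Censored data needs adjusted plotting positions, which this sketch does not handle.

```python
import math


def weibull_plot_fit(times_to_failure):
    """Estimate Weibull shape (beta) and scale (eta) from complete data.

    Plotting positions use Benard's approximation F_i = (i - 0.3) / (n + 0.4).
    On Weibull axes, x = ln(t) and y = ln(-ln(1 - F)) fall on a line
    whose slope is beta and whose intercept is -beta * ln(eta).
    """
    times = sorted(times_to_failure)
    n = len(times)
    xs, ys = [], []
    for i, t in enumerate(times, start=1):
        f = (i - 0.3) / (n + 0.4)
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - f)))

    # Ordinary least squares fit of y on x.
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta = sxy / sxx
    intercept = y_mean - beta * x_mean
    eta = math.exp(-intercept / beta)
    return beta, eta


# Hypothetical hours to failure for eight non-repairable items.
hours_to_failure = [410, 620, 780, 910, 1050, 1190, 1380, 1650]
beta, eta = weibull_plot_fit(hours_to_failure)
print(f"shape beta = {beta:.2f}, scale eta = {eta:.0f} hours")
```

A shape value below 1 indicates a decreasing failure rate, a value near 1 a roughly constant rate, and a value above 1 an increasing rate, which is the information an average alone cannot provide.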
By taking into account how your system’s failure rates change over time, you gain the information you need to significantly improve your maintenance program.