Replace After MTTF Time To Avoid Failures – Right?
Received a short question last week. The person writing seems to already know the answer, yet asked:
If we replace an item after a duration equal to the MTTF value, we would avoid failures, right?
Well, no, most likely not, was my response. What is your response? How would you answer this question?
My First Response
MTTF is the total operating time divided by the number of failures. That calculation is the same no matter what underlying pattern of failures occurs, or which distribution may properly describe that pattern. Given the question, we do not know the distribution or pattern of the times to failure.
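To see why the MTTF value alone says so little, here is a minimal sketch. The two data sets are made up for illustration, and the calculation assumes every unit ran to failure (no censored or suspended units).

```python
def mttf(times_to_failure):
    """Mean time to failure: total operating time divided by number of failures.

    Assumes every unit in the list ran until it failed (no censoring).
    """
    if not times_to_failure:
        raise ValueError("need at least one failure time")
    return sum(times_to_failure) / len(times_to_failure)


# Hypothetical data, in hours. Not from any real product.
early_failures = [100, 150, 200, 4350]   # three early failures, one long survivor
wear_out = [1100, 1150, 1250, 1300]      # failures clustered near end of life

print(mttf(early_failures))  # 1200.0
print(mttf(wear_out))        # 1200.0
```

Both sets report a 1,200 hour MTTF, yet a replacement at 1,200 hours would have missed three of the four failures in the first set and two of the four in the second.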
The use of MTTF suggests we are talking about non-repairable items or only considering the time to first failures. That doesn’t really help much.
Given only MTTF, if the failure rate is constant over time, then replacing a working unit leaves the chance of failure unchanged. If the failure rate decreases over time, replacing a working unit makes the chance of failure worse.
If the items exhibit a wear out pattern, then replacing an item at any point in time would generally decrease the chance of failure.
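A rough way to check these three cases is to compare the chance of failure over the next 100 hours for a unit that has already survived 1,200 hours against a freshly installed one. The sketch below assumes a Weibull distribution scaled so that the MTTF is 1,200 hours in every case. The shape values (0.5, 1.0, 3.0) and the 100 hour window are illustrative choices of mine, not values from the question.

```python
import math


def weibull_scale_for_mttf(mttf, shape):
    """Weibull scale (eta) that gives the requested mean for a given shape (beta)."""
    return mttf / math.gamma(1.0 + 1.0 / shape)


def survival(t, shape, scale):
    """Weibull reliability function R(t)."""
    return math.exp(-((t / scale) ** shape))


def conditional_failure_prob(age, window, shape, scale):
    """P(fail within `window` hours | unit has survived to `age`)."""
    return 1.0 - survival(age + window, shape, scale) / survival(age, shape, scale)


MTTF = 1200.0     # hours
WINDOW = 100.0    # hours of operation after the replacement decision

# shape < 1: decreasing failure rate, shape = 1: constant, shape > 1: wear out
for shape in (0.5, 1.0, 3.0):
    scale = weibull_scale_for_mttf(MTTF, shape)
    keep = conditional_failure_prob(MTTF, WINDOW, shape, scale)
    swap = conditional_failure_prob(0.0, WINDOW, shape, scale)
    print(f"shape={shape}: keep old unit {keep:.4f}, install new unit {swap:.4f}")
```

With a shape below 1 the new unit is the riskier choice, with a shape of 1 the two are the same, and only with a shape above 1 does the replacement reduce the chance of failure.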
In any of these circumstances, we would typically expect a large share of the items to fail before reaching a duration equal to the MTTF value. If a constant failure rate applies, we would expect about 63% (roughly 2/3) of items in service to have failed before operating for the MTTF duration.
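The share of units that fail before reaching the MTTF can be worked out directly if we assume a Weibull distribution, where the exponential is the special case with a shape of 1. In this sketch the shape values are illustrative assumptions. The result does not depend on the MTTF value itself, only on the shape.

```python
import math


def fraction_failed_by_mttf(shape):
    """Fraction of units failed by t = MTTF for a Weibull with the given shape.

    With MTTF = scale * gamma(1 + 1/shape), the ratio MTTF / scale is
    gamma(1 + 1/shape), so F(MTTF) = 1 - exp(-(gamma(1 + 1/shape)) ** shape).
    """
    g = math.gamma(1.0 + 1.0 / shape)
    return 1.0 - math.exp(-(g ** shape))


for shape in (0.5, 1.0, 3.0):
    print(f"shape={shape}: {fraction_failed_by_mttf(shape):.1%} failed before MTTF")
```

For a shape of 1 this gives 1 - 1/e, about 63%. A decreasing failure rate (shape 0.5) pushes the share higher, and even a wear out pattern (shape 3) leaves roughly half the units failing before the MTTF.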
Keep in mind that MTTF is the inverse of the failure rate per unit time only when that failure rate is constant. So a 1,200 hour MTTF value means there is a 1 in 1,200 chance of failure each and every hour, if and only if the actual underlying distribution that describes the failure pattern is exponential. A constant hazard rate means each and every hour has the same chance of failure as any other hour. This rarely occurs in nature, and is thus a rather poor assumption.
My Suggestion to Plan for Maintenance
If using a non-repairable item, and you would like to minimize unscheduled failures, using MTTF is not going to help.
Instead, we need to understand if the items are exhibiting a decreasing or increasing failure rate over time, or a mix of the two. Get the data and plot the time to failure data.
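One common way to look at time to failure data is a Weibull probability plot, where the slope of the fitted line estimates the shape parameter. The sketch below computes the plotting positions and the slope with median rank regression, using Bernard's approximation for the median ranks. It assumes complete data (every unit failed), a single failure mechanism, and hypothetical failure times. Four points is far too few for a real analysis and is used here only to keep the example short.

```python
import math


def weibull_plot_fit(times_to_failure):
    """Median rank regression on Weibull plotting coordinates.

    x = ln(t), y = ln(-ln(1 - F)), with F from Bernard's approximation
    (i - 0.3) / (n + 0.4). Returns (shape, scale) from a least squares line.
    """
    times = sorted(times_to_failure)
    n = len(times)
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    shape = sxy / sxx                      # slope of the fitted line
    intercept = y_bar - shape * x_bar
    scale = math.exp(-intercept / shape)   # line crosses y = 0 at ln(scale)
    return shape, scale


# Hypothetical failure times in hours.
for label, data in (("early failures", [100, 150, 200, 4350]),
                    ("wear out", [1100, 1150, 1250, 1300])):
    shape, scale = weibull_plot_fit(data)
    trend = "decreasing" if shape < 1 else "increasing"
    print(f"{label}: shape={shape:.2f}, scale={scale:.0f} h, {trend} failure rate")
```

A slope below 1 suggests a decreasing failure rate and a slope above 1 suggests wear out. Points that bend away from a straight line hint at a mix of failure mechanisms, which a single fitted line will hide.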
If the items show a decreasing failure rate over time, replace them only when they fail, as any preventative replacement will increase the chance of failure. Better would be to improve the ability of the item to work upon installation, thus minimizing the initial chance of failure.
If the item has an increasing failure rate over time, replace the item when the chance of failure increases to an unacceptable level. Generally, consider the cost of the replacement along with the cost of unscheduled downtime to determine the optimal time for replacement.
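The cost trade-off can be sketched with the classic age replacement model: the expected cost per cycle divided by the expected cycle length, evaluated over a range of candidate replacement ages. Everything numeric here is an assumption for illustration: a Weibull shape of 3 with a 1,200 hour MTTF, a planned replacement cost of 1, an unscheduled failure cost of 10, and a 50 hour search grid.

```python
import math


def survival(t, shape, scale):
    """Weibull reliability function R(t)."""
    return math.exp(-((t / scale) ** shape))


def cost_rate(replace_age, shape, scale, planned_cost, failure_cost, steps=1000):
    """Expected cost per hour when replacing at `replace_age` or at failure.

    cost per cycle   = planned_cost * R(T) + failure_cost * (1 - R(T))
    cycle length     = integral of R(t) from 0 to T (trapezoid rule)
    """
    dt = replace_age / steps
    expected_cycle_length = sum(
        0.5 * (survival(i * dt, shape, scale) + survival((i + 1) * dt, shape, scale)) * dt
        for i in range(steps)
    )
    r = survival(replace_age, shape, scale)
    expected_cycle_cost = planned_cost * r + failure_cost * (1.0 - r)
    return expected_cycle_cost / expected_cycle_length


def best_replacement_age(shape, scale, planned_cost, failure_cost,
                         candidates=range(100, 3001, 50)):
    """Candidate age with the lowest cost rate (simple grid search)."""
    return min(candidates,
               key=lambda age: cost_rate(age, shape, scale, planned_cost, failure_cost))


MTTF = 1200.0
shape = 3.0                                      # assumed wear out pattern
scale = MTTF / math.gamma(1.0 + 1.0 / shape)     # scale that gives the assumed MTTF

best = best_replacement_age(shape, scale, planned_cost=1.0, failure_cost=10.0)
print(f"lowest cost replacement age on the grid: {best} hours (MTTF is {MTTF:.0f})")
```

Under these assumptions the lowest cost age lands well below the MTTF. With a shape of 1 the same search runs to the end of the grid, meaning no planned replacement pays off.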
It could be a mix of failure mechanisms that show both decreasing and increasing failure rates over time, in which case apply both approaches mentioned. Get the data. Do not assume a constant hazard rate.
What is your approach? How would you answer the posed question? Use the comment field to add your thoughts.