Reliability and Availability
In English there is a lot of confusion about what reliability, availability and other ‘ilities mean in a technical sense. Reliability as used in advertising and common discussions often means dependable or trustworthy. When applied to a product or system, it may mean that it will work as expected.
Availability is a less common term, yet available implies that a person or system is present and ready to engage or start. When picking up a friend at their home, if they are prepared and ready to depart when I arrive, then we can say they were available. This is closer to our (reliability professionals) technical meaning.
We have a couple of issues to overcome when talking about reliability and availability.
First, we need to be very clear when we talk about reliability and availability. For reliability we mean: a system will function as expected in the described environment with a specific probability of successful performance over a duration. The common meaning is closer to availability: it will work when we need it. The car will start.
When talking about reliability, be clear. Either state or imply the system, function and environment (easy when everyone knows we’re talking about a specific product with well-documented functions and environmental conditions). Second, always state reliability as a couplet of a probability of success and a duration. For example, 98% chance of survival over 2 years.
When there is any doubt, specify the function, environment, and use conditions, too. For example, a fan will provide xx airflow for a computer chassis in a US home environment 24 hours a day for 2 years with 98% chance of successful operation. We could further define the use conditions to include RPM, average speeds, speed profiles, and environmental conditions such as temperature, humidity, dust, vibration, etc. The functional specifications may include back pressure, air inlet size and filters, etc.
Keep in mind that many understand reliability to mean that it will work, and little more. Some believe a 2-year product implies a normal distribution with a mean value at 2 years.
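To see why the "mean at 2 years" reading conflicts with a statement such as 98% over 2 years, here is a minimal sketch. It assumes a normally distributed life purely to mirror the misconception; the standard deviation and the function name are hypothetical values chosen for illustration, not data from any product.

```python
from statistics import NormalDist

# Hypothetical reading of a "2-year product": life is normally distributed
# with its mean at 2 years. The standard deviation is an assumed value.
MEAN_LIFE_YEARS = 2.0
ASSUMED_SIGMA_YEARS = 0.5

def reliability_normal(t_years, mean=MEAN_LIFE_YEARS, sigma=ASSUMED_SIGMA_YEARS):
    """Probability of surviving past t_years under a normal life distribution."""
    return 1.0 - NormalDist(mu=mean, sigma=sigma).cdf(t_years)

# At the mean of a normal distribution, half the units have already failed,
# whatever the standard deviation is.
r_at_two_years = reliability_normal(2.0)
print(f"Reliability at 2 years with mean life of 2 years: {r_at_two_years:.2f}")

# Mean life needed (same assumed sigma) for 98% reliability at 2 years:
# only 2% of units may fail before the 2-year mark.
required_mean = 2.0 - NormalDist().inv_cdf(0.02) * ASSUMED_SIGMA_YEARS
print(f"Mean life needed for 98% at 2 years: {required_mean:.2f} years")
```

Under these assumptions, a mean life of 2 years gives only a 50% chance of surviving 2 years, which is a very different claim from 98% reliability over the same duration.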
Availability has fewer misunderstandings, yet it is not always clear how to measure it. What the customer or operator actually wants is for the equipment to be ready to function when expected to operate. For example, whenever I walk out to my car, I expect it to start. If I do not need to drive anywhere today, I’m not testing whether the car is or was available, yet I still expect it to be ready. Contrast that with knowing my car is in the shop for maintenance and thus not ready to go.
Relationship between reliability and availability
As you know, a system that doesn’t fail is reliable; it is also available. If a system doesn’t have any downtime for a year, we can say in hindsight that it had 100% reliability and availability over that year. There may still have been a chance of failure at each moment of operation, so the expected reliability may have been 99% for the year; this past year we simply did not incur the failure, which should have been rare anyway.
Now a system that regularly fails, say every month, may still have very close to 100% availability over a year if the time to restore (repair) the system to operation is very short. Let’s say it takes only a few seconds to restore the system; then even if the reliability is very low over the year, the availability remains very high.
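A rough sketch of this contrast follows. The numbers are assumptions for illustration only: one failure per month on average, 5 seconds to restore after each failure, and a constant failure rate (exponential model) for the reliability estimate. None of these come from a real system.

```python
import math

HOURS_PER_YEAR = 8760.0

# Assumed values for illustration only.
failures_per_year = 12   # about one failure per month
restore_seconds = 5.0    # "a few seconds" to restore after each failure

# Reliability over one year: the chance of zero failures in the year,
# ASSUMING a constant failure rate (exponential model).
reliability_one_year = math.exp(-failures_per_year)

# Availability over the same year: uptime divided by total time.
downtime_hours = failures_per_year * restore_seconds / 3600.0
availability = (HOURS_PER_YEAR - downtime_hours) / HOURS_PER_YEAR

print(f"Reliability over one year:  {reliability_one_year:.6f}")
print(f"Availability over one year: {availability:.6f}")
```

With these assumed inputs, the chance of getting through the year without a failure is close to zero, while the total downtime is about one minute, so the availability stays very close to 100%.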
In practice, repair isn’t always that quick, so we often work to both maximize reliability and minimize maintenance time. One way to measure availability is to tally the uptime of a system and divide by the sum of all uptime and downtime over a specific duration. So if a system is to operate 24/7 for a year (8,760 hours) and incurs 10 hours of downtime, the availability is 8,750 / (8,750 + 10) = 0.9989, or about 99.9% available.
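The uptime tally above can be sketched as a small function. The function name and its argument names are hypothetical, chosen here for illustration; the inputs are the figures from the example in the text.

```python
def availability(uptime_hours, downtime_hours):
    """Fraction of the total period during which the system was up."""
    total_hours = uptime_hours + downtime_hours
    if total_hours <= 0:
        raise ValueError("total of uptime and downtime must be positive")
    return uptime_hours / total_hours

# Figures from the text: 8,760 scheduled hours with 10 hours of downtime.
scheduled_hours = 8760
downtime = 10
result = availability(scheduled_hours - downtime, downtime)
print(f"Availability: {result:.4f}")
```

Counting every hour of downtime the customer experiences, rather than only selected categories, keeps this calculation simple and complete.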
When talking about reliability and availability, be clear, define the terms and remain consistent. Note I’m not diving into all the added, unnecessary confusion of sorting and limiting what is counted as a failure (many do not include no-trouble-found failures) or the many different elements of downtime (diagnostic, wrench, logistics times). Keep it simple, count everything the customer considers a failure, and be very clear. A complex algorithm for reliability or availability rarely sheds any light on what has happened, so keep the metrics simple and complete.