Fault Tree Analysis
This post provides an introductory overview of Fault Tree Analysis (FTA). The technique is described in detail in the book Process Risk and Reliability Management.
Risk Analysis
Risk can be analyzed in one of two basic ways: inductively (bottom-up) or deductively (top-down).
In a deductive analysis a top-level system failure is postulated. The analyst then works backwards to deduce what combinations of events could have occurred for the system failure to have taken place (a detective solving a crime is thinking deductively). Fault tree analysis, the topic discussed in this article, is deductive.
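The result of this top-down reasoning is usually drawn as a tree of logic gates beneath the Top Event. The Python sketch below is a minimal illustration that assumes only AND and OR gates. The `Basic`, `Gate` and `occurs` names and the tank-overflow tree are hypothetical and do not come from the book or from any FTA package. The sketch checks whether the Top Event occurs for a given set of failed basic events.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Basic:
    """A basic event: an individual component failure or human error."""
    name: str


@dataclass(frozen=True)
class Gate:
    """A logic gate: an AND gate's output occurs only if all inputs occur;
    an OR gate's output occurs if any input occurs."""
    kind: str      # "AND" or "OR"
    inputs: tuple  # child Basic or Gate nodes


Node = Union[Basic, Gate]


def occurs(node: Node, failed: set) -> bool:
    """Return True if the event at `node` occurs, given the set of failed basic events."""
    if isinstance(node, Basic):
        return node.name in failed
    results = [occurs(child, failed) for child in node.inputs]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate type: {node.kind}")


# Hypothetical tree: the tank overflows (Top Event) if the feed control valve
# sticks open AND the overflow protection fails; the protection fails if
# EITHER the high-level alarm fails OR the operator does not respond.
top_event = Gate("AND", (
    Basic("control valve sticks open"),
    Gate("OR", (
        Basic("high-level alarm fails"),
        Basic("operator does not respond"),
    )),
))

print(occurs(top_event, {"control valve sticks open"}))                            # False
print(occurs(top_event, {"control valve sticks open", "high-level alarm fails"}))  # True
```

Building the tree is the deductive step: at each gate the analyst asks whether every input must fail (AND) or whether any one input is enough (OR).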
An inductive analysis works in the other direction. A single failure, such as a pump stopping or a valve closing at the wrong time, is postulated, and the analyst then determines what impact that failure could have on overall system performance. Event tree analysis is inductive. Both techniques make explicit the combination of events needed for an undesirable incident to occur, and their strict logic cuts through the “I think / You think” discussions that are the bane of so much risk analysis.
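An event tree can be sketched in a few lines of Python. The example assumes a hypothetical initiating event (a cooling water pump stopping about 0.5 times per year) and two hypothetical safeguards with assumed probabilities of failing on demand. Each branch frequency is the initiating frequency multiplied by the success or failure probabilities along that path, which assumes the safeguards fail independently.

```python
# Hypothetical initiating event: "cooling water pump stops",
# with an assumed frequency in occurrences per year (illustrative only).
INITIATING_FREQUENCY = 0.5

# Hypothetical safeguards, in the order they are challenged, each with an
# assumed probability of failing on demand.
SAFEGUARDS = [
    ("standby pump starts automatically", 0.05),
    ("operator shuts off feed on alarm", 0.1),
]


def event_tree(frequency, safeguards):
    """Return (path, outcome, frequency) for each branch of a simple event tree.

    Each safeguard is challenged only if every earlier one has failed, so the
    first success ends that branch with a safe outcome.
    """
    sequences = []
    reach = frequency  # how often the current branch point is reached
    history = []       # safeguards that have already failed on this path
    for name, p_fail in safeguards:
        # Success branch: this safeguard works and the sequence ends safely.
        sequences.append((history + [f"{name}: works"], "safe", reach * (1 - p_fail)))
        # Failure branch: continue to the next safeguard.
        history = history + [f"{name}: fails"]
        reach *= p_fail
    # Every safeguard failed: the undesired outcome.
    sequences.append((history, "undesired outcome", reach))
    return sequences


for path, outcome, freq in event_tree(INITIATING_FREQUENCY, SAFEGUARDS):
    print(f"{freq:.4f}/yr  {outcome:17s}  " + " -> ".join(path))
```

The branch frequencies always add back up to the initiating frequency, which is a useful check on a hand-drawn event tree.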
Fault Tree Analysis
First used by Bell Telephone Laboratories and the Boeing Corporation in 1962-64 to analyze potential problems with the Minuteman missile launch control system, Fault Tree Analysis (FTA) provides a clear and intelligible way of determining the combination of events needed for an undesirable incident to occur (Vesely 1981). In particular, the graphical nature of the analysis helps managers, engineers and operators understand how their systems can fail. The rigor and logic of a fault tree analysis often stimulate creative thinking, and the method lets experts contribute their experience and opinions in a structured manner. So, even though the fault tree method is based on logic and Boolean algebra, it can help identify new hazards and previously unrecognized failure mechanisms.
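The Boolean algebra behind a fault tree can be sketched by expanding the tree into its minimal cut sets: the smallest combinations of basic events that cause the Top Event. The Python below is a naive top-down expansion that is adequate for small trees; it is not the algorithm used by any particular FTA package. The two-pump cooling tree and all event names are hypothetical.

```python
from itertools import product

# Hypothetical fault tree: gate name -> (gate type, inputs). Any name that is
# not a key in the dictionary is treated as a basic event.
TREE = {
    "NO COOLING": ("AND", ["PUMP A FAILS", "PUMP B FAILS"]),
    "PUMP A FAILS": ("OR", ["pump A mechanical failure", "power supply fails"]),
    "PUMP B FAILS": ("OR", ["pump B mechanical failure", "power supply fails"]),
}


def cut_sets(event, tree):
    """Expand `event` top-down into a list of cut sets (frozensets of basic events)."""
    if event not in tree:  # basic event
        return [frozenset([event])]
    kind, inputs = tree[event]
    child_sets = [cut_sets(child, tree) for child in inputs]
    if kind == "OR":
        # Boolean sum: a cut set of any one input causes the event.
        return [cs for sets in child_sets for cs in sets]
    if kind == "AND":
        # Boolean product: one cut set from every input must occur together.
        # Set union also applies idempotence (A.A = A).
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(f"unknown gate type: {kind}")


def minimal(sets):
    """Apply absorption (A + A.B = A): drop duplicates and any cut set
    that contains a smaller one. Sort by size, then alphabetically."""
    unique = set(sets)
    return sorted(
        (s for s in unique if not any(other < s for other in unique)),
        key=lambda s: (len(s), sorted(s)),
    )


for cs in minimal(cut_sets("NO COOLING", TREE)):
    print(sorted(cs))
```

The shared power supply emerges as a single-event cut set. This is a common-cause failure that is easy to overlook when each pump is reviewed on its own, and it illustrates how the logic can expose failure mechanisms nobody had listed.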
Once a fault tree has been developed, failure rate data for the individual components in the system can be entered into the tree so that the likelihood of the undesired event (the ‘Top Event’) can be estimated. The quality of the failure rate data is frequently poor. Nevertheless, a quantified analysis still provides useful insights because, in line with the Pareto Principle (the 80/20 rule, under which roughly 80% of effects come from 20% of causes), it identifies which items in the system contribute the most to system failure. Moreover, once the model has been developed and preliminary failure rate estimates have been made, case studies can examine the effects of process changes and of additional safeguards. As better data on equipment failure rates and repair times become available, the quality of the analysis will improve.
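The quantification step can be sketched in Python under stated assumptions: the basic events are independent, and the Top Event probability is estimated with the rare-event approximation, that is, the sum over minimal cut sets of the product of their basic-event probabilities. To rank contributors, the sketch uses Fussell-Vesely importance, one common measure; it is a swapped-in choice rather than a method the post specifies. The cut sets, probabilities and backup-generator case study are all hypothetical.

```python
from math import prod

# Hypothetical minimal cut sets and basic-event probabilities; the numbers are
# illustrative placeholders, not real failure-rate data.
CUT_SETS = [
    {"power supply fails"},
    {"pump A mechanical failure", "pump B mechanical failure"},
]
PROBABILITY = {
    "power supply fails": 1e-3,
    "pump A mechanical failure": 2e-2,
    "pump B mechanical failure": 2e-2,
}


def cut_set_probability(cut_set, p):
    # Assumes the basic events in a cut set are independent.
    return prod(p[e] for e in cut_set)


def top_event_probability(cut_sets, p):
    # Rare-event approximation: sum of the minimal cut-set probabilities.
    # It slightly overestimates and is only reasonable when these are small.
    return sum(cut_set_probability(cs, p) for cs in cut_sets)


def fussell_vesely(cut_sets, p):
    """Share of the top-event probability from cut sets containing each event."""
    total = top_event_probability(cut_sets, p)
    events = {e for cs in cut_sets for e in cs}
    return {
        e: sum(cut_set_probability(cs, p) for cs in cut_sets if e in cs) / total
        for e in events
    }


base = top_event_probability(CUT_SETS, PROBABILITY)
importance = fussell_vesely(CUT_SETS, PROBABILITY)
for event, share in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{event:27s} {share:.2f}")

# Case study: a hypothetical backup generator means power is lost only if
# both the normal supply and the backup fail.
with_backup = [{"power supply fails", "backup generator fails"}, CUT_SETS[1]]
p_backup = {**PROBABILITY, "backup generator fails": 5e-2}
print(f"Top Event: {base:.1e} -> {top_event_probability(with_backup, p_backup):.1e}")
```

With these made-up numbers, the power supply accounts for roughly 70% of the Top Event probability, so it is the first candidate for improvement. The case study then shows how the model can test the effect of one added safeguard before any money is spent.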
A full Fault Tree Analysis is usually time-consuming and often requires the services of costly outside experts. However, a Qualitative Fault Tree Analysis, which works with the logic of the tree and does not assign failure rates, can provide many insights with much less effort.
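A qualitative analysis typically stops at the minimal cut sets. The Python sketch below shows two common screening steps that need no failure-rate data: ranking cut sets by order (the number of events that must occur together) and counting how often each basic event appears. The cut sets are hypothetical and assume that a single level transmitter feeds both the level controller and the high-level alarm.

```python
from collections import Counter

# Hypothetical minimal cut sets for a tank-overflow Top Event. The single
# transmitter is assumed to feed both the level controller and the alarm.
MINIMAL_CUT_SETS = [
    {"feed valve sticks open", "high-level alarm fails"},
    {"feed valve sticks open", "operator does not respond"},
    {"level transmitter reads low"},
]


def qualitative_screen(cut_sets):
    # Lower-order cut sets need fewer simultaneous failures, so list them first.
    by_order = sorted(cut_sets, key=len)
    # A cut set with only one event is a single point of failure.
    single_points = [e for cs in cut_sets if len(cs) == 1 for e in cs]
    # Events that appear in many cut sets are candidates for extra attention.
    appearances = Counter(e for cs in cut_sets for e in cs)
    return by_order, single_points, appearances


by_order, single_points, appearances = qualitative_screen(MINIMAL_CUT_SETS)
print("Single points of failure:", single_points)
print("Most frequent basic event:", appearances.most_common(1))
```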