Sunday, March 3, 2013

What is Acceptable Risk?

"There are no new types of air crashes - only people with short memories. Every accident has its own forerunners, and every one happens either because somebody did not know where to draw the vital dividing line between the unforeseen and the unforeseeable or because well-meaning people deemed the risk acceptable." "The Final Call" (Why Airline Disasters Continue to Happen)

I have referenced this 1990 Stephen Barlay quote before, and it has never been more applicable than it is today. That is not just because of the 787 battery problems, but because it is the essence of safety, whether aviation related or not.

Recently Boeing met with the FAA to give its proposal for addressing the battery issues associated with the 787 Dreamliner. The terms unforeseen and unforeseeable may not have been used in that meeting, but I am confident those concepts were discussed at length. Trying to sort out what we know, what we don’t know and what we don’t know we don’t know can be a daunting task. However, all the risks associated with what we know and what we don’t know become academic if we don’t fly. That, of course, was the rationale the FAA used to issue an order grounding the 787. So now, as with every other airliner, if the 787 ever flies again, there will be actual risks on every flight. I don’t believe any manufacturer or operator has ever intentionally placed an airplane and its crew, passengers and cargo in a situation where they determined the risk to be unacceptable. The question that must be asked is simple yet complex. It is the question that all Captains must always answer in their decision-making process. Under what circumstances are the risks acceptable or unacceptable?

The idea of an acceptable risk is confusing. I have jokingly referred to it as an “aviation oxymoron”. Acceptable risk is one of aviation’s least understood concepts. Most pilots have a very difficult time articulating exactly what it is. It is not difficult for most pilots to point out things that are clearly risky. Conditions that portend “bad things could happen” are usually considered risky. For example, a worn-out tire, severe icing, microburst wind shear and an inadequate fuel supply would all commonly be termed “risky”. In fact, everyone would probably agree these things are not just risky, but unacceptably risky.

Conversely, what about things like engine failure on takeoff, contaminated runways with diminished braking action, or inoperative communications or navigation equipment? These are all considered risky scenarios, but we fly safely with them every day. What is the difference? These risks have been deemed acceptable by the FAA, by the manufacturer, by the operator and by the pilots; otherwise they would not be routine. If some risks in aviation were not acceptable, we would never fly.

So what is an acceptable risk? Is an acceptable risk just subjective opinion, or are there ways we can more objectively determine acceptability? I believe there is a process for evaluating whether a risk is acceptable or unacceptable.

To begin this discussion we must first talk about risk. The risky or threatening event has two components, probability and severity. What is the chance the event will occur? What is the severity of the outcome if the event does occur? Is the risk acceptable if the probability is low (not zero)? If we use probability to determine acceptable risk, then we are making safety a game of chance. Do you feel lucky? If we use severity to determine acceptable risk, then anything like engine failure during takeoff is a deal breaker. In that scenario, only when we develop an engine that cannot fail will we fly. That is not realistic either. How can we effectively address the probability / severity issue? The answer is rather simple.

Let’s take the probability issue first. Since most people, especially airline customers, are not comfortable with luck as a safety management system, we must assume the probability of the event is 1.0. Simply saying an engine only fails once in 10 to the 15th flights is not enough. With those odds, the same person who buys a Powerball ticket will never get on an airplane. In practice, however, the probability must be very small but does not need to be zero. If we knew there was a high probability of certain numbers being picked, the lotto would go broke. Similarly, if we knew there was a high probability of an engine failing on takeoff, the manufacturer and airline would go broke. Who would fly on that airplane or airline? The probability question is actually not if, but when.

That brings us to severity. This is where, I’m sorry, the rubber meets the road. This is really where risk becomes acceptable or unacceptable. With every risk there is a corresponding set of associated threats. Using the engine failure on takeoff example, some of the threats would be climb performance, stopping distance, controllability, pilot training and terrain, to name a few. The severity question becomes a function of the ability to manage the threats associated with an engine failure (the risk). If all the associated threats can be effectively managed to a safe outcome, the risk becomes acceptable. It is the difference between “We hope this doesn’t happen” and “If this happens we will be safe”. The FAA has a statement in pilot certification standards that is applicable here: “The outcome of the maneuver must never be in doubt.” If there is doubt in the outcome, the risk is unacceptable.
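The rule above can be written down as a small check. This is only a sketch of the idea, not an operational tool: the `Threat` class, the `managed` flag and the example values are all hypothetical, made up for illustration and not taken from any real performance or training assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    managed: bool  # assumption: True means a strategy exists that leads to a safe outcome


def unmanaged(threats):
    """Return the names of threats that cannot be managed to a safe outcome."""
    return [t.name for t in threats if not t.managed]


def is_acceptable(threats):
    """A risk is acceptable only when no associated threat is left unmanaged."""
    return not unmanaged(threats)


# Illustrative values only, not a real assessment.
engine_failure_on_takeoff = [
    Threat("climb performance", True),
    Threat("stopping distance", True),
    Threat("controllability", True),
    Threat("pilot training", True),
    Threat("terrain", True),
]

# The same risk with one threat in doubt (hypothetical: terrain cannot be cleared).
engine_failure_terrain_in_doubt = [
    Threat(t.name, t.managed and t.name != "terrain")
    for t in engine_failure_on_takeoff
]

print(is_acceptable(engine_failure_on_takeoff))
print(unmanaged(engine_failure_terrain_in_doubt))
```

Note that a single unmanaged threat is enough to make the whole risk unacceptable, which is the “outcome must never be in doubt” standard expressed as a yes/no test.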

This is why the safety management strategy known as Threat and Error Management has brought about a new paradigm in risk assessment or decision-making. Risk assessment is no longer just a graph where one axis is probability and the other severity, with some green areas under the curve labeled low (acceptable?). With probability and severity as risk criteria, only the origin (0,0) would be truly acceptable. In an effective safety management system the ability to mitigate threats defines the “acceptable risk” area, not probability/severity. Threat management creates a clear distinction between acceptable and unacceptable risk, in contrast to the spectrum defined by probability and severity. The good news is that there is a rather large area under the threshold that is acceptable. The threat management definition of risk assessment allows for a safe operation in a very dynamic environment. Another difference between those two risk assessment models is that Threat and Error Management is a cultural mindset and a skill, not a fixed value like probability or severity. Threat and Error Management can be trained and embedded in an operation, whereas probability / severity are fixed parameters.
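To make the contrast between the two models concrete, here is a rough sketch. It assumes the probability/severity graph is scored as a simple product, which is a common convention but not something specified here, and every function name, threat name and number below is hypothetical.

```python
def matrix_score(probability, severity):
    """Probability/severity view: a point on a spectrum (assumed here to be the product)."""
    return probability * severity


def threat_verdict(threats_managed):
    """Threat-management view: a yes/no answer based on whether every threat is managed."""
    return "acceptable" if all(threats_managed.values()) else "unacceptable"


# Hypothetical numbers for an engine failure on takeoff.
score = matrix_score(probability=1e-5, severity=0.9)

# Hypothetical threat assessments; True means the threat can be managed to a safe outcome.
engine_failure = {
    "climb performance": True,
    "stopping distance": True,
    "controllability": True,
}
microburst = {"escape performance": False}

print(score > 0)                       # the score is small but never exactly zero
print(matrix_score(0.0, 0.0))          # only the origin scores zero
print(threat_verdict(engine_failure))
print(threat_verdict(microburst))
```

The score leaves open the question of how small is small enough, while the threat-based verdict draws a line: either every associated threat can be managed or the risk is not accepted.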

What is an acceptable risk? It is a risk where the set of associated threats can be effectively managed to a safe outcome. A threat management strategy is simply a process of identifying and preparing for things that potentially make the environment more complex or reduce safety margins. Therefore, an unacceptable risk is one where the set of associated threats cannot be effectively managed. Why is microburst wind shear an unacceptable risk? We cannot guarantee that the associated threats can be effectively managed if we fly through it. Therefore, we avoid it. But what if we get into wind shear inadvertently? We must have a strategy for that as well.

Finally, the ubiquitous potential for human error must be embraced and always considered a threat. Error management is inextricably linked to threat management and its strategic principles. Threat management strategies in aviation are developed around two basic components, the system (hardware) and the individual (human). The system component includes those inanimate factors (e.g. checklists, ground proximity warning systems) engineered into the operation that exist whether or not the crew is present or chooses to use them. The individual component includes the effective human behaviors that are derived from training and cognitive thought (e.g. communication, situation awareness, decision making). Both components of threat management, the system and the individual, are tools that must be understood and effectively employed if a risk is to be deemed acceptable.

