Saturday, March 16, 2013

I watched the webcast of the Boeing presentation on their comprehensive solution to the 787 battery system. People may have different opinions on that presentation, but I was particularly impressed with Mike Sinnett, Vice President and Chief Project Engineer. He articulated an approach to an engineering issue that is the same as the method used by effective pilots. Their approach has been validated by the FAA, the only people outside Boeing with all the available information. The NTSB is continuing their investigation.
The 787 Dreamliner battery saga is providing many lessons for the air transportation and aerospace engineering industries. I am not an engineer by training, but I have tremendous respect for the profession. The ability to apply science and manufacturing to an idea and produce a sophisticated product is very impressive. The engineer’s goal is to produce a product that reliably performs to the design criteria. In the case of the Dreamliner, as well as their other aircraft, the primary design criterion espoused by Boeing is safety.
My education and training are in the operation of sophisticated machinery, specifically airplanes, in a manner that accomplishes the desired objective(s). For me as well, the primary objective is safety.
Engineers and pilots have much the same approach to their disciplines. We both have challenges to deal with: risk, causal factors, unexpected outcomes, the testing of assumptions, as well as unknowns. Engineers, however, are afforded a little more of one critical resource than pilots: time. When a problem occurs, a pilot is always faced with a fixed amount of time to deal with it. The time can be as short as seconds or as long as hours, but there is always a limit. Engineers have some time limits as well, but for engineers the “fuel” is money, and running out of it takes weeks or months instead of minutes and hours.
Pilots and engineers do have the same approach to the obstacles to their desired objective of safety: awareness, avoidance and resolution. What issues might be anticipated? Can these issues be avoided? If an issue cannot be avoided, or the avoidance strategy proves ineffective, will the outcome still be acceptable?
When the engineer is asked if he is certain his part will not fail, the answer is no. When the pilot is asked if he is certain he will be able to land at the scheduled destination, the answer is also no. That is the case every day throughout the air transportation industry. With so much uncertainty, how can air transportation have such a remarkable safety record?
Both the engineer and the pilot have the same mindset. They ask, “Can something go wrong?” “Can it be prevented?” And, if prevention is ineffective, “Can it be dealt with to a safe conclusion?” Successful engineers build redundancy into their systems. Successful pilots build redundancy into their decision making. This multifaceted and balanced approach is essential in the dynamic environment that is aviation.
Failures, mechanical and human, are a part of air transportation. They are neither desired nor acceptable, but they are a reality. The existence of failure or error is part of any mechanical or human system. An engineer who believes he can build a part that will never fail is as naive as the pilot who believes he can avoid all errors. The process of resolving failures or errors reveals a great deal about the effectiveness of an engineer or pilot. Is the failure recognized in a timely manner? Is it trapped before there are negative consequences? Are the circumstances of the error or failure fully considered for possible future countermeasures? Are opportunities explored for additional levels of redundancy?
I am sure the aerospace engineering community has a name for its process to achieve a safe aircraft. For pilots, our process is known as Threat and Error Management. Identifying and managing (avoiding) threats and errors before they result in a negative impact on safe operations is how we both do it.
Sunday, March 3, 2013
What is Acceptable Risk?
"There are no new types of air crashes - only people with short
memories. Every accident has it's own forerunners, and every one happens
either because somebody did not know where to draw the vital dividing line
between the unforeseen and the unforeseeable or because well-meaning people
deemed the risk acceptable." "The Final Call" (Why Airline Disasters Continue to Happen)
I have referenced this 1990 Stephan Barlay quote before, and it has never been more applicable than it is today. Not just because of the 787 battery problems, but because it captures the essence of safety, whether aviation related or not.
Recently Boeing met with the FAA to present its proposal for addressing the battery issues associated with the 787 Dreamliner. The terms unforeseen and unforeseeable may not have been used in those meetings, but I am confident those concepts were discussed at length. Trying to sort out what we know, what we don’t know and what we don’t know we don’t know can be a daunting task. However, all the risks associated with what we know and what we don’t know become academic if we don’t fly. That, of course, was the rationale the FAA used to issue an order grounding the 787. So now, like every other airliner, if the 787 ever flies again, there will be actual risks on every flight. I don’t believe any manufacturer or operator has ever intentionally placed an airplane and its crew, passengers and cargo in a situation where they determined the risk to be unacceptable. The question that must be asked is simple yet complex. It is the question that all Captains must always answer in their decision-making process: under what circumstances are the risks acceptable or unacceptable?
The idea of an acceptable risk is confusing. I have jokingly referred to it as an “aviation oxymoron”. Acceptable risk is one of aviation’s least understood concepts. Most pilots have a very difficult time articulating exactly what it is. It is not difficult, however, for most pilots to point out things that are clearly risky. Conditions that portend “bad things could happen” are usually considered risky. For example, a worn-out tire, severe icing, microburst wind shear and an inadequate fuel supply are all examples of things that would commonly be termed “risky”. In fact, everyone would probably agree these things are not just risky, but unacceptably risky.
Conversely, what about things like engine failure on takeoff, contaminated runways with diminished braking action, or inoperative communications or navigation equipment? These are all considered risky scenarios, yet we fly safely with them every day. What is the difference? These risks have been deemed acceptable by the FAA, by the manufacturer, by the operator and by the pilots; otherwise they would not be routine. If no risk in aviation were acceptable, we would never fly.
Therefore, what is an acceptable risk? Is an acceptable risk just subjective opinion, or are there ways we can more objectively determine acceptability? I believe there is a process for evaluating whether a risk is acceptable or unacceptable.
To begin this discussion we must first talk about risk. A risky or threatening event has two components: probability and severity. What is the chance the event will occur? What is the severity of the outcome if the event does occur? Is the risk acceptable if the probability is low (but not zero)? If we use probability to determine acceptable risk, then we are making safety a game of chance. Do you feel lucky? If we use severity to determine acceptable risk, then anything like engine failure during takeoff is a deal breaker. In that scenario, only when we develop an engine that cannot fail will we fly. That is not realistic either. How can we effectively address the probability/severity issue? The answer is rather simple.
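For the technically minded, the classic two-component model can be sketched in a few lines of code. This is purely my own illustration, not anything from Boeing, the FAA or a real safety manual, and the threshold value is an arbitrary assumption for the example:

# A minimal sketch of the classic probability x severity risk model.
# The threshold is an arbitrary illustrative value, not a real standard.

def risk_score(probability: float, severity: float) -> float:
    """Classic model: risk is the product of probability (0-1) and
    severity (0 = no consequence, 1 = catastrophic)."""
    return probability * severity

def matrix_acceptable(probability: float, severity: float,
                      threshold: float = 1e-9) -> bool:
    """Under this model, "acceptable" is any point under the curve.
    Note the problem raised below: for a catastrophic outcome
    (severity = 1.0), only probability 0 is truly comfortable."""
    return risk_score(probability, severity) < threshold

# Engine failure on takeoff: severe outcome, very low probability.
print(matrix_acceptable(probability=1e-15, severity=1.0))  # True
# The same event assessed at probability 1.0 ("not if, but when"):
print(matrix_acceptable(probability=1.0, severity=1.0))    # False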
Let’s take the probability issue first. Since most people, especially airline customers, are not comfortable with luck as a safety management system, we must assume the probability of the event is 1.0. Simply saying an engine only fails once in 10¹⁵ flights is not enough. With those odds, the same person who buys a Powerball ticket will never get on an airplane. In practice, however, the probability must be very small, but it does not need to be zero. If we knew there was a high probability of certain numbers being picked, the lotto would go broke. Similarly, if we knew there was a high probability of an engine failing on takeoff, the manufacturer and airline would go broke. Who would fly on that airplane or airline? The probability question is actually not if, but when.
That brings us to severity. This is where, I’m sorry, the rubber meets the road. This is really where risk becomes acceptable or unacceptable. With every risk there is a corresponding set of associated threats. Using the engine failure on takeoff example, some of the threats would be climb performance, stopping distance, controllability, pilot training and terrain, to name a few. The severity question becomes a function of the ability to manage the threats associated with the risk, in this case an engine failure. If all the associated threats can be effectively managed to a safe outcome, the risk becomes acceptable. It is the difference between “We hope this doesn’t happen” and “If this happens, we will be safe.” The FAA has a statement in pilot certification standards that is applicable here: “The outcome of the maneuver must never be in doubt.” If there is doubt in the outcome, the risk is unacceptable.
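To make that distinction concrete, here is a minimal sketch of risk assessment as threat manageability. The threat names and the yes/no judgments are my own illustrative assumptions, not an official FAA or airline taxonomy:

# A minimal sketch of risk-as-threat-manageability.
# Threat names and manageability judgments are illustrative assumptions.

ENGINE_FAILURE_ON_TAKEOFF = {
    "climb performance": True,   # certified one-engine-out climb gradient
    "stopping distance": True,   # accelerate-stop distance within runway
    "controllability":   True,   # rudder authority above minimum control speed
    "pilot training":    True,   # V1 cuts practiced in the simulator
    "terrain":           True,   # engine-out departure procedure exists
}

def risk_acceptable(threats: dict[str, bool]) -> bool:
    """Acceptable only if every associated threat can be effectively
    managed to a safe outcome ("never in doubt")."""
    return all(threats.values())

print(risk_acceptable(ENGINE_FAILURE_ON_TAKEOFF))  # True: acceptable

# Microburst windshear penetration: manageability cannot be guaranteed
# inside the encounter, so the only effective management is avoidance.
MICROBURST_PENETRATION = {
    "energy management": False,  # performance may not match the downdraft
    "controllability":   False,  # outcome in doubt inside the microburst
}
print(risk_acceptable(MICROBURST_PENETRATION))  # False: avoid it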
This is why the safety management strategy known as Threat and Error Management has brought about a new paradigm in risk assessment and decision-making. Risk assessment is no longer just a graph where one axis is probability and the other severity, with some green areas under the curve labeled low (acceptable?). With probability and severity as the risk criteria, only the origin (0,0) would be truly acceptable. In an effective safety management system, the ability to mitigate threats defines the “acceptable risk” area, not probability/severity. Threat management creates a clear distinction between acceptable and unacceptable risk, in contrast to the spectrum defined by probability and severity. The good news is that there is a rather large area under the threshold that is acceptable. The threat management definition of risk assessment allows for a safe operation in a very dynamic environment. Another difference between the two risk assessment models is that Threat and Error Management is a cultural mindset and a skill, not a fixed value like probability or severity. Threat and Error Management can be trained and embedded in an operation, whereas probability and severity are fixed parameters.
What is an acceptable risk? It is a risk where the set of associated threats can be effectively managed to a safe outcome. A threat management strategy is simply a process of identifying and preparing for things that potentially make the environment more complex or reduce safety margins. Therefore, an unacceptable risk is one where the set of associated threats cannot be effectively managed. Why is microburst windshear an unacceptable risk? Because we cannot guarantee that the associated threats can be effectively managed if we fly through it. Therefore, we avoid it. But what if we get into windshear inadvertently? We must have a strategy for that as well.
Finally, the ubiquitous potential for human error must be embraced and always considered a threat. Error management is inextricably linked to threat management and its strategic principles. Threat management strategies in aviation are developed around two basic components: the system (hardware) and the individual (human). The system component includes those inanimate factors (e.g., checklists, ground proximity warning systems) engineered into the operation that exist whether or not the crew is present or chooses to use them. The individual component includes the effective human behaviors that are derived from training and cognitive thought (e.g., communication, situation awareness, decision making). Both components of threat management, the system and the individual, are tools that must be understood and effectively employed if a risk is to be deemed acceptable.
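For readers who like a concrete model, the two components might be captured like this. Again, this is my own sketch; the field names and example mitigations are assumptions for illustration:

# Illustrative sketch of the two threat-management components.
# Field names and the example mitigations are my own assumptions.

from dataclasses import dataclass, field

@dataclass
class ThreatMitigation:
    threat: str
    # System component: inanimate defenses engineered into the
    # operation, present whether or not the crew uses them.
    system: list[str] = field(default_factory=list)
    # Individual component: trained human behaviors.
    individual: list[str] = field(default_factory=list)

    def effectively_managed(self) -> bool:
        """A threat is considered managed here only when both
        components contribute at least one defense."""
        return bool(self.system) and bool(self.individual)

cfit = ThreatMitigation(
    threat="controlled flight into terrain",
    system=["ground proximity warning system", "approach checklists"],
    individual=["situation awareness", "escape-maneuver training"],
)

print(cfit.effectively_managed())  # True: both tools are employed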