By Massimo Bombino
When a second Boeing 737 MAX aircraft crashed in Ethiopia in March, just a few months after one crashed in Indonesia at the end of 2018, it focused media attention on Boeing’s new aircraft and on the testing and certification processes of both Boeing and the FAA.
In case this was not enough, just a day before the fatal Lion Air crash in Indonesia, another incident on the same aircraft was avoided by coincidence when a resting pilot, who was by chance on the same aircraft, ran to the cockpit and helped prevent a tragedy. This incident gives us clues which enable us to better understand both catastrophes.
It is not my habit to comment on incidents that have recently occurred, because investigations can last months – but this time we already know who the culprit might be. In fact, in both 737 MAX incidents, it has been highlighted how the pilots fought in vain against systems aboard their own aircraft. They were struggling against a stabilization system called MCAS (Maneuvering Characteristics Augmentation System), potentially dangerous maneuver-correction software that paradoxically created the two tragedies.
Is software the culprit?
Maybe in part, but the reality is different from initial impressions. And the lesson learned today, which I aim to reveal in the following analysis, can be useful to you and your company…whatever software you might produce and use.
What happened?
Regardless of the conclusions to come from the full investigation of the Ethiopian Airlines incident, evidence from the black-box analysis and from the averted accident shows that on both occasions the aircraft suddenly changed altitude and that the pilots responded with emergency maneuvers. The cockpit recordings show that the onboard avionics had decided to point the aircraft towards the ground, as if it had to evade an imminent danger.
The problem is that the danger did not exist, and pointing towards the ground just after takeoff, at low altitude and low speed, is extremely dangerous… repeated, abrupt nose-down commands can cause such instability and loss of altitude as to produce the disasters that actually happened.
But why does this particular aircraft do whatever it wants, pointing its nose towards the ground? And above all else – is it true that the pilots couldn’t do anything to avoid the disaster? There is much debate about this…
Aircraft or bus?
Our analysis begins with a famous dilemma among aviation insiders and aircraft aficionados: Boeing or Airbus? At the center of the debate is one object: a joystick.
For many years, Airbus aircraft have not been piloted with a two-handed yoke, but with a joystick using fly-by-wire. With this method, the aircraft is controlled by electronic systems and no longer just mechanical-hydraulic ones. The move was a large advance in both the feel of the aircraft controls and avionics in general, and it has its critics and supporters.
In brief, on an Airbus aircraft the onboard avionics have more control over the flight envelope to ensure the aircraft is always being safely flown. The flight envelope is the set of flight conditions within which the aircraft must always remain to fly safely: an array of complex limits on speed, attitude and acceleration that must never be exceeded, so as not to endanger the aircraft, its crew and its passengers.
If the pilot performs, even unintentionally, any dangerous action or maneuver, the avionics intervene, unless certain functions have been deliberately turned off. In effect, the aircraft is almost always self-driving, choosing the best maneuver in every situation.
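To make the idea concrete, here is a minimal sketch of envelope protection in Python. The limit values, units and function names are all invented for illustration; real flight-control laws are vastly more complex and certified to rigorous standards.

```python
# Toy flight-envelope check. All limits below are hypothetical, chosen only
# to illustrate the concept of envelope protection, not real Airbus values.

def within_envelope(speed_kts: float, pitch_deg: float, load_factor_g: float) -> bool:
    """Return True when every parameter stays inside the (made-up) safe limits."""
    return (140 <= speed_kts <= 350
            and -15 <= pitch_deg <= 25
            and -1.0 <= load_factor_g <= 2.5)

def apply_pilot_input(speed_kts: float, load_factor_g: float,
                      commanded_pitch_deg: float) -> float:
    """Envelope protection: clamp a pilot command that would exit the envelope."""
    if not within_envelope(speed_kts, commanded_pitch_deg, load_factor_g):
        # The avionics override the pilot and limit the pitch command.
        return max(-15.0, min(25.0, commanded_pitch_deg))
    return commanded_pitch_deg
```

The point of the sketch is the override: the pilot can command anything, but the system only passes through what keeps the aircraft inside the envelope.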
According to critics, this level of autonomy can make Airbus pilots lazy and forget how to really manually fly an aircraft, to the extent they are called “bus drivers”, a derogatory reference to the Airbus company.
On the other hand, Boeing has stuck with the more traditional control yoke and pilots of its aircraft have the accompanying tactile and visual feel of what they are doing, together with their co-pilots, because the yokes move together.
Even if some electronic assistance systems have been introduced on these aircraft, they can be instantaneously turned off by just pushing the controls slightly more firmly than usual to regain total control on the aircraft. For this reason, many consider Boeing pilots to be more skilled in manual flying and “feeling” an aircraft for its behavior, since they practice much more often, rather than let the autopilot do the flying.
At a first glance, one could say Airbus aircraft are safer because they are automatic, while Boeing aircraft on the other hand require more skilled pilots.
However, the exact opposite has just happened with the 737 MAX incidents: the Boeing aircraft crashed because of their automated systems, contrary to what one would have expected.
What is MCAS and why it’s not the only culprit
The Boeing 737 MAX, as the name suggests, is the larger version of the 737, a commercial blockbuster of extreme reliability, outfitted with more powerful engines.
The design challenge for Boeing’s engineers with the MAX was that the aircraft’s larger, more powerful engines and their position on the aircraft were known to potentially create hazardous situations – exits from the flight envelope. In particular it was shown that there were risks at low-speed, high-angle flight conditions, which might lead to the aircraft stalling.
To mitigate these risks, Boeing introduced the MCAS electronic anti-stalling system, which is very similar to a system already installed on Airbus aircraft. The intent was to enable the new MAX version of the 737 to be flown without danger by pilots who were already trained in the old 737 NG (next generation).
The premise is sound – a device to make the new aircraft safer… somewhat following Airbus in terms of flight system automation, which reduces the pilots’ workload.
So, did the software fail? No, the sensors did.
Don’t blame the sensors
In order to function, MCAS requires, among other data, the Angle of Attack (AoA) – the angle between the wing and the oncoming airflow – which is supplied by dedicated sensors. Early investigations into the recent incidents indicate that these sensors malfunctioned, sending incorrect data to the MCAS and causing it to intervene when no intervention was needed.
Investigators believe that the faulty sensors signaled that the aircraft had reared up dangerously. Unfortunately, this wasn’t true: the aircraft was flying level… and the MCAS made it seek to lose altitude, despite repeated pilot interventions to raise the nose.
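The failure mode just described can be sketched in a few lines. Everything here is hypothetical – the threshold, the trim value and the function name are invented – but it shows why logic that trusts a single sensor is fragile: garbage in, nose-down out.

```python
# Toy illustration of an MCAS-like routine that trusts one AoA sensor.
# Threshold and trim amounts are invented, not Boeing's real parameters.

STALL_AOA_DEG = 14.0  # hypothetical "approaching stall" threshold

def trim_command(aoa_sensor_deg: float) -> float:
    """Return nose-down trim (negative) when the single AoA reading is high."""
    if aoa_sensor_deg > STALL_AOA_DEG:
        return -2.5  # push the nose down to "escape" the stall
    return 0.0

# A faulty vane reporting 22 degrees while the aircraft actually flies level:
# the routine commands nose-down trim although no stall threat exists.
faulty_reading = 22.0
assert trim_command(faulty_reading) < 0
```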
Can you imagine such a nightmare? You are piloting a passenger aircraft that has seemingly gone mad and wants to crash… dozens of times you try to level it out, but it is all useless; it inescapably points down towards death.
So were the crashes the sensors’ fault? No, there is a deeper cause behind the crashes… and once again, it is not the ultimate one.
A question of money
The MCAS had an internal diagnostic system for all of its components, including the AoA sensors. Once this system signaled a problem, the pilots would have known to disable MCAS and fly the aircraft manually, as they had been trained to.
Unfortunately, this diagnostic feature, the so-called “disagree alert”, was optional and expensive, so some “poorer” airlines did not buy it. Boeing has disputed this account, saying that although the diagnostic system itself was present, the cockpit alert indicator was optional and was not installed.
Neither the Indonesian nor the Ethiopian 737 MAX fleet was outfitted with the alert system, so the pilots found themselves at the mercy of an aircraft acting on its own will, reacting to an imaginary hazard that just wasn’t there. All for a question of money.
One could certainly conclude that in such a delicate and potentially life-saving system, such an auto-diagnosis system should be integrated by default, to check everything works correctly, from sensors to actuators, from computers to displays.
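The cross-check argued for above can be sketched like this. The threshold and function names are invented for illustration; this is not Boeing’s implementation, just the general idea of a sensor “disagree” check that inhibits automation when its inputs cannot be trusted.

```python
# Sketch of a "disagree alert": compare two independent AoA vanes and refuse
# to act on the data when they diverge. The threshold value is invented.

DISAGREE_THRESHOLD_DEG = 10.0  # illustrative, not a real certified value

def aoa_disagree(left_aoa_deg: float, right_aoa_deg: float) -> bool:
    """True when the two vanes diverge enough that neither should be trusted."""
    return abs(left_aoa_deg - right_aoa_deg) > DISAGREE_THRESHOLD_DEG

def automation_enabled(left_aoa_deg: float, right_aoa_deg: float) -> bool:
    # A defensive design inhibits the automatic intervention and alerts the
    # crew instead of acting on data that may be wrong.
    return not aoa_disagree(left_aoa_deg, right_aoa_deg)
```

With one vane stuck at 22 degrees and the other reading 4, the check fires and the automation stands down – exactly the safeguard the crashed aircraft lacked in the cockpit.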
Therefore, the manufacturer can be considered to be at fault… the automatic flight system should be safe for every situation and aircraft. But let’s recap before deciding who the actual culprit of such tragic incidents is.
The chain of events
So, the chain of events is clear:
- The 737 MAX aircraft is larger and carries more powerful engines than the previous models
- Being more powerful, it can create situations in which the nose pitching up could provoke serious flight instability and stalling
- To avoid this issue and save stress and additional training for pilots, an anti-stall system called MCAS was introduced
- The AoA sensors on the Lion Air and Ethiopian Airlines flights were defective and, by sending fabricated information, caused a needless MCAS intervention, which ultimately made the aircraft crash despite the desperate corrections by the pilots
- A sensor diagnostic system, which would have alerted pilots to turn the MCAS off, was available, but the Indonesian and Ethiopian airlines had not purchased it because it was too expensive
However, there are still some clues to discover the final culprit. In fact, attentive readers may have already determined the key question:
Why didn’t the pilots turn the MCAS off?
This is the central question.
The option to deactivate the system existed, but on both tragic flights the pilots didn’t use it. By contrast, on the same Lion Air aircraft, which narrowly escaped a crash the day before the fatal Indonesian flight, an off-duty pilot among the passengers understood what was happening, rushed into the cockpit and deactivated the MCAS, saving the aircraft.
A hero? No – just an informed and trained pilot.
In fact, the pilots of both crashed aircraft, like many other Boeing pilots around the world, faced one or more of these essential training problems:
- They were unaware MCAS had been introduced
- They did not understand how MCAS worked and when it operated
- They didn’t know how to recognize the MCAS’s anomalous reaction
- They didn’t know how to switch the MCAS off
You may say it is crazy and foolish that pushing a button was all that was needed to deactivate the deadly MCAS software. But the very last question is: why were the pilots not trained?
The culprit: pressure to reduce time to market
It is yet to have its final confirmation, but a growing number of people in the industry are beginning to believe that the missing or inadequate training can be blamed on a strategy that prioritized bringing the 737 MAX to market as quickly as possible.
In fact, reports reveal that training for the 737 Max has been very limited, in some cases less than a single hour on an iPad.
If it weren’t reported as fact, I would find this unbelievable.
The optional diagnostic system that was not purchased, and the compulsion to save money and time, to squeeze the infamous time to market, can be seen as the actual culprit in these two tragedies.
If you have read other columns of mine, you will know how organizational culture is so vital in aerospace and the lessons which can be applied to your company.
The wrong culture can kill
Just think for a moment: how many times in your company or team, for budget and/or time issues, have you:
- introduced changes without alerting everybody?
- purchased a tool without buying the specific classes?
- changed procedures without informing and training all the people involved?
- taken measures to improve some processes, without involving everybody who was affected?
Well, every one of these times, you increased the risks of:
- Harming your customers, if your software is Safety-Critical
- Bringing your business and your clients to a halt and losing money, if your system is Business-Critical
It is very important, when you develop devices containing critical software, to adopt an approach and a method that is safe, reliable and at the same time modern and efficient, in which software testing is a key part of the overall strategy. This has been my specialty for over 25 years; I wrote a book about it (request a copy here) and will write more about this topic on my blog: Safer Software.
Massimo Bombino has more than 25 years of experience in safety-critical software development. He has taken part in several DO-178B/C avionics certifications and gap analyses, and has also run several training courses related to critical software development. He is the author of the book “Safer Software”, a fundamental guide to better understand the principles of software quality and safety for critical markets. In 2018, he founded “Safer Software”, a company offering high-profile consultancy and training to customers facing software certification and quality issues.