Introduction
This is a review of, and commentary on, the independent report commissioned by the FAA following two 737 MAX hull loss accidents.
The report’s title is “Section 103 Organization Designation Authorizations (ODA) for Transport Airplanes Expert Panel Review – Final Report”.
Boeing’s status as an ODA holder means it is authorised by the FAA to regulate itself. For this purpose some 1,000 employees, designated ODA Unit Members (UMs), perform this function.
The FAA’s objective for the Expert Panel was to “review the safety management processes and their effectiveness.”
The Expert Panel “focused its review on safety culture, safety management systems (SMS), and ODA, while also evaluating other topics of concern for the safety of the flying public.”
Context – civil aviation
The briefest useful overview of the 50-page report is best given by weaving it into an understanding of its context.
From the beginning of the jet age (in the USA at least) in 1958, safety in the industry was managed in what Rowe (An Anatomy of Risk, John Wiley & Sons, New York, 1977) called a systemic way. Evidence of effective control over risk was a steadily decreasing number of deaths per annum and per air mile, and shrinking fluctuations about the trend line. Rowe demonstrated that this arose because of the systemic manner in which all interested parties (manufacturers, operators, educators, etc.) worked cooperatively and consciously to achieve and maintain the same result.
I worked in the aircraft design and manufacturing industry in the late 1960s and can vouch that in those days there was no separate ‘thing’ called a safety management system. Nevertheless, there was a very strong, naturally grown, all-pervading culture around product safety. One could summarise it as: “this is very serious; it needs to be designed, constructed and operated with great care and attention to detail. If not, people will stop using the product.” It was not always so. This pervasive and successful culture had developed over decades.
I have a clear memory of a new aircraft design which, because of its configuration, was incapable of recovery from an aerodynamic stall. Technology (the stick shaker and pusher) had been developed to stop a stall from developing. The regulator, however, stubbornly insisted that recovery had to be demonstrated. A complete test crew lost their lives as a result. The company’s chief test pilot had a theory about how recovery from a stall could be made to happen and was prepared to try it out. He succeeded. The Wikipedia entry for John Cunningham (the test pilot who survived) provides some insight into the culture of which I write. This industry is indeed “very serious”.
Another significant concern, often discussed over morning tea, was when an accountant became the CEO – generally, I was told, after an engineer had retired from the position. The engineer provided the resources necessary to design and build a safe product. The accountant increased profit margins, up until something unwanted happened, at which point the next one in the seat was an engineer, and so on.
As a private pilot, I can attest that a strong and distinctive aviation culture exists to this day, at least in government-regulated flying operations, which is where my current experience lies. The pilot preparing for flight and the Licensed Aircraft Maintenance Engineer both switch to a part of their brain that is conditioned in a special way by their training (this has even been demonstrated using MRI) when approaching their flight or maintenance tasks.
In stark comparison, a story has been related to me by a pilot with first-hand experience of an airline in which safety specialists imposed their ideas of safe work procedures on the flight department. The implementation of the new flying rules was disastrous, even leading to the suicide of a senior captain who was made to question whether he actually knew how to fly ‘safely’. There is a very large difference, in my view, between the methods of ‘general safety’ and the needs of ‘operational safety’.
See, for a brief reflection on this point:
The main point to be taken from this is that how you operate cannot be distinguished from how you operate legally, safely, economically, smoothly, etc. There should be no distinction between operating procedures and safe, economic or smooth operating procedures. The same applies to design and maintenance.
Boeing’s woes
Yet, this is exactly what the (relatively) recent imposition of a Safety Management System has done.
The Expert Panel report makes it clear that Boeing resisted having a SMS imposed on top of what it says (and I can believe) are mature existing methods and then attempted to satisfy what eventually became a ‘must do’ by overlaying the required SMS. It is unsurprising that a finding of the Expert Panel was confusion amongst people on the shop floor and “a disconnect between Boeing’s senior management and other members of the organization on safety culture”. Well done Boeing for standing up to the inevitable for as long as you could.
The Expert Panel decided it was within its brief to look at safety culture and relied for its understanding of this on Prof. James Reason, quoting this from him: “A safety culture is not something that springs up ready-made from a near death experience, rather it emerges gradually from the persistent and successful application of practical and down-to-earth measures. There is nothing mystical about it. Acquiring a safety culture is a process of collective learning, like any other. Nor is it a single entity.” The report lists five components: “Reporting Culture, Just Culture, Flexible Culture, Learning Culture, and Informed Culture.”
I believe the evidence is very clear that in this industry culture has ‘emerged gradually’ and resulted in proven ‘practical and down-to-earth measures’. I am unsure of the need to subject it to a conceptual definition of culture or to subdivide it into five components, and am not at all surprised that the Expert Panel found some confusion amongst employees as to what each of these means.
The Expert Panel also noted of the SMS “Structure” that “procedures and training are complex and in a constant state of change, creating employee confusion”, and recorded the “difficulty of distinguishing between safety metrics at all levels” and of “understanding their purpose and outcomes”. My broad and deep experience in industry and in the design of SMSs leads me to think this is a common problem, and I feel Boeing should not be criticised for it, particularly because they have been required to implement it over the top of an existing, established system.
These last two points (culture and SMS) remind me of an apt poem:
The centipede was happy, quite,
until one day, in spite,
the frog said “pray tell me, which leg goes after which?”
and he lay distracted in a ditch, considering how to walk.
The frog, of course, is a stand-in for the SMS and associated culture.
A further point made in the Executive Summary is the unsurprising observation that when a company is allowed to self-regulate (to check on itself for the regulator) there is an awkward relationship between the checkers (who also report to the FAA) and the other employees. The Panel made ten findings on this subject, more than on any other. I suspect, but do not really know, that this tension is most relevant on the assembly floor. I am sure it is relevant to the recent cabin door plug blow-out, which occurred after the Panel completed its work.
Final significant points made by the Panel include concerns about inadequate involvement of human factors engineers and pilots and declining technical experience and expertise generally.
What actually happened to bring about the Expert Panel?
The growth of the basic 737 design led to more powerful engines being needed. These required a larger intake diameter, which brought the intake lip closer to the ground. As this proximity was unacceptable, the engines were moved forward so that they could also be raised. This forward placement resulted in pitch instability, which the Stability and Control engineers compensated for by installing a new automated pitch control system. A critical input for such a control is the aircraft’s angle of attack. In the as-built design this was taken from a single ‘angle of attack’ sensor on the nose of the aircraft.
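For readers unfamiliar with such arrangements, the following is a minimal sketch, in Python, of a single-sensor automatic trim path. The names, numbers and logic are invented for illustration only and are not the actual system:

# Illustrative sketch only: a hypothetical automatic pitch-trim function fed by
# ONE angle-of-attack (AoA) vane. All names and thresholds are invented.

AOA_TRIGGER_DEG = 15.0  # hypothetical angle above which nose-down trim is commanded

def auto_trim_command(single_aoa_deg):
    """Command nose-down stabiliser trim whenever the single AoA input is high.

    Because there is only one input, a damaged vane that reads high will keep
    commanding nose-down trim regardless of what the aircraft is actually doing.
    """
    if single_aoa_deg > AOA_TRIGGER_DEG:
        return "trim nose down"
    return "no action"

print(auto_trim_command(5.0))   # healthy vane -> "no action"
print(auto_trim_command(74.0))  # damaged vane reading high -> "trim nose down"

The single point of failure is plain to see: everything hangs on one reading.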
Boeing correctly realised that this automated pitch control was a departure from the normal tailplane trim mechanism and chose to minimise awareness of it – evidently on the grounds that if the regulator were aware it might delay sales by requiring new certification. The accountant’s view prevailed. See the report of the House Committee on Transportation and Infrastructure: The Boeing 737 MAX Aircraft: Costs, Consequences, and Lessons from its Design, Development, and Certification, March 2020. There is also evidence of accountants’ views prevailing in the documentary Downfall: The Case Against Boeing, made in 2022 and possibly still available on Netflix. Perhaps, therefore, neither human factors engineers nor pilots would have been made aware of the very different technology involved in the pitch trim system. Customer pilots weren’t.
I think there may be features of the mechanical actuation of the stabiliser (the tailplane that provides pitch forces) that are implicated in the extreme difficulty the pilots of the two aircraft had in controlling pitch after the angle of attack sensor was damaged, but my enquiries into this have yielded uncertain results. This does not affect the lessons that can be learned here.
What happened in the two cases was that the angle of attack sensor feeding the system delivered grossly false readings (in at least one case presumably following a bird strike), resulting in a dysfunctional automatic pitch control system.
What is crucial here is that both Boeing’s design engineers and FAA engineers with whom they interacted failed to see the problem of relying on a single sensor. See the “House Committee” report cited above.
I like to think that any well-educated engineer would recognise the criticality of that sensor input and the need for more than one sensor, possibly more than one sensing technology, and for self-monitoring of the system with a fall-back means of operation.
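To make that concrete, here is a minimal sketch, again in Python and again with invented names and thresholds, of the kind of voted, self-monitoring input with a fall-back mode that I have in mind. It illustrates the principle only and is not a claim about any particular aircraft:

# Illustrative sketch only: a hypothetical voted angle-of-attack input with a
# fall-back mode. All names and thresholds are invented.

DISAGREE_LIMIT_DEG = 5.0  # hypothetical allowable spread between the sensors

def voted_angle_of_attack(readings_deg):
    """Return (value, healthy) from several independent AoA readings.

    With three or more sensors the median is used, so a single failed vane
    cannot drive the output. The input is flagged unhealthy if the sensors
    disagree by more than the allowed spread.
    """
    readings = sorted(readings_deg)
    spread = readings[-1] - readings[0]
    median = readings[len(readings) // 2]
    return median, spread <= DISAGREE_LIMIT_DEG

def auto_trim_command(readings_deg):
    """Allow automatic trim only while the AoA input is healthy; otherwise
    disengage and hand control back to the crew (the fall-back mode)."""
    aoa, healthy = voted_angle_of_attack(readings_deg)
    if not healthy:
        return "auto trim disengaged: AoA sensor disagreement"
    return "auto trim active, AoA = {:.1f} deg".format(aoa)

print(auto_trim_command([5.1, 5.3, 4.8]))   # all vanes agree -> auto trim active
print(auto_trim_command([5.1, 74.0, 4.8]))  # one vane damaged -> disengage, crew flies

The particular numbers do not matter; the shape of the design does: no single vane can drive the trim, disagreement is detected, and the fall-back is the crew.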
This makes me worry about the lack of a basic education in risk for engineers generally. A one-semester unit on what an international standard on risk management says is not a substitute for some real science, yet as far as I can tell that is what commonly happens, in my field of view at least.
I find it disturbing that this central consideration is entirely absent from the Expert Panel’s report, even under the heading of “other topics”. I feel that Rowe would readily recognise that the FAA’s attention was limited to what he called a Risk Management System – what goes on within Boeing – whereas it should have been on the much, much broader approach of Systemic Risk Control.
My conclusions
- If you go looking for ‘causes’ as a precursor to explicit or implicit ‘blame’, you can always find them, but they will be vague and unfocussed.
- If you focus on the actual physical processes that caused the spotlight to be shone in the first place, you can quickly see how to learn from the incident and benefit. In this case, should Boeing be held responsible for the educational standards of engineers? What about the current state of the pendulum swing between engineer as CEO and accountant as CEO?
- Generalised ideas of an SMS should be kept away from the management of operational safety matters – matters that require technical knowledge to understand and control.
- Generalised ideas of what safety culture is should not be applied over the top of a pre-existing appropriate culture. I doubt that Prof. Reason would approve either.