Boeing identifies new software problem on grounded 737 Max

(bloomberg.com)

253 points · rafaelm · 6y ago · 220 comments

103 comments · 14 top-level

onychomys6y ago· 24 in thread

In their defense, probably every piece of software of any complexity at all has bugs waiting to be found, and it's not super surprising that they found some new ones while doing a rigorous testing regimen.

coldpie6y ago

In their defense, the software industry is a complete joke in terms of quality control. I hope that this and the spate of ransomware will wake the industry up to realize we need new processes, languages and tooling to make software provably correct. It's clear we can't use our existing languages and tooling to make high quality software. Realistically though, that's not going to happen. The smart move is to eliminate as much software from your life as possible. It's only going to get worse as decades of laziness catch up to us.

thawaway18376y ago

This defense basically says that software engineering is far more of a joke than hardware engineering.

But it’s not a real defense for the Max because the problem with the Max is that Boeing has shifted a lot of the work from hardware to software.

This defense only punts the ball a few yards because the question now is why Boeing chose to shift a lot of the reliability from hardware, which this defense admits is better engineered, to software, which it admits is almost intrinsically worse.

If anything, this defense leads to the conclusion that the Max is an intrinsically unsafer plane, because it has shifted far more of the burden to software which is an intrinsically worse engineering discipline.

If it was just a matter of the software QA being rushed then the additional time gained by the grounding of the planes could solve the problem. But if it is software engineering itself which is the problem then no additional time will help unless it leads to shifting some of the burden of safely flying the plane from software to hardware, which doesn’t even appear to be an option Boeing has considered.

m_fayer6y ago

Watching Android descend from a place of dorky stability to sleek sealed glitch-city kinda makes me think it's the institutions, as opposed to the tooling, that are behind the plunge in the quality of mainstream software.

kjs36y ago

> we need new processes, languages and tooling to make software provably correct

This is so not true.

Processes? Like T-CMM/TSM, FAA-iCMM, ISO 90003, TCSEC, TSP, etc., etc.? Languages? Like Ada, MISRA, Modula-2/IEC1131, Cyclone, etc., etc.? Tools? Like FRAMA-C, MALPAS, SPADE, SPARK, TLA+, etc., etc.?

The simple fact of the matter is that we've had all the processes, languages & tools to write high-reliability, secure code for decades. Go check out the CMU SEI for how long that one institution has been trying to get people to do the right thing.

The joke is that the software industry by and large wants to pretend we don't, and claim that there's some magical new tech that needs to get invented and when it arrives everyone will jump on it and suddenly software bugs will be a thing of the past. And every new tool, tech or whatever that shows up re-proves some unpleasant basic facts: writing secure, provable, reliable code is time consuming and usually difficult, which translates to relatively expensive. And the software industry doesn't want to hear that, or invest in it, and excuses its miserable security record with "it's not us, the tech doesn't exist to do the job right".

As a sidenote, this is why I have a visceral distaste for the Rust people (the language seems fine). If they'd spent those (presumably) thousands of man years building on and improving Ada instead of being all NIH averse we'd be much further down the road. They could have participated in the process and gotten their wish list included in the Ada-2012 standard and built on 35+ years of experience. But, hey, it's more fun to start from scratch every decade or so.

starfallg6y ago

Nah. Just introduce 'modern software development methodologies'. Put in a CI/CD pipeline and Blue/Green with Canary. Roll back the release on hull loss. /s

gmanley6y ago

Not all software has the same level of criticalness. The difference between a bug in an airplane vs one in a game or entertainment software is gigantic. Sometimes it really is more important to get the software out quicker vs it being bug free. Plus, formally provable software exists and is used in some mission critical applications including some Airbus software. It's just that it takes a lot longer to develop. These problems are more a business and process issue, as in Boeing prioritizing speed over quality.

SilasX6y ago

The "software industry in general", sure, but my understanding is that any flight-critical software goes under a much more rigorous testing regime and uses much better practices than the average software project. It's probably why such issues were caught so early here.

salawat6y ago

>It's clear we can't use our existing languages and tooling to make high quality software.

Gotta fundamentally disagree with you right there. We absolutely can make high-quality software with our current tooling. The issue is that doing that is expensive and time consuming, and the Market optimizes on good enough to be sold and not dropped.

This expense and difficulty isn't an inherent fault of the tools, but rather the monstrous other side of the coin in proving what your system isn't.

There are many implementations that are composable to generate an end result, the trick is to expend the energy to ensure you've made the specific one that also doesn't run into undefined behavior, domain specific or otherwise.

You will never escape from the tyranny of having to clearly communicate to a perfectly obedient machine exactly what it is you want it to do; part of which is being able to identify when you don't have all your requirements right.

Pmop6y ago

Research on software reliability is plentiful and thorough, and there are existing tools to use such knowledge in a way to ensure safety of critical systems, from development to deployment. The thing is that they most likely ran their calculations and concluded that it's more profitable to cheap out on software development and QA, and pay for insurance or fines, should anything go wrong later. Engineers who devoted their time and effort to learn the aforementioned techniques are very unlikely to work for $9/h; it's more profitable and less stressful to go wash cars or whatever.

anshou-6y ago

The people, processes, languages, and tools aren't the root problem. The root problem is always time and money.

robocat6y ago

> In their defense, the software industry is a complete joke in terms of quality control.

My dishwasher stopped working today. Last week I had to replace a wheel bearing in my van ($900). I just got a refund for a pond aerator that stopped working after a few weeks. And the head came off my plastic toy.

Hardware fails all the time: does that mean I can generically say all manufacturing has shitty QC?

pinkfoot6y ago

In our defence, our customers are a complete joke in terms of the price they are prepared to pay for quality software.

See MCAS programmers at $9/h.

reilly30006y ago

No new processes needed. ISO 90003 covers all kinds of software engineering practices that ensure quality and correctness. I wish more firms used what is already out there.

purpleidea6y ago

Provably correct software (and memory and type safe languages!) are both important aspects. But they can still cause you to die in a software bug. The third and most important aspect is that the code must be viewable by any third party or individual. Open source here would be ideal, but at a bare minimum, we should be allowed to see the code that our lives depend on.

simion3146y ago

>and it's not super surprising that they found some new ones while doing a rigorous testing regimen.

So they did not do enough testing before, just the happy case and never considered sensor failures. I am now wondering if they tested the rest of the software or FAA will just look into MCAS and ignore all the rest.

johannes12343216y ago

As a plane crash costs lives of a few hundred people the criteria should be different from "normal" software. And for most parts and most history software in aeronautics was created with special care and tooling and under research of provability.

organsnyder6y ago

In the case of MCAS, that's likely true: the software was behaving exactly per spec; the problem was in the earlier requirements-definition phase.

skissane6y ago

> it's not super surprising that they found some new ones while doing a rigorous testing regimen

Why didn’t they find these bugs during the plane’s development and certification? It appears the testing then was less rigorous. They should extend the same rigorous testing to their other aircraft models also.

totalZero6y ago

Shouldn't that rigorous testing regimen have preceded the sale and flight of these large passenger aircraft?

kayfox6y ago

A regime which finds all the defects would cost billions on its own, only one such piece of software has been subject to such a regime and that's the Space Shuttle guidance program. Every other piece of software is tested to levels set by the aeronautical industry and regulators, which may not include all possible scenarios.

You will note, some of these scenarios are essentially fuzzing the memory of the flight computer and seeing what happens, ideally most bit flips will either be detected or be minor, but some can end up causing issues. I'm not sure if it would be possible to explore all the branches in the software in any sort of reasonable amount of time.

https://en.wikipedia.org/wiki/Qantas_Flight_72
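
To make the "flip bits and see what gets detected" idea above concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the 16-bit word, the single even-parity bit, and the helper names are hypothetical and are not taken from any real flight computer. It flips each bit of a stored word in turn and counts how many flips the parity check notices.

```python
def parity(word: int) -> int:
    """Return 1 if an odd number of bits are set in word, else 0."""
    return bin(word).count("1") % 2


def is_corrupted(word: int, stored_parity: int) -> bool:
    """A word is flagged as corrupted when its parity no longer matches."""
    return parity(word) != stored_parity


def fuzz_single_bit_flips(word: int, width: int = 16) -> int:
    """Flip each of the `width` bits once; return how many flips are detected.

    Hypothetical test harness: a real campaign would target actual memory
    and observe the behaviour of the whole system, not a single word.
    """
    stored_parity = parity(word)
    detected = 0
    for bit in range(width):
        flipped = word ^ (1 << bit)  # simulate one bit flip
        if is_corrupted(flipped, stored_parity):
            detected += 1
    return detected


if __name__ == "__main__":
    original = 0x1234
    print("single-bit flips detected:", fuzz_single_bit_flips(original), "of 16")
    # Two flips in the same word cancel out under a single parity bit.
    double_flip = original ^ 0b11
    print("double flip detected:", is_corrupted(double_flip, parity(original)))
```

A single parity bit catches every one-bit flip but misses any even number of flips in the same word, which is one small example of why "most" rather than "all" corruption ends up detected.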

onychomys6y ago

Sure. But I'd hope that they brought in new people, with a new outlook on things, and started all over from scratch. They're probably running mostly the same tests they were before the crashes, and also some new ones that people have dreamed up.

nradov6y ago

The Space Shuttle flight computer software was effectively bug free. Only a tiny number of defects were found post release, and none impacted safety.

Findeton6y ago

Actually there are two types of software failures: design failures and implementation failures. Design failures are harder, but we already have ways of completely avoiding implementation failures, with systems that prove mathematically that the software implements the design. NASA for example uses those on many occasions. Boeing doesn't, obviously, and there's no valid excuse.

anshou-6y ago

When some software is being built the constraints are much more rigorous, but it all comes down to the organization. The guidelines established by NASA for their space program are a good example of rigorous requirements. Boeing should not be unfamiliar with this concept and can certainly afford to do this properly. This move was motivated purely by greed.

bsimpson6y ago· 13 in thread

I'm shocked they keep publicly working on this plane.

I get that planes cost more money than I can fathom, and that making a whole fleet of impossible amounts of money costs a gazillion dollars. Still, this one seems spent. Nobody is going to knowingly fly on a 737 Max.

They ought to have retired the plane last year. They can design a new plane (that because of economics, will probably be very similar to this plane), release it when it's been properly vetted, swap out Maxes for it, retrofit those Maxes, etc.

I realize this is naive armchair quarterbacking from someone who has never worked in aviation, but there's a reason that Philip Morris is called Altria now and that Weinstein Co was merged into Spyglass. If the public doesn't trust your brand, no amount of "but we fixed it with this patch we rushed out the door" is going to change that.

yellow_lead6y ago

> Nobody is going to knowingly fly on a 737 Max.

I will respectfully disagree with you. Most airlines won't tell you your aircraft when you book a ticket and even if they do, they seem allowed to change it at the last minute. If you have spent $500, $1000, $XXXX on a ticket, will you not board if you discover the plane has been switched? Will you avoid airlines that don't guarantee you a certain plane?

There are not many great options to travel long distances quickly. If this plane becomes commonplace, I'm afraid consumers will be forced to use it. If there are other ideas on how this could play out, I'm happy to consider them, but I fear this is nearly a given.

thawaway18376y ago

I wouldn’t be surprised if ticket booking sites didn’t throw up a “this could potentially be a 737 Max plane” indicator the moment these planes are ungrounded.

Because the first ticketing site that did that would gain a definite edge. And even if the airline doesn’t tell you what plane is flying, we know what routes and airlines are flying 737 Max’s, which is enough to raise an indicator. If that happens, flying a 737 Max may potentially cause airlines to lose money even if the particular flight isn’t a 737 Max, because it might be one.

blt6y ago

> Most airlines won't tell you your aircraft when you book a ticket

I don't think this is remotely true. I just checked Delta and British Airways; both show the aircraft type in "details". (Pick the 747 flights from BA while you still can!)

> even if they do, they seem allowed to change it at the last minute

Airlines shuffle aircraft occasionally, but usually the aircraft type for a particular flight is predictable. If airline A flies the 737 Max and airline B flies the A320 on the same route, people will flock to airline B.

oska6y ago

It's quite easy to look up which airlines use this aircraft and which don't.[1] And further, which airlines are major purchasers of this aircraft (e.g. Southwest, Flydubai and Lion Air). And then avoid flying those airlines.

[1] https://en.wikipedia.org/wiki/List_of_Boeing_737_MAX_orders_...

kayfox6y ago

My experience is most airlines in the US will tell you on booking, it's just not put out in the open.

JumpCrisscross6y ago

> They ought to have retired the plane last year

This would bankrupt the company.

(Not necessarily uncalled for. But not something it will do on its own.)

> If the public doesn't trust your brand

The public has shown one preference, above all else, when it comes to flying: pricing. The 737 MAX will be renamed, re-certified and nobody but an obsessive minority will avoid flying on it.

(This is why, btw, we need strong airline regulators. Market pressure is ceteris paribus insufficient.)

TeMPOraL6y ago

> This would bankrupt the company.

Is that even possible? From what I understand, it's a strategic company for the United States, so they have to keep it alive no matter what (I recall reading that it's actually a law) - or at least its military branch.

TylerE6y ago

Do you really want to re-regulate the airlines? Fares will at least double.

masswerk6y ago

Mind that the 737 Max is all about avoiding recertification (or grandfathering) and avoiding pilot retraining. Its raison d'être is to allow airlines with an existing fleet of 737s to operate with existing crews without further training and/or certification. Meaning, designing a new plane would allow operators to switch to Airbus as well, which might be the more attractive option at the moment.

This is all about operator logistics and lock-in by costs of retraining and infrastructure. (I guess, this may be the real point to be addressed, since, as this is such a crucial factor, procedures are prone to be repeated in other configurations in the future. Failures are highly likely to be repeated, regardless of the actor.)

Edit: To emphasize the last thought, the 737 Max may be more of a systemic failure of the entire business, its regulations and how they are conducted, than a failure by a single actor on a single instance.

alkonaut6y ago

I’ll probably fly it. As in, I wouldn’t pay an unlimited amount of time or money to avoid it, and it will be expensive (e.g a 10 hour drive instead of a one hour flight).

I assure you that outside of people interested in tech, few I know have any idea what the MAX8 is, how to tell one from the NG or an A320. Once the plane flies it’s going to be business as usual.

rootusrootus6y ago

I am willing to bet that most people do not care enough to fight it. They want cheap, everything else is secondary. They will rationalize that all of this testing has made the MAX arguably the safest airliner they can fly in, and step aboard they will.

Most people are not well represented by the fine folks on HN.

catalogia6y ago

It wouldn't shock me to learn that the majority of people stepping off a commercial airliner, if interviewed, couldn't tell you whether they'd just been on a 737 or a Tu-204. It would be like asking me what model of bus my city uses, I haven't the foggiest clue.

mobjack6y ago

If it flies for a few months without incident, most people will stop noticing.

After a few years, no one will care anymore.

sheeshkebab6y ago· 11 in thread

There are two kinds of software:

- buggy

- the one where bugs were not yet found

gibbonsrcool6y ago

- formally verified software

I only remember hearing about "mathematically proven software" as an undergrad and just googled to find this name. I've always been interested in learning more of what it's about but never jumped in.

aidenn06y ago

Formally verified software still has its limitations. Knuth's famous "Beware of bugs in the above code; I have only proved it correct, not tried it." is funny but true.

Formal verification is a good and useful tool, but it provably cannot cover the entire system, and practical limitations will limit it even further.

Formal verification of source code is still subject to compiler bugs. Formally proven compilers are subject to bugs in the larger system (IIRC Csmith was able to find an incorrectness in code generated by CompCert because of a bug in a system header file).

quickthrower26y ago

Formal verification gets rid of a class of bugs, but you can still have a bad specification. Also, what counts as a bug might change in the future. I imagine as we learn more from plane crashes what is ok today might be a bug tomorrow.

sixstringtheory6y ago

Check this out: https://news.ycombinator.com/item?id=22082869

How Amazon Web Services Uses Formal Methods

lrem6y ago

Think of it as writing the software in a very barebones language and then requiring unit test coverage of all possible input combinations, asserting the full output. A lot of work, but gives you reasonable certainty that your code is indeed correct (does what it was designed to do). That's at least what I learned in that one masters course. After implementing the final assignment, which was equivalent to some three lines of C and took a team of four "the semester", I've decided to not look any further into it. It is being used in applications that warrant actual investment into bug-free code, say nuclear reactor control or helicopter rotor control.

I would imagine the MCAS belongs to this class. But even if your software is correct, the design it implements might be flawed, say by assuming the input you get from a single fallible sensor is to be trusted.
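
As a rough illustration of the "cover all possible input combinations" view above, here is a minimal Python sketch. The function `saturating_add_u4`, its 4-bit input range, and the `spec` function are all hypothetical, picked only because the input space is small enough to enumerate. Real formal verification relies on proofs rather than enumeration, so treat this as an analogy, not as how such tools work.

```python
from itertools import product

MAX_U4 = 15  # hypothetical 4-bit unsigned range: 0..15


def saturating_add_u4(a: int, b: int) -> int:
    """Implementation under test: add two 4-bit values, clamping at 15."""
    total = a + b
    return MAX_U4 if total > MAX_U4 else total


def spec(a: int, b: int) -> int:
    """Independent statement of the intended behaviour."""
    return min(a + b, MAX_U4)


def verify_exhaustively() -> int:
    """Compare implementation and spec for every possible input pair.

    Returns the number of cases checked; raises AssertionError on a mismatch.
    """
    checked = 0
    for a, b in product(range(MAX_U4 + 1), repeat=2):
        assert saturating_add_u4(a, b) == spec(a, b), (a, b)
        checked += 1
    return checked


if __name__ == "__main__":
    print(verify_exhaustively(), "input pairs checked")
```

Note that the check only shows the implementation agrees with `spec`. If `spec` itself encodes a bad assumption, such as trusting a single sensor, every case still passes.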

goto116y ago

Formal verification just proves that the software conforms to some specification. How do you prove the spec has no mistakes?

opwieurposiu6y ago

- software with off-by-one errors

noonespecial6y ago

There's a fourth one but it's undefined.

rosybox6y ago

my hello world has no bugs!

mirekrusin6y ago

Is it enterprise ready like this one https://gist.github.com/lolzballs/2152bc0f31ee0286b722 ? Even that one is alpha quality, no soap support, no saml2.0, xml is not even mentioned once. It needs a couple of major versions before it can reach production in express, standard, developer, enterprise and pro distributions.

IMTDb6y ago

But the kernel and/or the CPU you are running it on does.

kayfox6y ago· 8 in thread

I think the main thing we are seeing here is hundreds of smaller fixes that usually form the steady stream of Airworthiness Directives that an aircraft currently supported by the manufacturer sees turning into a news event every single time one comes out.

So far only one "aircraft" has had perfect software, and that was the Space Shuttle, every single other aircraft out there has had software issues that are worked out over the life of the aircraft, just like every piece of software, even that which has very strict testing regimes, has had defects in it.

joncp6y ago

That's just a scale issue. The Space Shuttle only flew 135 times, so those one-in-a-million corner cases never really had a chance to happen. If it were to fly millions of missions like the 737 fleet has, then bugs would surface for sure.
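
The scale argument above can be put into numbers with a short Python sketch. The per-flight probability of one in a million and the figure of one million flights are illustrative assumptions, not measured defect rates, and the calculation assumes flights are independent.

```python
def chance_of_at_least_one(p_per_flight: float, flights: int) -> float:
    """Probability that an event with per-flight probability p_per_flight
    occurs at least once in `flights` independent flights."""
    return 1.0 - (1.0 - p_per_flight) ** flights


if __name__ == "__main__":
    p = 1e-6  # assumed "one-in-a-million" corner case
    for flights in (135, 1_000_000):
        print(f"{flights:>9} flights: {chance_of_at_least_one(p, flights):.6f}")
```

Under these assumptions the chance of hitting the corner case at least once is roughly 0.01% over 135 flights and about 63% over a million flights.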

kayfox6y ago

The software quality of the space shuttle is much higher than commercial aircraft.

https://www.fastcompany.com/28121/they-write-right-stuff

blattimwind6y ago

> So far only one "aircraft" has had perfect software, and that was the Space Shuttle

Actually it had 3+ known bugs.

rootbear6y ago

And shortly after the code for the Apollo Guidance Computer was put up on Github, someone found a bug!

kayfox6y ago

Drat, we have no known examples of perfect software.

0xffff26y ago

True enough for modern commercial airliners; I can't help but point out that plenty of aircraft have "perfect" (read: non-existent) software. Those aircraft too generally have issues that are worked out over their lifetime. Software seems to be especially error prone, but maybe that's just because the mechanical engineers have a head start of several hundred years.

kayfox6y ago

Honestly the mechanical engineering is also error prone, it's just got margins that make it hard to mess up. There are instances where it does mess up, like the long list of cargo door issues through the 80s, the 737 rudder issue (as well as lots of other less famous hydraulic servo issues), and other issues.

bronco210166y ago

That’s what I love about the older aircraft that are more mechanical. When it’s broke it’s obvious it’s broke. The mechanic comes out and can physically identify a part that’s broken and replace it. Problem solved and on you go.

With all of these aircraft that are so computer reliant it becomes this magic box that is nearly impossible to diagnose and fix quickly. You do the circuit breaker reset, then reset the whole jet, then check the connectors of the components of the system, then change some of the computers/controllers, all the while checking for any fault code that might lead you down the right path.

This process often takes 30-60 minutes by which time you’re boarded and ready to go and if it’s not fixed by then it turns into getting everyone off the aircraft and finding a different aircraft so the broken ship can be taken to the shop and a thorough investigation of the issue can be done.

Meanwhile the customers riding in the MD88 already had their mechanical part replaced and they’re on their way, none the wiser because the mechanic got it diagnosed and replaced before boarding was even done.

MDWolinski6y ago· 7 in thread

All software is buggy. The problem with the MCAS system is that pilots were not informed that it was there, nor were they given a way to override it and take full control of the airplane. Also, while the MCAS system relied on two sensors, if either failed, the MCAS system itself failed, so there was no built in back-up for it.

Bugs in software happen because situations where they arise are sometimes hard to predict. You can test your software all you want but it's not until it's in the field that you start discovering new issues because people tend to do things in ways developers didn't consider.

Tesla's software has over a billion miles of data on it and it still has issues in some basic functionality. And let's not talk about Iowa which in itself was a major failure in software release management.

Stierlitz6y ago

@MDWolinski “.. while the MCAS system relied on two sensors ..”

MCAS used only the one sensor, this decision made so as to avoid recertification.

http://www.b737.org.uk/mcas.htm

“Are we vulnerable to single AOA sensor failures with the MCAS implementation or is there some checking that occurs?”

https://www.aviationtoday.com/2019/11/02/boeing-ceo-outlines...

mnm16y ago

Not all software is buggy to the point of killing almost four hundred people. Comparing some shit app some interns built for Iowa with avionics software is frankly insulting to the people who work hard to make avionics software. The same goes for Tesla. The avionics industry, including Boeing, used to have a great record in this area. Even if the mcas bugs were unavoidable, the fact still is that the design was fatally flawed due to either sensor being a single point of failure. And of course, the main problem that the whole, entire airplane is unstable in the air. How can you still make excuses for Boeing at this point in time? The only reason this bug should be irrelevant is because this plane should never carry another commercial passenger. But I'm sure profits will prevail over lives once again starting this summer or whenever the FAA gives their go ahead.

catalogia6y ago

> And of course, the main problem that the whole, entire airplane is unstable in the air.

That's not really true. The airframe is fine, except it doesn't handle like a 737. MCAS was meant to make the MAX handle like a 737.

Mentour Pilot, a 737 instructor with a youtube channel, has covered this fairly extensively: https://youtu.be/TlinocVHpzk?t=951

m4rtink6y ago

There is one specific Iowa which (if it was still in service) could become very deadly if it was using a buggy app.

pfundstein6y ago

> All software is buggy.

Assuming that's not hyperbole and just to be pedantic:

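  mov ax,cs             ; point DS at the code segment, where the string lives
  mov ds,ax
  mov ah,9              ; DOS function 09h: print the '$'-terminated string at DS:DX
  mov dx, offset Hello
  int 21h
  xor ax,ax             ; AH = 0: DOS function 00h, terminate the program
  int 21h
  
  Hello:
    db "Hello World!",13,10,"$"   ; text, CR, LF, then the '$' terminator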

tantalor6y ago

Bug report for you...

Expected: "Hello, World!"

Actual: "Hello World!" (missing comma)

See spec: https://en.wikipedia.org/wiki/%22Hello,_World!%22_program

kalium_xyz6y ago

Sad thing is we have little insight into microcode and the exact way machine code gets interpreted; for all we know there might still be a resulting bug at any step between asm and execution.

clSTophEjUdRanu6y ago· 7 in thread

As a former software defense worker, I wish there were 3rd party audits of code and dev ops. If you saw the code that's flying in missiles, aircraft, etc. and how it got there, you'd want to go live in a cave.

jacquesm6y ago

Some whistleblower should one day post an archive of Airbus or Boeing's software archives. That would make for interesting reading.

Glawen6y ago

It's usually worthless without knowing what is attached to the input/output of the microcontroller. A lot of things are done externally on the wiring.

kayfox6y ago

They have missiles for those caves. ;)

bradknowles6y ago

It doesn't have to be a missile.

Any fuel-air explosive will do.

pnako6y ago

A bug in a plane can make it crash in a fireball. But a bug in a missile is something that would make it NOT crash in a fireball.

Thus the obvious solution to quality problems is to switch missile software engineers and aircraft software engineers, and encourage them not to care about quality.

monocasa6y ago

Or make it crash into a fireball literally anywhere except where it was supposed to.

V_Terranova_Jr6y ago

Seconded.

stagas6y ago· 6 in thread

Can someone explain how a hugely complex machine with mostly parallel working analog parts fits into the digital computing paradigm? Isn't it predetermined to fail under extreme conditions, like those that are found while flying in between clouds and thunderstorms with all that pressure and fluctuations? How does sampling not fail, like, all the time? What kind of tooling is being used to mitigate all of these? Does anyone know?

garbage_885646y ago

I am very naive to commercial aviation but this is my experience with building and crashing model aircraft repeatedly. I fly mostly FPV which puts me in the first person view from the cockpit.

Yes, electronics fail in the weirdest ways due to connector failures, RF interference, software error, sensor failure.

When my systems start failing or acting up due to improper stabilization PID gains, etc. I have a big switch for MANUAL mode. I am able to fly this thing as long as the servos, radio, and camera get power. All sensors could be sheared off. I have no idea what my airspeed is ever because I don't use pitot tubes so I use a known engine throttle % whose stall characteristic I understand for level flight in various wind conditions and I don't make sudden maneuvers at throttle below this point.

Fixed wing planes have remarkable aerodynamic stability and I don't understand why 737 MAX cannot be piloted in a fly by wire manner with all computer aids disabled, giving the pilots direct control of the servos with a big red switch that mechanically disconnects the flight computers. This requires almost no code to implement.

jaywalk6y ago

On Boeing aircraft, the pilots essentially do have "direct control of the servos" at all times. MCAS was implemented to make the MAX fly just like the NG despite the difference in engine size and placement. What MCAS actually did was not to modify the pilots' inputs, but to adjust the stabilizer trim in certain scenarios.

The pilots do have direct control over the stabilizer trim, and have always had the ability to disable the electronic system in case of stabilizer trim runaway. This was not new to the MAX, and would have effectively disabled MCAS.

pdonis6y ago

> Fixed wing planes have remarkable aerodynamic stability and I don't understand why 737 MAX cannot be piloted in a fly by wire manner with all computer aids disabled

It can be. MCAS can be disabled by disabling the electric stability trim system. The problem is that if you do that in a situation where MCAS has already adjusted the trim far enough from where it should be, it can be mechanically impossible to put the trim back where it belongs without using the electric trim system. So you have to first use the electric trim system to put the trim back where it belongs, then disable it so MCAS can't mess it up again.

starpilot6y ago

Fly by wire = flying with computational assistance, not literally pulling wires, since control surfaces are hydraulically actuated.

salawat6y ago

https://en.wikipedia.org/wiki/Failure_mode_and_effects_analy...

https://en.wikipedia.org/wiki/Fault_tree_analysis

Basically, you should be designing every system to gracefully handle the failure of every other system on which it is dependent.

So the MCAS routines, if they had been done correctly, and properly classified as to the hazard level, should have taken into account failures of the Flight Computer they were running on, anomaly detection via cross-check with the second AoA vane, etc. That quite clearly did not happen.

The same approach applies with any other hardware/software integration. Your sensors will break. You therefore need to determine what you need to do when that happens.
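
A minimal Python sketch of the cross-check idea described above, under loud assumptions: the thresholds, the valid range, and the names `cross_check` and `AoaVote` are hypothetical illustrations, not Boeing's design or values. The point is only that the code decides explicitly what to do when a sensor is missing, out of range, or disagrees with its twin.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical limits, for illustration only.
MAX_DISAGREEMENT_DEG = 5.0
VALID_RANGE_DEG = (-30.0, 60.0)


@dataclass(frozen=True)
class AoaVote:
    value_deg: Optional[float]  # None when no trustworthy value exists
    healthy: bool               # False means: inhibit automation, alert the crew


def in_range(reading: Optional[float]) -> bool:
    """A reading is usable only if present and physically plausible."""
    low, high = VALID_RANGE_DEG
    return reading is not None and low <= reading <= high


def cross_check(left: Optional[float], right: Optional[float]) -> AoaVote:
    """Combine two angle-of-attack readings, failing safe on any doubt."""
    if not (in_range(left) and in_range(right)):
        return AoaVote(None, False)   # a sensor is missing or implausible
    if abs(left - right) > MAX_DISAGREEMENT_DEG:
        return AoaVote(None, False)   # sensors disagree: trust neither
    return AoaVote((left + right) / 2.0, True)


if __name__ == "__main__":
    print(cross_check(4.0, 5.0))    # agreement: averaged value, healthy
    print(cross_check(4.0, 74.5))   # one vane reads an implausible angle
    print(cross_check(None, 5.0))   # one vane has failed outright
```

With only two sensors the code can detect a disagreement but cannot tell which one is wrong, so the safe response here is to stop acting on the data rather than to pick a side.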

starpilot6y ago

Yes.

vikramkr6y ago· 4 in thread

So not only are they trying to fix a fundamental hardware issue with a software patch, their inability to do software properly extends beyond just their MCAS system? This is a good reminder that air travel's extraordinary safety record isn't just a given, it's something that takes real work to achieve and when the people responsible for putting in that work (Boeing, regulators) begin taking safety for granted, that's when people die.

totalZero6y ago

I agree. But we should keep in mind that there's no such thing as a bad apple; we can't blame individual executives or regulators.

It's a bad barrel: a company that has, on a cultural level, put its business motive above its responsibility to deliver a safe and high-quality product. We have seen documented evidence that employees knew there were dangers and problems, and discussed these issues, but nobody cared enough to slow things down and get the product right.

vikramkr6y ago

Absolutely. That's why I said Boeing and the FAA failed their responsibility as opposed to a Boeing exec or a particular legislator - there are organizational, structural problems. Sure, some individuals made the decision to ignore reports or set a new culture, but the fact that they succeeded is concerning - why did everyone else enable them? Is there anything we could have done to encourage those engineers to whistleblow their concerns before the planes crashed? Would they have been taken seriously by the FAA or the media or investors who were pushing for growth at all costs? These are deep, structural problems.

1 more reply

_ea1k6y ago

It isn't really a fundamental hardware issue, it's a fundamental issue with trying to work around the training requirements that should come along with a new airframe.

I suspect that a thorough review of some of the more complex Airbus airframes currently in operation would result in some similarly scary findings, tbh.

artursapek6y ago

Complacency kills. This is particularly true in aviation.

swiley6y ago· 3 in thread

I really wonder about these large engineering corporations; Toyota seems to have similar problems with software.

Part of me feels like many of these companies don’t keep code secret to protect IP, instead they do it because they know it’s a burning train wreck and don’t want people to find out.

salawat6y ago

That's interesting. Toyota would be one of the last companies I'd expect to hear that about. They're notorious in Quality circles for taking Quality seriously; at least as far as their production line is concerned. Do they not apply that same philosophy to in house software?

mark-r6y ago

The investigations carried out for unintended acceleration in Toyotas didn't paint a good picture.

https://www.safetyresearch.net/blog/articles/toyota-unintend... https://users.ece.cmu.edu/~koopman/pubs/koopman14_toyota_ua_...

3 more replies

thanatropism6y ago

The Toyota / W. Edwards Deming quality philosophy is really applicable to repeatable processes, where quality control means detecting abnormal variation amidst normal variation.

benwerd6y ago· 3 in thread

There is a less than zero chance I'll be boarding one of these planes again, ever. Trust is an important idea in any product, but particularly in areas like aviation, and I don't see how they can possibly build it back.

I do see an opportunity for software that ensures you are only booking journeys on the aircraft you feel are safe.

oska6y ago

I agree with you. This plane is fundamentally flawed and the reason such a flawed plane went into production was to reward short-term, sociopathic thinking and earn short-term profits (which ended up blowing up in their faces anyway). Taking a flight on this plane now would, for me, be like going back to an abusive partner after they'd thrown me down a flight of stairs.

apexalpha6y ago

On the other hand, the software for this plane might be the most scrutinized ever when all this is over.

oska6y ago

The hardware remains fundamentally flawed.

platz6y ago· 2 in thread

> designed to warn of a malfunction by a system that helps raise and lower the plane’s nose

So, they can't even name the MCAS system anymore?

slumdev6y ago

I thought that's what the MCAS was. Unless there's more than one system that overrides the pilot to pitch the nose down?

kayfox6y ago

Speed trim and the trim system in general.

djsumdog6y ago· 1 in thread

I hope these planes get scrapped and never see flight again. Tear them down and recycle the parts, and build something that's modern from the ground up instead of recycled, deprecated bullshit.

V_Terranova_Jr6y ago

This just isn't going to happen.

frandroid6y ago

> Asked about a likely date for a return to service for the Max, Dickson said it isn’t helpful to talk about timelines. Boeing needs to concentrate on making complete, quality submissions on its fixes for the plane, he said.

Ahh, "we'll ship it when it's ready, not on some arbitrary deadline." Music to any engineer/builder's ears.

notadoc6y ago

Throw in the towel on the 737 Max and go back to the drawing board.


m_fayer6y ago

3 more replies

kjs36y ago

we need new processes, languages and tooling to make software provably correct

This is so not true.

Processes? Like T-CMM/TSM, FAA-iCMM, ISO 90003, TCSEC, TSP, etc., etc.? Languages? Like Ada, MISRA, Modula-2/IEC1131, Cyclone, etc., etc.? Tools? Like FRAMA-C, MALPAS, SPADE, SPARK, TLA+, etc., etc.?

starfallg6y ago

Nah. Just introduce 'modern software development methodologies'. Put in a CI/CD pipeline and Blue/Green with Canary. Roll back the release on hull loss. /s

1 more reply

gmanley6y ago

SilasX6y ago

3 more replies

salawat6y ago

>It's clear we can't use our existing languages and tooling to make high quality software.

This expense and difficulty isn't an inherent fault of the tools, but rather the monstrous other side of the coin in proving what your system isn't.

1 more reply

Pmop6y ago

anshou-6y ago

The people, processes, languages, and tools aren't the root problem. The root problem is always time and money.

1 more reply

robocat6y ago

> In their defense, the software industry is a complete joke in terms of quality control.

Hardware fails all the time: does that mean I can generically say all manufacturing has shitty QC?

1 more reply

pinkfoot6y ago

In our defence, our customers are a complete joke in terms of the price they are prepared to pay for quality software.

See MCAS programmers at $9/h.

2 more replies

reilly30006y ago

No new processes needed. ISO 90003 covers all kinds of software engineering practices that ensure quality and correctness. I wish more firms used what is already out there.

purpleidea6y ago

simion3146y ago

>and it's not super surprising that they found some new ones while doing a rigorous testing regimen.

johannes12343216y ago

organsnyder6y ago

In the case of MCAS, that's likely true: the software was behaving exactly per spec; the problem was in the earlier requirements-definition phase.

1 more reply

skissane6y ago

> it's not super surprising that they found some new ones while doing a rigorous testing regimen

totalZero6y ago

Shouldn't that rigorous testing regimen have preceded the sale and flight of these large passenger aircraft?

kayfox6y ago

https://en.wikipedia.org/wiki/Qantas_Flight_72

3 more replies

onychomys6y ago

1 more reply

nradov6y ago

The Space Shuttle flight computer software was effectively bug free. Only a tiny number of defects were found post release, and none impacted safety.

Findeton6y ago

anshou-6y ago

bsimpson6y ago· 13 in thread

I'm shocked they keep publicly working on this plane.

yellow_lead6y ago

> Nobody is going to knowingly fly on a 737 Max.

thawaway18376y ago

I wouldn’t be surprised if ticket booking sites didn’t throw up a “this could potentially be a 737 Max plane” indicator the moment these planes are ungrounded.

blt6y ago

> Most airlines won't tell you your aircraft when you book a ticket

I don't think this is remotely true. I just checked Delta and British Airways; both show the aircraft type in "details". (Pick the 747 flights from BA while you still can!)

> even if they do, they seem allowed to change it at the last minute

2 more replies

oska6y ago

[1] https://en.wikipedia.org/wiki/List_of_Boeing_737_MAX_orders_...

kayfox6y ago

My experience is most airlines in the US will tell you on booking, it's just not put out in the open.

2 more replies

JumpCrisscross6y ago

> They ought to have retired the plane last year

This would bankrupt the company.

(Not necessarily uncalled for. But not something it will do on its own.)

> If the public doesn't trust your brand

The public has shown one preference, above all else, when it comes to flying: pricing. The 737 MAX will be renamed, re-certified and nobody but an obsessive minority will avoid flying on it.

(This is why, btw, we need strong airline regulators. Market pressure is ceteris paribus insufficient.)

TeMPOraL6y ago

> This would bankrupt the company.

2 more replies

TylerE6y ago

Do you really want to re-regulate the airlines? Fares will at least double.

5 more replies

masswerk6y ago

alkonaut6y ago

I’ll probably fly it. As in, I wouldn’t pay an unlimited amount of time or money to avoid it, and avoiding it will be expensive (e.g. a 10 hour drive instead of a one hour flight).

I assure you that outside of people interested in tech, few I know have any idea what the MAX8 is, how to tell one from the NG or an A320. Once the plane flies it’s going to be business as usual.

rootusrootus6y ago

Most people are not well represented by the fine folks on HN.

catalogia6y ago

mobjack6y ago

If it flies for a few months without incident, most people will stop noticing.

After a few years, no one will care anymore.

sheeshkebab6y ago· 11 in thread

There are two kinds of software:

- buggy

- the one where bugs were not yet found

gibbonsrcool6y ago

- formally verified software

I only remember hearing about "mathematically proven software" as an undergrad and just googled to find this name. I've always been interested in learning more of what it's about but never jumped in.

aidenn06y ago

Formally verified software still has its limitations. Knuth's famous "Beware of bugs in the above code; I have only proved it correct, not tried it." is funny but true.

Formal verification is a good and useful tool, but it provably cannot cover the entire system, and practical limitations will limit it even further.

1 more reply

quickthrower26y ago

sixstringtheory6y ago

Check this out: https://news.ycombinator.com/item?id=22082869

How Amazon Web Services Uses Formal Methods

1 more reply

lrem6y ago

goto116y ago

Formal verification just proves that the software conforms to some specification. How do you prove the spec has no mistakes?
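A small sketch of that failure mode, using runtime checks as a stand-in for a real proof: the spec below only demands sorted output, so a function that throws the input away conforms to it perfectly. All names here (`meets_spec`, `bogus_sort`, `meets_fixed_spec`) are made up for illustration.

```python
from collections import Counter
from typing import Callable, List

SortFn = Callable[[List[int]], List[int]]


def is_ascending(items: List[int]) -> bool:
    """True when every element is <= the one after it."""
    return all(a <= b for a, b in zip(items, items[1:]))


def meets_spec(sort_fn: SortFn, data: List[int]) -> bool:
    """Incomplete spec: only requires the output to be in ascending order."""
    return is_ascending(sort_fn(list(data)))


def bogus_sort(items: List[int]) -> List[int]:
    """Conforms to the incomplete spec for every input by discarding the input."""
    return []


def meets_fixed_spec(sort_fn: SortFn, data: List[int]) -> bool:
    """Repaired spec: ascending output that is also a permutation of the input."""
    out = sort_fn(list(data))
    return is_ascending(out) and Counter(out) == Counter(data)
```

`bogus_sort` passes `meets_spec` and fails `meets_fixed_spec`, while the built-in `sorted` passes both. Nothing in the first spec could have told you it was the wrong spec; someone had to notice the missing requirement.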

opwieurposiu6y ago

- software with off-by-one errors

noonespecial6y ago

There's a fourth one but it's undefined.

1 more reply

rosybox6y ago

my hello world has no bugs!

mirekrusin6y ago

IMTDb6y ago

But the kernel and/or the CPU you are running it on does.

kayfox6y ago· 8 in thread

joncp6y ago

kayfox6y ago

The software quality of the space shuttle is much higher than commercial aircraft.

https://www.fastcompany.com/28121/they-write-right-stuff

blattimwind6y ago

> So far only one "aircraft" has had perfect software, and that was the Space Shuttle

Actually it had 3+ known bugs.

rootbear6y ago

And shortly after the code for the Apollo Guidance Computer was put up on Github, someone found a bug!

kayfox6y ago

Drat, we have no known examples of perfect software.

1 more reply

0xffff26y ago

kayfox6y ago

bronco210166y ago

MDWolinski6y ago· 7 in thread

Stierlitz6y ago

@MDWolinski “.. while the MCAS system relied on two sensors ..”

MCAS used only the one sensor, this decision made so as to avoid recertification.

http://www.b737.org.uk/mcas.htm

“Are we vulnerable to single AOA sensor failures with the MCAS implementation or is there some checking that occurs?”

https://www.aviationtoday.com/2019/11/02/boeing-ceo-outlines...

mnm16y ago

catalogia6y ago

> And of course, the main problem that the whole, entire airplane is unstable in the air.

That's not really true. The airframe is fine, except it doesn't handle like a 737. MCAS was meant to make the MAX handle like a 737.

Mentour Pilot, a 737 instructor with a youtube channel, has covered this fairly extensively: https://youtu.be/TlinocVHpzk?t=951

1 more reply

m4rtink6y ago

There is one specific Iowa which (if it was still in service) could become very deadly if it was using a buggy app.

pfundstein6y ago

> All software is buggy.

Assuming that's not hyperbole and just to be pedantic:

  mov ax,cs
  mov ds,ax
  mov ah,9
  mov dx, offset Hello
  int 21h
  xor ax,ax
  int 21h
  
  Hello:
    db "Hello World!",13,10,"$"

tantalor6y ago

Bug report for you...

Expected: "Hello, World!"

Actual: "Hello World!" (missing comma)

See spec: https://en.wikipedia.org/wiki/%22Hello,_World!%22_program

1 more reply

kalium_xyz6y ago

Sad thing is we have little insight into microcode and the exact way machine code gets interpreted; for all we know there might still be a resulting bug at any step from asm to execution.

clSTophEjUdRanu6y ago· 7 in thread

jacquesm6y ago

Some whistleblower should one day post an archive of Airbus or Boeing's software archives. That would make for interesting reading.

Glawen6y ago

It's usually worthless without knowing what is attached to the input/output of the microcontroller. A lot of things are done externally on the wiring.

kayfox6y ago

They have missiles for those caves. ;)

bradknowles6y ago

It doesn't have to be a missile.

Any fuel-air explosive will do.

pnako6y ago

A bug in a plane can make it crash in a fireball. But a bug in a missile is something that would make it NOT crash in a fireball.

Thus the obvious solution to quality problems is to switch missile software engineers and aircraft software engineers, and encourage them not to care about quality.

monocasa6y ago

Or make it crash into a fireball literally anywhere except where it was supposed to.

V_Terranova_Jr6y ago

Seconded.

stagas6y ago· 6 in thread

garbage_885646y ago

I am very naive to commercial aviation but this is my experience with building and crashing model aircraft repeatedly. I fly mostly FPV which puts me in the first person view from the cockpit.

Yes, electronics fail in the weirdest ways due to connector failures, RF interference, software error, sensor failure.

jaywalk6y ago

2 more replies
