The Therac-25 Incident (2021)

(thedailywtf.com)

449 points · lemper · 10mo ago · 292 comments

210 comments · 45 top-level

benrutter10mo ago· 52 in thread

> software quality doesn't appear because you have good developers. It's the end result of a process, and that process informs both your software development practices, but also your testing. Your management. Even your sales and servicing.

If you only take one thing away from this article, it should be this one! The Therac-25 incident is a horrifying and important part of software history. It's really easy to think type-systems, unit-testing and defensive-coding can solve all software problems. They definitely can help a lot, but the real failure in the story of the Therac-25, from my understanding, is that it took far too long for incidents to be reported, investigated and fixed.

There was a great Cautionary Tales podcast about the device recently[0]. One thing mentioned was that, even aside from the catastrophic accidents, users routinely saw Therac-25 machines show unexplained errors, but these issues never made it to the desk of someone who might fix them.

[0] https://timharford.com/2025/07/cautionary-tales-captain-kirk...

WalterBright10mo ago

I'm going to disagree.

I have years of experience at Boeing designing aircraft parts. The guiding principle is that no single failure should cause an accident.

The way to accomplish this is not "write quality software", nor is it "test the software thoroughly". The idea is "assume the software does the worst possible thing. Then make sure that there's an independent system that will prevent that worst case."

For the Therac-25, that means a detector of the amount of radiation being generated, which will cut it off if it exceeds a safe value. I'd also add that the radiation generator be physically incapable of generating excessive radiation.
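
A minimal sketch of the "independent system" idea in this comment, assuming a hypothetical `IndependentInterlock` class and arbitrary dose-rate units. In a real machine this layer would be separate hardware rather than Python; the point is only that it shares no state with the controller and latches off on the first violation.

```python
from dataclasses import dataclass


@dataclass
class IndependentInterlock:
    """Hypothetical monitor that shares no state with the control software."""

    max_dose_rate: float  # assumed safe ceiling, arbitrary units
    tripped: bool = False

    def check(self, measured_dose_rate: float) -> bool:
        """Return True if the beam may stay on; latch off on any violation."""
        if self.tripped or measured_dose_rate > self.max_dose_rate:
            self.tripped = True  # latched: nothing in this sketch clears it
        return not self.tripped


def run_beam(commanded_rates, interlock):
    """Deliver each commanded rate until the interlock cuts the beam."""
    delivered = []
    for rate in commanded_rates:
        # The controller may command anything; the interlock only sees
        # the measurement and decides on its own.
        if not interlock.check(measured_dose_rate=rate):
            break
        delivered.append(rate)
    return delivered
```

With a ceiling of 10.0, a commanded sequence of 1.0, 2.0, 500.0, 1.0 stops before the third value, and the interlock stays tripped afterwards even for safe readings.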

vjvjvjvjghv10mo ago

In general I agree but there is a bit more complexity. I work in medical devices and there are plenty of situations where a certain output is ok in some circumstance but deadly in another. That makes a stopgap a little more tricky.

I agree with the previous poster that the feedback from the field is lacking a lot. A lot of doctors don’t report problems back because they are used to bad interfaces. And then the feedback gets filtered through several layers of sales reps and product management. So a lot of info gets lost and fixes that could be simple won’t get done.

In general when you work in medical you are so overwhelmed by documentation and regulation that there isn’t much time left to do proper engineering. The FDA mostly looks at documentation done right and less at product done right.

philjohn10mo ago

This.

One of the biggest things I see in junior engineers that I mentor (working in backend high throughput, low latency, distributed systems) is not working out all of the various failure modes your system will likely encounter.

Network partitions, primary database outage, caching layer outage, increased latency ... all of these things can throw a spanner in the works, but until you've experienced them (or had a strong mentor guide you) it's all abstract and difficult to see when the happy path is right there.

I've recently entirely re-architected a critical component, and part of this was defense in depth. Stuff is going to go wrong, so having a second or even third line of defense is important.

technofiend10mo ago

I recently had to argue a junior into leaving the health check frequency alone on an ECS container: the regular log entries annoyed her and she didn't know how to filter logs, so her solution was to take healthchecks down to every five minutes, as just one example of trying to talk to people about the unhappy path.

layman5110mo ago

I might also add that apparently, older versions of the machine had physical “hardware interlocks” that would make accidents less likely no matter what the software was doing. So the older software was probably just thought to be reliable, but it had a physical mechanism that was helping it to not kill someone. On a less serious note that’s part of why car doors might still have keyholes even if normally they open in a fancy way with electronic fobs.

Cthulhu_9mo ago

Great point. Earlier in my career, and I think many can see it too, I was very diligent; thorough types, unit tests, defensive programming, assertions at one point, the works.

But this opens up a can of worms, as suddenly you have to deal with every edge case, test for every possible input, etc. This was before fuzz testing, too. Each line of defensive coding, every carefully crafted comment, etc all added to the maintenance burden; I'd even go as far as claim it increased uncertainty, because what if I forgot something?

15 years later and it feels like I'm doing far less advanced stuff (although in hindsight what I did then wasn't all that, but I made it advanced). One issue came up recently; a generic button component would render really tall if no label was given, which happened when a CMS editor did not fill in a label in an attempt to hide it. The knee-jerk response would be to add a check that disallows empty labels, or to not render the button if no label is given, or to use a default button label.

But now I think I'll look at the rendering bug and just... leave the rest. A button with an empty label isn't catastrophic. Writing rules for every possible edge case (empty label, whitespaces, UTF-8 characters escaping the bounds, too long text, too short text, non-text, the list goes on) just adds maintenance and complexity. And it's just a button.

benrutter9mo ago

It's not that I don't think that's important, but I think with failure you always have an issue around needing N+1 checks (please don't take this as an argument against checks though).

The Therac-25 was meant to have a detector of radiation levels to cut things off if a safe value was exceeded, but it didn't work. It could obviously have been improved, but you always have the possibility that "what if our check doesn't work?".

In the case of the Therac-25, if the first initial failures had been reported and investigated, my understanding is (I should make clear I'm not an expert here) it would have made the issues apparent, and it could have been recalled before any of the fatal incidents happened.

In a swiss cheese model of risk, you always want as many layers as possible, so your point about a detector fits in there, but the final layer should always be if an incident does happen, and something gets past all our checks, how can we make it likely that it gets investigated fully by the right person.

jonahx9mo ago

That makes sense. But wouldn't the "write quality software" and "test the software thoroughly" still be relevant to the individual pieces? If the chance of a catastrophic failure is the product of the failure rates of the pieces, getting P(PartFail) low helps too -- even if having multiple backups is the main source of protection.
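
The arithmetic behind this comment, as a small sketch. The function name and the example probabilities are made up for illustration, and the product rule only holds if the layers fail independently, which is the fragile assumption.

```python
from math import prod


def p_catastrophe(layer_failure_probs):
    """Probability that every layer fails at once, assuming independent failures."""
    return prod(layer_failure_probs)


# Illustrative numbers only: software fails 1 time in 100, backup 1 time in 1000.
combined = p_catastrophe([0.01, 0.001])
# Making the software ten times more reliable makes the combined figure ten times smaller.
improved = p_catastrophe([0.001, 0.001])
```

Here `combined` is roughly 1e-5 and `improved` roughly 1e-6, so lowering P(PartFail) for one piece still pays off even when the backup does most of the work.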

fulafel9mo ago

The GP didn't propose processes in the sw engineering part though but "the real failure in the story of the Therac-25 from my understanding, is that it took far too long for incidents to be reported, investigated and fixed"

graypegg10mo ago

I think the range of radiation dose might vary too much to make a radiation source a totally isolated system, but trying to keep it as a simple physical lockout, I could imagine part of the start up process involving inserting a small module containing a fuse that breaks at a certain current that could be swapped out for different therapies or something. Could even add a simple spring+electromagnet mechanism that kicks that module out when power gets cut so radiotechs have to at least acknowledge the fuse before start up each time.

I will say that me pretending to know how to best design medical equipment as a web developer is pretty full of myself haha. Highly doubt whatever I'm spouting is a new idea. The idea of working on this sort of high-reliability + high-recoverability systems seems really interesting though!

vorgol10mo ago

I was going to recommend that exact podcast episode but you beat me to it. Totally worth listening, especially if you're interested in software bugs.

Another interesting fact mentioned in the podcast is that the earlier (manually operated) version of the machine did have the same fault. But it also had a failsafe fuse that blew so the fault never materialized. Excellent demonstration of the Swiss Cheese Model: https://en.wikipedia.org/wiki/Swiss_cheese_model

bell-cot10mo ago

>> the real failure in the story of the Therac-25 from my understanding, is that it took far too long for incidents to be reported, investigated and fixed.

> the earlier (manually operated) version of the machine did have the same fault. But it also had a failsafe fuse that blew so the fault never materialized.

#1 virtue of electromechanical failsafes is that their conception, design, implementation, and failure modes tend to be orthogonal to those of the software. One of the biggest shortcomings of Swiss Cheese safety thinking is that you too-often end up using "neighbor slices from the same wheel of cheese".

#2 virtue of electromechanical failsafes is that running into them (the fuse blew, or whatever) is usually more difficult for humans to ignore. Or at least it's easier to create processes and do training that actually gets the errors reported up the chain. (Compared to software - where the worker bees all know you gotta "ignore, click 'OK', retry, reboot" all the time, if you actually want to get anything done).

But, sadly, electromechanical failsafes are far more expensive than "we'll just add some code to check that" optimism. And PHB's all know that picking up nickels in front of the steamroller is how you get to the C-suite.

snerbles10mo ago

When I worked at an industrial integrator, we had a hard requirement for hard-wired e-stop circuits run by safety relays separate from the PLC. Sometimes we had to deal with dangerous OEM equipment that had software interlocks, and the solution was usually just to power the entire offending device down when someone hit an e-stop or opened a guarding panel.

About a decade ago a rep from Videojet straight up lied to us about their 30W CO2 marking laser having a hardware interlock. We found out when - in true Therac-25 fashion - the laser kept triggering despite the external e-stop being active due to a bug in their HMI touch panel. No one noticed until it eventually burned through the lens cap. In reality the interlock was a separate kit, and they left it out to reduce the cost for their bid to the customer. That whole incident really soured my opinion of them and reminded me of just how bad software "safety" can get.

WalterBright10mo ago

> And PHB's all know that picking up nickels in front of the steamroller is how you get to the C-suite.

Blaming it on PHB's is a mistake. There were no engineering classes in my degree program about failsafe design. I've known too many engineers who were insulted by my insinuations that their design had unacceptable failure modes. They thought they could write software that couldn't possibly fail. They'd also tell me that they could safely recover and continue executing a crashed program.

This is why I never, ever trust software switches to disable a microphone, software switches that disable disk writes, etc. The world is full of software bugs that enable overriding of their soft protections.

BTW, this is why airliners, despite their advanced computerized cockpit, still have an old fashioned turn-and-bank indicator that is independent of all that software.

ipython10mo ago

Don’t worry we are poised to re learn all these lessons once again with our fancy new agentic generative ai systems.

The mechanical interlock essentially functioned as a limit outside of the control system. So you should build an ai system the same way- enforcing restrictions on the security agency from outside the control of the ai itself. Of course that doesn’t happen and devs naively trust that the ai can make its own security decisions.

Another lesson from that era we are re learning- in-band signaling. Our 2025 version of the “blue box” is in full swing. Prompt injection is just a side effect of the fact that there is no out of band instruction mechanism for llms.

Good news is - it’s not hard to learn the new technology when it’s just a matter of rediscovering the same security issues with a new name!
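
One way to picture "a limit outside of the control system" for an agent, as a rough sketch. The allowlist, the `execute` function and the action names are all hypothetical; the only point is that the check lives in code the model cannot rewrite or talk its way around.

```python
# Hypothetical allowlist, maintained outside anything the model can edit.
ALLOWED_ACTIONS = {"read_file", "run_tests"}


def execute(action: str, argument: str, handlers: dict):
    """Run an agent-requested action only if the external policy allows it."""
    if action not in ALLOWED_ACTIONS:
        # The refusal does not depend on anything the model said.
        raise PermissionError(f"action {action!r} is not permitted")
    return handlers[action](argument)
```

A request for `read_file` goes through to its handler, while a request for `delete_file` raises `PermissionError` even if a handler for it exists.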

ChrisMarshallNY10mo ago

I worked for a company that manufactured some of the highest-Quality photographic and scientific equipment that you can buy. It was expensive as hell, but our customers seemed to think it was worth it.

> It's the end result of a process

In my experience, it's even more than that. It's a culture.

f1shy10mo ago

It is a culture. No doubt. And certainly not only processes. I work in a company where there are literally processes for everything. For every engineer doing actual work (requirement engineering, architecture, coding, testing) there are at least 3 doing processes. The SW we make is the ultimate piece of shit: late, expensive, and full to the brim with bugs. So process is important, but good engineering and culture are too.

franktankbank10mo ago

A culture of high-quality engineering, no doubt. Made up of: high quality engineers!

kulahan10mo ago

Unfortunately, software developers are the absolute most offensive use of the word "engineer", because 99.9% of the stuff this field makes is a competition to take the most unique approach to a solution, then getting it bandaged together with gum and paperclips.

If this industry wants to be respected, it should start trying to be actual engineers. There should be tons and tons of standards which are enforced legally, but this is not often the case. Imagine if there were no real legal guardrails in, say, bridge building!

edit: and imagine if any time you brought up this issue, bridge builders cockily responded with "well stuff seems to work fine so..."

ChrisMarshallNY10mo ago

Yes, but some of them were the most stubborn bastards I've ever worked with.

herval10mo ago

you don't need "high quality engineers" to have high-quality outputs. And vice-versa - lots of places with very high quality engineers produce terribly low-quality software

anonymars10mo ago

Isn't that exactly the opposite of the point being made?

> software quality doesn't appear because you have good developers

pjmlp10mo ago

The worst part is that many developers think that by not working with high integrity systems, such quality levels don't apply to them.

Wrong, any software failure can have huge consequences in someone's life, or company, by preventing some critical flow from taking place, corrupting data related to someone's life, professional or medical record, or preventing a payment on some specific goods that had to be acquired at that moment or never.

ozim10mo ago

Hey don’t blame developers.

It is business who requests features ASAP to cut costs, and then there are customers who don’t want to pay for „ideal software” but would rather have every software for free.

Most devs and QA workers I know want to deliver best quality software and usually are gold plating stuff anyway.

pjmlp10mo ago

Being a real Software Engineer, those that actually have the proper title, eventually with the final examination, means being able to deliver the best product within the set of given constraints.

Also, speaking out when the train is visibly going against a wall.

Cthulhu_9mo ago

Business can request it, but it's your job as a software engineer to build quality software; don't shift the blame.

Does a construction engineer blame an architect's wacky designs if a building collapses? No, they either engineer it so it doesn't collapse, convince the architect that it will collapse because physics, or they refuse.

People want to be able to use a bridge for free too, doesn't mean there's no money in it.

As for gold plating, is that really improving software quality, or is that yak shaving / bike shedding?

chairmansteve10mo ago

I would say the real lesson is that the Therac machine should have had hardware interlocks (mentioned but not emphasised in the article).

sonicggg10mo ago

Not sure why the article is focusing so much on software development. That was just a piece of the problem. The entire product had design flaws. When the FDA got involved, the company wasn't just told to make software updates.

speed_spread10mo ago

Yet it doesn't take much to swamp a team of good developers. A poorly defined project, mismatched requirements, sent to production too early and then put in support mode with no time planned to plug the holes... There's only so much smart technicians can do when the organization is broken.

AdamN10mo ago

This is true but there also needs to be good developers as well. It can't just be great process and low quality developer practices. There needs to be: 1/ high quality individual processes (development being one of them), 2/ high quality delivery mechanisms, 3/ feedback loops to improve that quality, 4/ out of band mechanisms to inspect and improve the quality.

Fr3dd110mo ago

I would argue that a good process always has a good self correction mechanism built in. This way, the work done by a "low quality" software developer (this includes almost all of us at some point in time), is always taken into account by the process.

quietbritishjim10mo ago

Right, but if everyone is low quality then there's no one to do that correction.

That may seem a bit hypothetical but it can easily happen if you have a company that systematically underpays, which I'm sure many of us don't need to think hard to imagine, in which case they will systematically hire poor developers (because those are the only ones that ever applied).

rcxdude10mo ago

This only works with enough good developers involved in the process. I've seen how the sausage is made, and code quality is often shockingly low in these applications, just in ways that don't set off the metrics (or they do, but they can bend the process to wave them away). Also, the process often makes it very hard to fix latent problems in the software, so it rarely gets better over time, either.

varjag10mo ago

My takeaway from observing different teams over the years is that talent is by a huge margin the most important component. Throw a team of A performers together and it really doesn't matter what process you make them jump through. This is how a waterfall team got mankind to the Moon with handwoven core memory but an agile team 10x the size can't fix the software for a family car.

vjvjvjvjghv10mo ago

“This way, the work done by a "low quality" software developer (this includes almost all of us at some point in time), is always taken into account by the process”

That’s a horrible take. There is no amount of reviews, guidelines and documentation that can compensate for low quality devs. You can’t throw garbage into the pipeline and then somehow process it to gold.

franktankbank10mo ago

The process that makes this work would be so onerous to create. Would you think you could do this to make a low quality machinist be able to build a high quality technical part? What would this look like? Quite a lot like machine code which doesn't really reduce the requirements does it? It actually just shifted the onerous requirement somewhere else.

Cthulhu_9mo ago

> (this includes almost all of us at some point in time)

I'd say this includes all of us all the time; a good developer never trusts their own work blindly, and spends more time gathering requirements and verifying their and others' work than writing code.

rowanG07710mo ago

I think the opposite. The only reason software quality emerges is because of good developers. It's a prerequisite. Process helps good developers deliver quality. But there is no process that allows a team of bad developers to deliver quality. You can't squeeze blood from a stone.

benrutter9mo ago

That's true, but even so - great developers still make mistakes, and if they don't hear about production errors because of a breakdown in the customer communications from sales etc, then those mistakes will never be fixed.

It's not that great developers aren't necessary for software quality, more that they aren't sufficient.

credit_guy10mo ago

I'm not sure. Most software (by orders of magnitude) is not critical software like the software running that X-ray machine. In general, if your software fails, a page loads too slow, or a report comes with lots of NaN's, or some batch job does not run at the right time, and someone needs to start it manually, etc. The cases where someone dies because of a software quality issue are very rare, and the developers working on that type of software know who they are and what their duties are (I hope).

0xDEAFBEAD10mo ago

Honestly I wish instead of the Therac-25, we were discussing a system which made use of unit testing and defensive coding, yet still failed. That would be more educational. It's too easy to look at the Therac-25 and think "I would never write a mess like that".

roeles10mo ago

One instance that crosses my mind often is the Airbus A320 incident at Hamburg in 2008. Everything was done right there, but the requirements were wrong.

Despite all the procedures and tests, the software still managed to endanger the lives of the passengers.

Izkata9mo ago

The Boeing 737 MAX had an additional safety feature that was causing crashes due to bad input from the sensors, that pilots didn't know about so they couldn't override. This was 2018 and 2019. After the first crash, the manuals and training were updated to explain what was going on and how to override it.

https://en.wikipedia.org/wiki/Maneuvering_Characteristics_Au...

0xDEAFBEAD10mo ago

Interesting, do you happen to have a case study?

I_dream_of_Geni10mo ago

Speaking of Airbus, They 'lost' 3-4 different aircraft (from 1988 to 2015) which crashed during development, or, spectacularly during their first airshow. Never slowed down their customers at ALL, and to this day, Boeing has never lost one new commercial airliner in those same circumstances. Yet, Boeing gets all the hate. smh

jopsen10mo ago

I'd agree, it's super easy to think such errors won't happen had they just used a fairly safe language and sane architecture. Or unit tests, race detectors, etc.

I suspect that few organizations that do all that, have a process/culture of ignoring bugs in the wild -- and those that do have such complicated domains that explaining the error is hard.

Software best practices today would probably also involve sending metrics, logs, error reports, etc.

That said, it's still extremely easy to embrace a culture where unexplainable errors are ignored. Especially in a cloud environment.

hinkley10mo ago

I bring up Knight Capital every time people start acting like feature toggles will solve every problem we have with feature rollout.

KC lost over $400 million in less than an hour due to an old feature toggle and a problem with their deployment process.

jldugger10mo ago

Perhaps this is why my software correctness book in undergrad used a series of stills from the Ariane 5 disaster[1] for the cover.

[1]: https://en.wikipedia.org/wiki/Ariane_5#Notable_launches

wat1000010mo ago

The lesson is not to write a mess like that. It might seem obvious, but it has to be learned.

kccqzy10mo ago

Software engineering has advanced so much in the past few decades that the kind of code considered a "mess" has expanded.

msy10mo ago

I couldn't disagree more. Outside of exotic scenarios where things like formal proofing are possible and economically viable I've never seen a process that prevents bugs, only culture. Good engineering cultures are also often ones with well defined and tested processes and good testing practices but it's the culture and people giving a shit that makes the difference, not the other way around.

Good product cultures are ones where natural communication between the field and engineering would mean issues get reported back up and make their way to the right people. No process will compensate for people not giving a shit.

benrutter9mo ago

> It's the end result of a process, and that process informs both your software development practices, but also your testing. Your management. Even your sales and servicing.

I think the bit I quoted, especially if you read in the context of the article, is talking about culture. I.e. it's talking about a process that informs software development, management and sales. Things like formal proofing and type systems are the exact kind of processes that aren't what it's talking about.

I kind of agree with you though about the process/culture distinction - ultimately, if you don't have a culture where people actively care about improving reliability, any process is just gonna become a tick-box exercise to appease management.

isopede10mo ago· 18 in thread

I strongly believe that we will see an incident akin to Therac-25 in the near future. With as many people running YOLO mode on their agents as there are, Claude or Gemini is going to be hooked up to some real hardware that will end up killing someone.

Personally, I've found even the latest batch of agents fairly poor at embedded systems, and I shudder at the thought of giving them the keys to the kingdom to say... a radiation machine.

SCdF10mo ago

The Horizon (UK Royal Mail accounting software) incident killed multiple postmasters through suicide, and bankrupted and destroyed the lives of dozens or hundreds more.

The core takeaway developers should have from Therac-25 is not that this happens just on "really important" software, but that all software is important, and all software can kill, and you need to always care.

hahn-kev10mo ago

From what I've read about that incident I don't know what the devs could have done. The company sure was a problem but also the laws basically saying a computer can't be wrong. No dev can solve that problem.

V__10mo ago

> Engineers are legally obligated to report unsafe conduct, activities or behaviours of others that could pose a risk to the public or the environment. [1]

If software "engineers" want to be taken seriously, then they should also have the obligation to report unsafe/broken software and refuse to ship unsafe/broken software. The developers are just as much to blame as the post office:

> Fujitsu was aware that Horizon contained software bugs as early as 1999 [2]

[1] https://engineerscanada.ca/news-and-events/news/the-duty-to-...

[2] https://en.wikipedia.org/wiki/British_Post_Office_scandal

siva710mo ago

Then you haven't read deep enough into the Horizon UK case. The lead devs have to take a major share of the blame for what happened, as they lied to the investigators and could have helped prevent some suicides early on if they had had the courage. These devs are the worst kind, namely Gareth Jenkins and Anne Chambers.

sim7c0010mo ago

as you point out this was a messup on a lot of levels. its an interesting effect tho not to be dismissed. how your software works and how its perceived and trusted can impact people psychologically.

codeulike10mo ago

It was a distributed system lashed together by 'consultants' (read: recent graduates with little real world software engineering experience) in an era where best practices around distributed systems were non-existent. They weren't even thinking about what kind of data inconsistencies they might end up with.

SCdF10mo ago

The code being absolute dog shit was true regardless of that law's existence. There are plenty of things the developers could have done.

That law is irrelevant to this situation, except in that the lawyers for Fujitsu / Royal Mail used it to imply their code was infallible.

fuckaj10mo ago

Given whole truth testimony?

maweki10mo ago

But there is still a difference here. Provenance and proper traceability would have allowed the subpostmasters to show their innocence and prove the system fallible.

In the Therac-25 case, the killing was quite immediate and it would have happened even if the correct radiation dose was recorded.

scott_w10mo ago

I’m not sure it would. Remember that the prosecutors in this case were outright lying to the courts about the system! When you hit that point, it’s really hard to even get a clean audit trail out in the open any more!

SCdF10mo ago

I don't understand the distinction here.

> Provenance and proper traceability would have allowed

But there weren't those things, so they couldn't, so they were driven to suicide.

Bad software killed people. It being slow or fast doesn't seem to matter.

grues-dinner10mo ago

Non-agentic AI is already "killing" people by some definitions. There's a post about someone being talked into suicide on the front page right now, and they are 100% going to get used for something like health insurance and benefits where avoidable death is a very possible outcome. Self-driving cars are also full of "AI" and definitely have killed people already.

Which is not to say that software hasn't killed people before (Horizon, Boeing, probably loads of industrial accidents and indirect process control failures leading to dangerous products, etc, etc). Hell, there's a suspicion that austerity is at least partly predicated on a buggy Excel spreadsheet, and with about 200k excess deaths in a decade (a decade not including Covid) in one country, even a small fraction of those being laid at the door of software is a lot of Theracs.

AI will probably often skate away from responsibility in the same way that Horizon does: by being far enough removed and with enough murky causality that they can say "well, sure, it was a bug, but them killing themselves isn't our fault"

I also find AI copilot things do not work well with embedded software. Again, people YOLOing embedded isn't new, but it might be about to get worse.

the-grump10mo ago

The 737 MAX MCAS debacle was one such failure, albeit involving a wider system failure and not purely software.

Agreed on the future but I think we were headed there regardless.

jonplackett10mo ago

Yeah reading this reminded me a lot of MCAS. Though MCAS was intentionally implemented and intentionally kept secret.

throwawayoldie10mo ago

They killed "only" about 350 people combined, but the two fatal crashes of the Boeing 737 MAX in 2018 and 2019 were due to poor quality software:

https://en.wikipedia.org/wiki/Maneuvering_Characteristics_Au...

Maxion10mo ago

> Personally, I've found even the latest batch of agents fairly poor at embedded systems

I mean even simple crud web apps where the data models are more complex, and where the same data has multiple structures, the LLMs get confused after the second data transformation (at the most).

E.g. You take in data with field created_at, store it as created_on, and send it out to another system as last_modified.
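
To make the renaming chain in this comment concrete, here is a minimal sketch. The field names come from the comment; the mapping tables, the `rename_fields` helper and the sample record are assumptions for illustration.

```python
# Each hop between systems gets its own explicit rename table.
INBOUND_TO_STORAGE = {"created_at": "created_on"}
STORAGE_TO_OUTBOUND = {"created_on": "last_modified"}


def rename_fields(record: dict, mapping: dict) -> dict:
    """Return a copy of record with keys renamed; unmapped keys pass through."""
    return {mapping.get(key, key): value for key, value in record.items()}


incoming = {"id": 1, "created_at": "2025-01-01"}
stored = rename_fields(incoming, INBOUND_TO_STORAGE)
outgoing = rename_fields(stored, STORAGE_TO_OUTBOUND)
```

The same value ends up under three different names across two hops, which is the kind of drift that is easy to lose track of without the tables written down in one place.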

sim7c0010mo ago

talk to anyone in the industries about 'automation' on medical or critical infra devices and they will tell you NO. No touching our devices with your rubbish.

i am pretty confident they wont let claude touch it if they dont even let deterministic automations run...

that being said, maybe there are places. but this is always the sentiment i got. no automating, no scanning, no patching. device is delivered certified and any modifications will invalidate that. any changes need to be validated and certified.

its a different world than makin apps thats for sure.

not to say mistakes arent made and change doesnt happen, but i dont think people designing medical devices will be going yolo mode on their dev cycle anytime soon... give the folks in safety critical system engineering some credit..

throwaway026110mo ago

> but i dont think people designing medical devices will be going yolo mode on their dev cycle anytime soon

I don't have the same faith in corporate leadership as you, at least not when they see potentially huge savings by firing some of the expensive developers and using AI to write more of the code.

napolux10mo ago· 18 in thread

The most deadly bug in history. If you know any other deadly bug, please share! I love these stories!

kgwgk10mo ago

Several people killed themselves over this: https://www.wikipedia.org/wiki/British_Post_Office_scandal

https://www.theguardian.com/uk-news/2024/jan/09/how-the-post...

One member of the development team, David McDonnell, who had worked on the Epos system side of the project, told the inquiry that “of eight [people] in the development team, two were very good, another two were mediocre but we could work with them, and then there were probably three or four who just weren’t up to it and weren’t capable of producing professional code”.

What sort of bugs resulted?

As early as 2001, McDonnell’s team had found “hundreds” of bugs. A full list has never been produced, but successive vindications of post office operators have revealed the sort of problems that arose. One, named the “Dalmellington Bug”, after the village in Scotland where a post office operator first fell prey to it, would see the screen freeze as the user was attempting to confirm receipt of cash. Each time the user pressed “enter” on the frozen screen, it would silently update the record. In Dalmellington, that bug created a £24,000 discrepancy, which the Post Office tried to hold the post office operator responsible for.

Another bug, called the Callendar Square bug – again named after the first branch found to have been affected by it – created duplicate transactions due to an error in the database underpinning the system: despite being clear duplicates, the post office operator was again held responsible for the errors.
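Both bugs described above boil down to the same operation being applied more than once. A common generic defence is an idempotency key: every confirmation carries an ID, and a repeat of the same ID is ignored. The sketch below is only an illustration of that technique; the `Ledger` class, the request ID and the amount are hypothetical, and nothing here reflects how Horizon was built or fixed.

```python
class Ledger:
    """Hypothetical branch ledger that applies each confirmation at most once."""

    def __init__(self):
        self.total_pence = 0
        self._seen_request_ids = set()

    def confirm_receipt(self, request_id: str, amount_pence: int) -> bool:
        """Apply the receipt once; return False if this request was already applied."""
        if request_id in self._seen_request_ids:
            return False
        self._seen_request_ids.add(request_id)
        self.total_pence += amount_pence
        return True


ledger = Ledger()
# The operator presses "enter" four times on a frozen screen:
# same request ID every time, so only the first press changes the total.
results = [ledger.confirm_receipt("receipt-001", 800_000) for _ in range(4)]
```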

BoxOfRain10mo ago

More heads should have rolled over this in my opinion, absolutely despicable that they cheerfully threw innocent people in prison rather than admit their software was a heap of crap. It makes me so angry this injustice was allowed to prevail for so long because nobody cared about the people being mistreated and tarred as thieves as long as they were 'little people' of no consequence, while senior management gleefully covered themselves in criminality to cover for their own uselessness.

It's an archetypal example of 'one law for the connected, another law for the proles'.

benrutter10mo ago

Probably many rather than a single bug, but the botched London Ambulance dispatch software from the 90s, is probably one of the most deadly software issues of all time, although there aren't any estimates I know of that try to quantify the number of lives lost as a result.

http://www0.cs.ucl.ac.uk/staff/a.finkelstein/papers/lascase....

napolux9mo ago

thanks for sharing!

A1kmm10mo ago

Not even close. Israel apparently has AI bombing target intel & selection systems called Gospel and Lavender - https://www.theguardian.com/world/2024/apr/03/israel-gaza-ai.... Claims are these systems have a selectivity of 90% per bombing, and they were willing to bomb up to 20 civilians per person classified by the system as a Hamas member. So assuming that is true, 90% of the time, they kill one Hamas member, and up to 20 innocents. 10% of the time, they kill up to 21 innocents and no Hamas members.

Killing 20 innocents and one Hamas member is not a bug - it is callous, but that's a policy decision and the software working as intended. But when it is a false positive (10% of the time), due to inadequate / outdated data and inadequate models, that could reasonably be classified as a bug - so all 21 deaths for each of those bombings would count as deaths caused by a bug. Apparently at least earlier versions of Gospel were trained on positive examples that mean someone is a member of Hamas, but not on negative examples; other problems could be due to, for example, insufficient data, and interpolation outside the valid range (e.g. using pre-war data about, e.g. how quickly cell phones are traded, or people movements, when behaviour is different post-war).

I'd therefore estimate that deaths due to classification errors from those systems is likely in the thousands (out of the 60k+ Palestinian deaths in the conflict). Therac-25's bugs caused 6 deaths for comparison.

NitpickLawyer10mo ago

The MCAS related bugs @ Boeing led to 300+ deaths, so it's probably a contender.

solids10mo ago

Was that a bug or a failure to inform pilots about a new system?

thyristan10mo ago

In the same vein one could argue that Therac-25 was not actually a software bug but a hardware problem. Interlocks, that could have prevented the accidents and that were present in earlier Therac models, were missing. The software was written with those interlocks in mind. Greedy management/hardware engineers skipped them for the -25 version.

It's almost never just software. It's almost never just one cause.

AdamN10mo ago

Both - and really MCAS was fine but the issue was the metering systems (Pitot tubes) and the handling of conflicting data. That part of the puzzle was definitely a bug in the logic/software.

NitpickLawyer10mo ago

I would say plenty of both. They obviously had to inform the pilots, but the way the system didn't reset permanently after 2-3 (whatever) sessions of "oh, the pilot trimmed manually, after 10 seconds we keep doing the same thing" was a major major logic blunder. Failure all across the board, if only from the perspective of end-to-end / integration testing if nothing else.

Worryingly, inadequate e2e / full integration testing was also the main cause of other Boeing blunders, like the Starliner capsule.

fuckaj10mo ago

Not a bug. A non airworthy plane they tried to patch up with software.

bobmcnamara10mo ago

In Dhahran, Saudi Arabia, on February 25, 1991, a Patriot missile failed to intercept an Iraqi Scud causing the death of 28 American soldiers.

The Patriot missile system used floating point for time, so as uptime extended the clock became more and more granular, eventually to the point where time skipped so far that the range gate was tripped.

The fix was being deployed earlier that year but this unit hadn't been updated yet.

https://www.cs.unc.edu/~smp/COMP205/LECTURES/ERROR/lec23/nod...
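For a feel of the numbers: the commonly cited analysis of this failure puts it down to the constant 1/10 being chopped to fit a 24-bit fixed-point register, with the tiny error multiplied by the number of tenth-of-a-second ticks since boot. The sketch below reproduces that arithmetic under those assumptions (23 fractional bits, about 100 hours of uptime); it is a back-of-the-envelope check, not the missile software.

```python
from fractions import Fraction

FRACTION_BITS = 23     # assumption: fractional bits kept when storing 1/10
TICKS_PER_SECOND = 10  # the clock counted tenths of a second
UPTIME_HOURS = 100     # assumption: roughly how long the battery had been up

# 1/10 has no finite binary expansion, so the stored value is chopped.
stored_tenth = Fraction((2**FRACTION_BITS) // 10, 2**FRACTION_BITS)
error_per_tick = Fraction(1, 10) - stored_tenth  # about 9.5e-8 seconds

ticks = UPTIME_HOURS * 3600 * TICKS_PER_SECOND
drift_seconds = float(error_per_tick * ticks)    # about a third of a second
```

A third of a second sounds harmless until you multiply it by the speed of an incoming missile.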

danadam10mo ago

Some Google Pixel phones couldn't dial emergency number (still can't?). I don't know if there were any deadly consequences of that.

https://www.androidauthority.com/psa-google-pixel-911-emerge...

echelon10mo ago

The 737 Max MCAS is arguably a bug. That killed 346 people.

Not a "bug" per se, but texting while driving kills ~400 people per year in the US. It's a bug at some level of granularity.

To be tongue in cheek a bit, buggy JIRA latency has probably wasted 10,000 human years. Those are many whole human lives if you count them up.

b_e_n_t_o_n10mo ago

> To be tongue in cheek a bit, buggy JIRA latency has probably wasted 10,000 human years. Those are many whole human lives if you count them up.

These kinds of calculations always make me wonder...say someone wasted one minute of everybody's life, is the cost ~250 lives? One minute? Somewhere in between?
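The naive "add up the minutes" version of that sum is easy to write down. The population and lifespan figures below are rough assumptions, so treat the result as an order of magnitude only.

```python
WORLD_POPULATION = 8_000_000_000      # assumption: roughly 8 billion people
LIFESPAN_YEARS = 75                   # assumption: rough average lifespan
MINUTES_PER_YEAR = 365.25 * 24 * 60   # 525,960 minutes

wasted_minutes = WORLD_POPULATION * 1  # one minute per person
lifetimes = wasted_minutes / (LIFESPAN_YEARS * MINUTES_PER_YEAR)  # about 200
```

Whether a couple of hundred lifetimes' worth of scattered minutes is morally comparable to a couple of hundred deaths is exactly the open question; the arithmetic doesn't settle it.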

throwaway026110mo ago

There was a news story from Norway last year where a car allegedly accelerated by itself, causing the car to fall off the second floor of a parking garage and kill the driver.

mnw21cam10mo ago

There are plenty of "car allegedly accelerated by itself" incidents, and usually the root cause is the driver mistakenly pressing the accelerator pedal when they think they're pressing the brake pedal. And then swearing blind afterwards that they were braking as hard as they possibly could but the car kept surging forwards.

bobmcnamara10mo ago

Time and time again the introduction of electronic throttle control has spiked the number of reported unintended acceleration incidents.

There's a chart here that shows it clearly for Toyota's rollout:

https://www.embedded.com/unintended-acceleration-and-other-e...

elric10mo ago· 14 in thread

One of the commenters on the article wrote this:

> Throughout the 80s and 90s there was just a feeling in medicine that computers were dangerous <snip> This is why, when I was a resident in 2002-2006 we still were writing all of our orders and notes on paper.

I was briefly part of an experiment with electronic patient records in an ICU in the early 2000s. My job was to basically babysit the server processing the records in the ICU.

The entire staff hated the system. They hated having to switch to computers (this was many years pre-ipad and similarly sleek tablets) to check and update records. They were very much used to writing medications (what, when, which dose, etc) onto bedside charts, which were very easy to consult and very easy to update. Any kind of dataloss in those records could have fatal consequences. Any delay in getting to the information could be bad.

This was *not* just a case of doctors having unfounded "feelings" that computers were dangerous. Computers were very much more dangerous than pen and paper.

I haven't been involved in that industry since then, and I imagine things have gotten better since, but still worth keeping in mind.

jacquesm10mo ago

Now we have Chipsoft, arguably one of the worst players in the entire IT space that has a near monopoly (around me, anyway) on IT for hospitals. They charge a fortune, produce crap software and the larger they get the less choice there is for the remainder. It is baffling to me that we should be enabling such hostile players.

misja11110mo ago

I worked for them in the early 2000's. There was nothing wrong with the people working there, except for the two founders, a father and son. They were absolutely ruthless. And as so often, that ruthless mentality was what enabled them to gain dominance over the market. I could tell some crazy stories about how they ran the company but better not because it might get me sued. But if you understand Dutch, you can read more about them e.g. here: https://www.quotenet.nl/zakelijk/a41239366/chipsoft-gerrit-h...

skinwill10mo ago

Around here we have Epic. If you want a good scare, look up their corporate Willy Wonka-esque jail/campus and their policy of zero remote work.

Liquix10mo ago

I thought "Willy Wonka-esque" was tongue in cheek, but they have a yellow brick road leading up to the front door... absolutely bizarre

Those who want to escape the office altogether, can hop on one of the company’s 600 cow-print bikes to take meetings from a treehouse, slide down a rabbit hole or grab lunch in a train car.

https://www.cnbc.com/2024/09/01/inside-epic-systems-mythical...

simulator5g10mo ago

I've been to that campus. Never worked there but I got a tour when I was like 18 and it seemed like a cool place to work at the time. They clearly took a page from Google's handbook. It's like a giant adult daycare center.

bongodongobob10mo ago

Their hiring is insane too. Years ago I applied for a Sys admin I position. After 3 rounds of interviews, they gave me a test to take. Except it was something you'd give a senior dev. They specced out a fake language, super esoteric too, it was like a combination of brainfuck and assembly, and asked me to solve problems with it, and debug code written in it. I could have toughed my way through it, but I was so dissatisfied with how much of my time they'd already wasted that I decided I didn't want to work there.

greazy10mo ago

It's still an issue. I've heard stories of EMR system going down forcing staff to use pen and paper. It boggles my mind that such systems don't have redundancy.

These are commercial products being deployed.

elric10mo ago

I have a few pet theories of why software in the medical space is so often shitty and insanely expensive. One of them is that working with doctors is often very unpleasant, which makes building software for them unpleasant, which drives up the price. I mean some of the ones I worked with were terribly nice, especially the ICU docs and neurologists, but a large majority of them were major aholes.

The other theory is there are soo many bureaucratic hoops to jump through in order to make anything in the medical space, that no one does it willingly.

siva710mo ago

It's not only the doctors, i have the gut feeling from my previous stint that people who like to work in the medical space are more often than not "difficult".

NortySpock10mo ago

The redundancy is pen and paper. The EMR just helps teams coordinate faster, pull up records faster, etc.

When I worked at Cerner years ago (now owned by Oracle), there were rumors that the Cerner EMR still could barely handle DST* spring forward, but could not handle DST fall back (where the 01:00 hour is repeated) -- you had to preemptively switch to pen-and-paper for the hours around the switch. I assume this was because someone back in the initial database design used local time instead of UTC for some critical patient-care timestamp fields in the database, and then had a bear of a time getting reliable times out of the database during the witching hour.

* Daylight Saving Time in the USA. And yes, everyone in the USA changes non-networked clocks twice a year because of some "brilliant idea" someone shoved through Congress in 1974.

EDIT: I wonder if Cerner finally fixed it?
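To make the fall-back problem concrete, here's a small sketch of why a local-time column can't tell two chart entries apart during the repeated hour, while UTC can. It uses fixed UTC offsets for US Central time on 2024-11-03 purely as an example; the choice of zone and date is an assumption for illustration and says nothing about Cerner's actual schema.

```python
from datetime import datetime, timedelta, timezone

CDT = timezone(timedelta(hours=-5), "CDT")  # Central Daylight Time
CST = timezone(timedelta(hours=-6), "CST")  # Central Standard Time

# Two entries made one hour apart, straddling the fall-back
# (02:00 CDT becomes 01:00 CST on 2024-11-03).
first_utc = datetime(2024, 11, 3, 6, 30, tzinfo=timezone.utc)   # 01:30 CDT
second_utc = datetime(2024, 11, 3, 7, 30, tzinfo=timezone.utc)  # 01:30 CST

# What a naive "local time" column would store for each entry.
first_local = first_utc.astimezone(CDT).replace(tzinfo=None)
second_local = second_utc.astimezone(CST).replace(tzinfo=None)

same_wall_clock = first_local == second_local  # the two rows look identical
elapsed = second_utc - first_utc               # but they are an hour apart
```

With only the local value stored, "which dose came first?" has no answer for that hour, which would explain falling back to paper.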

chuckadams10mo ago

The idea of Daylight Savings Time goes back to Ben Franklin, and a lot of Europe does it too. I think it’s an obsolete notion nowadays, but it’s not exclusively a Dumb Yank Idea.

greazy9mo ago

Pen and paper is back up in case of extreme events, and not because the software crashed or a bad patch was applied.

Millions on software with no software redundancy built in is insane in the medical field.

superjan10mo ago

It's worthwhile to mention that in the US and EU, EMRs are generally not considered Medical Devices and are therefore not subject to a lot of regulations.

https://www.medicaleconomics.com/view/what-if-emrs-were-clas...

elric9mo ago

That is disturbing, but it does explain a lot.

michaelt10mo ago· 12 in thread

I'd be interested in knowing how many of y'all are being taught about this sort of thing in college ethics/safety/reliability classes.

I was taught about this in engineering school, as part of a general engineering course also covering things like bathtub reliability curves and how to calculate the number of redundant cooling pumps a nuclear power plant needs. But it's a long time since I was in college.

Is this sort of thing still taught to engineers and developers in college these days?

FuriouslyAdrift10mo ago

A big thing that was emphasized in my computer engineering courses at Purdue in the early 90s with regards to machine interfaces was hysteresis. A machine has a RANGE of behaviors throughout its operating area that might not be accounted for in your programming and you must take that into consideration (i.e. a robotic arm or electric motor doesn't just 'stop' instantly).

Analog systems do not behave like computers.
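On the software side, one common way to respect that is to build hysteresis into the controller itself: switch on below one threshold, off above another, and hold the previous state in between so the output doesn't chatter around a single set point. A minimal sketch, with made-up thresholds and readings:

```python
def hysteresis_step(is_on: bool, reading: float, low: float, high: float) -> bool:
    """Return the new on/off state for a simple two-threshold controller."""
    if reading <= low:
        return True      # clearly too low: switch on
    if reading >= high:
        return False     # clearly too high: switch off
    return is_on         # inside the band: keep doing what we were doing


# Hypothetical temperature readings drifting up and back down.
readings = [18.0, 19.5, 20.5, 21.5, 22.5, 21.0, 19.9, 18.9]
state = False
history = []
for reading in readings:
    state = hysteresis_step(state, reading, low=19.0, high=22.0)
    history.append(state)
```

Note that the same reading (say 21.0) gives a different output depending on the path taken to get there, which is the whole point.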

ramses010mo ago

The "IBM Black Team Debugs a Tape Drive" story comes to mind: https://www.penzba.co.uk/GreybeardStories/TheBlackTeam.html

wocram10mo ago

This was part of our Systems Engineering class, something like this: https://web.mit.edu/6.033/2014/wwwdocs/assignments/therac25....

smarks10mo ago

Thanks for posting this. The Leveson article is IMO the definitive study on this incident. I’m going to repost at top level for emphasis.

mrguyorama10mo ago

The therac-25 was just one of the many incidents we covered in my Software Ethics course for my Computer Science degree. The problem is not "we have to teach it", the problem is that at least half the talented people in the room with me in that class considered the entire thing "a joke" bullshit class that just wasted their time.

You can't teach people to care.

BoxOfRain10mo ago

I was taught about it in university as a computer science undergrad, thought about it often since I ended up working in medtech.

aDyslecticCrow10mo ago

Im too curious, I made a poll. I for sure wasnt in computer science uni. I only heard about it vaguely online.

https://strawpoll.com/NMnQNX9aAg6

mlnhd10mo ago

This and Tacoma Narrows are literally the only topics covered in engineering ethics, which itself is literally only a one hour presentation.

InvisibleUp10mo ago

Don’t forget the Hyatt Regency walkway, too.

lgeek10mo ago

It was taught in a first year software ethics class on my Computer Science programme. Back in 2010. I'm wondering if they still do

firesteelrain10mo ago

I was taught Computer Ethics back in the early 2000s as part of my CS degree.

3D3049742010mo ago

I studied design and I wish we'd had a design ethics class, which would have covered instances like this.

mellosouls10mo ago· 9 in thread

TIL TheDailyWTF is still active. I'd thought it had settled to greatest hits only some years ago.

greatgib10mo ago

This story is kind of old. But also I'm suspicious that this was an AI generated content due to this weird paragraph (one becoming "they"):

   It's worth noting that there was one developer who wrote all of this code. They left AECL in 1986, and thankfully for them, no one has ever revealed their identity. And while it may be tempting to lay the blame at their feet—they made every technical choice, they coded every bug—it would be wildly unfair to do that.

pie_flavor10mo ago

'They' is a correct singular form for a person of unknown gender. Modern writing overwhelmingly uses it instead of 'he or she', but it has always been correct, has been predominant for a long time, and furthermore it doesn't have anything to do with AI, nor was AI viable as an authoring tool when this article was written, nor is Remy ever going to sell out. What a bizarre comment.

semv3r10mo ago

Singular "they" has been used since at least the 14th century—was generative AI commonly available then? https://en.wikipedia.org/wiki/Singular_they

remyporter10mo ago

I’ve been writing on the Internet since very early days, and have put almost twenty years into The Daily Wtf specifically. Which means I’m actually over represented in the training set. I don’t write like AI. AI writes like me.

lopespm9mo ago

I am really amazed by the frequency and the quality of your output.

Would you have an article on your routines, how you structure your day / work? Essentially, what enables your consistency, and quality articles?

edot10mo ago

Isn’t that the pronoun to use when you’re unsure of gender? This article didn’t feel AI-y to me.

bigstrat200310mo ago

The correct neutral pronoun in English is "he", though using "they" has been very popular in recent years due to a mistaken belief that it's sexist to use the male pronoun that way. I wouldn't say it's an AI smell at all.

tbossanova10mo ago

That is 100% standard english, dude. I feel like I might have read that exact sentence 20 years ago...

HankStallone10mo ago

It writes that way because almost everyone writes that way these days. It's annoying if you learned English grammar from textbooks and other materials written over 50 years ago, but it's extremely common now anyway. So a large chunk of its training data will be that way.

It's interesting, because all the older works in its training data will default to the masculine singular, and that has to be a massive number of books too. But maybe the modern writing, including lots of online sources, simply overwhelms that. Or it's one of the guardrails written into the AIs to avoid offending people.

OskarS10mo ago· 7 in thread

It's interesting to compare this with the Post Office Scandal in the UK. Very different incidents, but reading this, there is arguably a root assumption in both cases that people made, which is that "the software can't be wrong". For developers, this is a hilariously silly thing, but for non-developers looking at it from the outside, they don't have the capability or training to understand that software can be this fragile. And they look at a situation like the post office scandal and think "Either this piece of software we paid millions for and was developed by a bunch of highly trained engineers is wrong, or these people are just ripping us off". Same thing with Therac-25, this software had worked on previous models and the rest of the company just had this unspoken assumption that it simply wasn't possible that there was anything wrong with it, so testing it specifically wasn't needed.

jwr10mo ago

No, this is not a "hilariously silly thing" for developers. In fact, I'd say that most developers place way too much trust in software.

I am a developer and whatever software system I touch breaks horribly. When my family wants to use an ATM, they tell me to stand at a distance, so that my aura doesn't break things. This is why I will not get into a self-driving car in the foreseeable future — I think we place far too much confidence in these complex software systems. And yet I see that the overwhelming majority of HN readers are not only happy to be beta-testers for this software as participants in road traffic, but also are happy to get in those cars. They are OK with trusting their life to new, complex, poorly understood and poorly tested software systems, in spite of every other software system breaking and falling apart around them.

[anticipating immediate common responses: 1) yes, I know that self-driving car companies claim that their cars are statistically safer than human drivers, this is beyond the point here. One, they are "safer" largely because they drive so badly that other road participants pay extra attention and accommodate their weirdness, and two, they are still new, complex and poorly understood systems. 2) "you already trust your life to software systems" — again, beyond the point, not quite true as many software systems are built to have human supervision and override capability (think airplanes), and others are built to strict engineering requirements (think brakes in cars) while self-driving cars are not built that way.]

crazygringo10mo ago

> but also are happy to get in those cars. They are OK with trusting their life to new, complex, poorly understood and poorly tested software systems

Because the alternative isn't bug-free driving -- it's a human being. Who maybe didn't sleep last night, who might have a heart attack while their foot is on the accelerator, who might pull over and try to sexually assault you.

You don't need to "place confidence in these complex software systems" -- you just need to look at their safety stats vs e.g. regular Uber. It's not a matter of trust; it's literally just a matter of statistics, and choosing the less risky option.

pfdietz10mo ago

I wonder if this is a desired outcome of fuzzing, the puncturing of the idea that software doesn't have bugs. This goes all the way back to the very start of fuzzing with Barton Miller's work from ~1990.

brazzy10mo ago

> there is arguably a root assumption in both cases that people made, which is that "the software can't be wrong"

I think in this case, the thought process was based on the experience with older, electro-mechanical machines where the most common failure mode was parts wearing out.

Since software can, indeed, not "wear out", someone made the assumption that it was therefore inherently more reliable.

balamatom10mo ago

I think the "software doesn't wear out" assumption is just a conceivable excuse for the underlying "we do not question" assumption. A piece of software can be like a beautiful poem, but the kind of software most people are familiar with is more like a whole lot of small automated bureaucracies.

Bureaucracy being (per Graeber 2006) something like the ritual where by means of a set of pre-fashioned artifacts for each other's sake we all operate at 2% of our normal mental capacities and that's how modern data-driven, conflict-averse societies organize work and distribute resources without anyone being able to have any complaints listened to.

>Bureaucracies public and private appear—for whatever historical reasons—to be organized in such a way as to guarantee that a significant proportion of actors will not be able to perform their tasks as expected. It also exemplifies what I have come to think of the defining feature of a utopian form of practice, in that, on discovering this, those maintaining the system conclude that the problem is not with the system itself but with the inadequacy of the human beings involved.

Most places where a computer system is involved in the administration of a public service or something of the caliber, has that been a grassroots effort, hey computers are cool and awesome let's see what they change? No, it's something that's been imposed in the definitive top-down manner of XX century bureaucracies. Remember the cohort of people who used to become stupid the moment a "thinking machine" was powered within line of sight (before the last uncomputed generation retired and got their excuse to act dumb for the rest of it)? Consider them in view of the literally incomprehensible number of layers that any "serious" piece of software consists of; layers which we're stuck producing more of, when any software professional knows the best kind of software is less of it.

But at least it saves time and the forest, right? Ironically, getting things done in a bureaucratic context with less overhead than filling out paper forms or speaking to human beings, makes them even easier to fuck up. And then there's the useful fiction of "the software did it" that e.g. "AI agents" thing is trying to productize. How about they just give people a liability slider in the spinup form, eh, but nah.

Wanna see a miracle? A miracle is when people hype each other into pretending something impossible happened. To the extent user-operated software is involved in most big-time human activities, the daily miracle is how it seems to work well enough, for people to be able to pretend it works any good at all. Many more than 3 such cases. But of course remembering the catastrophic mistakes of the past can be turned into a quaint fun-time activity. Building things that empower people to make fewer mistakes, meanwhile, is a little different from building artifacts for non-stop "2% time".

ndsipa_pomu10mo ago

I'd consider the Post Office Scandal to be far more malicious. The higher ups in the post office were getting bonuses IIRC according to how much money was "recovered" (defrauded) from the subpostmasters. Also there was a lot of lying to the courts and ministers about the reliability of the software.

As far as I know, the Therac-25 incidents were reasonably honest mistakes.

OskarS10mo ago

I agree, that is very true, Therac-25 was incompetence, Post Office was incompetence with a heavy dose of malice. This aspect just struck me as similar, the unquestioning belief in the infallibility of software.

zackmorris10mo ago· 6 in thread

Our power went off a couple of weeks ago due to wind probably knocking a branch into a power line. Now our Frigidaire microwave runs with the door open.

Supposedly there are mechanical switches that prevent that, but evidently "modern" microwaves can control the gun through the logic board.

The engineering failures that led to this, from conceptual to design to internal control, boggle my mind. I'm not even sure where to send a complaint or if it would result in any kind of compensation. Because billion dollar corporations know that they'll never have to face any kind of corporate death penalty because they're protected by limited liability. So we'll just buy another $150 microwave instead.

Are smaller companies better at engineering safety? Evidently not.

bongodongobob10mo ago

I have a microwave from the early 80s. If I stick a pencil in the door latch I can get it to run with the door open as well. It's not the demon core. Just don't stick your head in it when it's running.

simoncion10mo ago

> If I stick a pencil in the door latch I can get it to run with the door open as well.

"The safety interlocks don't work when the operator intentionally goes out of his way to defeat them." isn't a concern. There's only so much you can do to prevent someone who's dedicated to disabling them.

"The safety interlocks fail dangerous because of an unexpected power cut." is a huge concern. What else did the manufacturer skimp on, or -worse- simply fail to understand was important to do for the safety of the operator of the device?

bongodongobob9mo ago

It's crazy to me that you are ruling out a power surge frying a board. Same thing could happen to the 80s model as well. You have not root caused it and are making up a failure mode that fits your point. Hell, the hall sensor could be fried and that's pretty damn mechanical. Again, your microwave isn't a demon core. Inverse square law applies. Don't put any limbs inside when it's on, and it really isn't that dangerous, so I'm not shocked they didn't apply aircraft safety design rules.

dezgeg10mo ago

It's possible that it's just operating the lights and turntable without actually cooking - if you search HN you find that failure mode.

zackmorris9mo ago

Ya that's what I'm hoping, because I've been blasted for a second or two here and there. I guess I could put a CD next to it and see if it crackles haha. But I'll probably just recycle it and move on.

_kb10mo ago

Stand next to it with a chocolate bar in your pocket. I believe that’s the canonical way to check if it’s active.

ChrisMarshallNY10mo ago· 4 in thread

I worked for hardware manufacturers for most of my career, as a software guy.

In my experience, hardware people really dis software. It's hard to get them to take it seriously.

When something like this happens, they tend to double down on shading software.

I have found it very, very difficult to get hardware people to understand that software has a different ruleset and workflow, from hardware. They interpret this as "cowboy software," and think we're trying to weasel out of structure.

kccqzy10mo ago

Hardware designers benefit from having multiple separate teams to test their product. A chip designer can rely on at least two other teams to test the designed chip, and one of them will be using formal verification. If software also has long release cycles and high cost to remedy mistakes, you bet we would also have multiple testers. In fact that was what happened in the 90s with shrink wrapped software and without easy updates.

ChrisMarshallNY10mo ago

That was the case for the company that I worked at.

The official QA organization was very powerful, and had no compunctions about stopping an entire product line, for one bug.

When that happened, the department responsible for the bug would find themselves against the wall.

As a result, all the software departments had pretty big teams of testers, who would validate the software, before it was released to the purview of the QA organization.

It could be pretty restricting, but we always felt confident that what we shipped, worked.

scottLobster10mo ago

ChrisMarshallNY10mo ago

I think that was the F-35 incident (in Alaska).

The Therac-25 incident was a radiation overdose in Texas.

haunter10mo ago· 3 in thread

My "favorite" part:

>One failure occurred when a particular sequence of keystrokes was entered on the VT100 terminal that controlled the PDP-11 computer: If the operator were to press "X" to (erroneously) select 25 MeV photon mode, then use "cursor up" to edit the input to "E" to (correctly) select 25 MeV Electron mode, then "Enter", all within eight seconds of the first keypress and well within the capability of an experienced user of the machine, the edit would not be processed and an overdose could be administered. These edits were not noticed as it would take 8 seconds for startup, so it would go with the default setup

Kinda reminds me how everything is touchscreen nowadays from car interfaces to industry critical software
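The keystroke race in the quoted passage can be illustrated with a small sketch. This is not the Therac-25 code; `Console`, `naive_setup`, and `checked_setup` are hypothetical names, and the edit-counter check is just one common way to detect that input changed while a slow setup step was running.

```python
from dataclasses import dataclass


@dataclass
class Console:
    """Hypothetical operator console state (illustration only)."""
    mode: str = "X"   # "X" = photon mode, "E" = electron mode
    version: int = 0  # bumped on every operator edit

    def edit(self, new_mode: str) -> None:
        self.mode = new_mode
        self.version += 1


def naive_setup(console: Console, edits_during_setup: list[str]) -> str:
    """Snapshot the mode, run a slow setup, then use the snapshot."""
    snapshot = console.mode
    for new_mode in edits_during_setup:  # operator edits arrive mid-setup
        console.edit(new_mode)
    return snapshot  # stale: the edits made during setup are ignored


def checked_setup(console: Console, edits_during_setup: list[str]) -> str:
    """Same flow, but refuse to proceed if the input changed during setup."""
    snapshot, seen_version = console.mode, console.version
    for new_mode in edits_during_setup:
        console.edit(new_mode)
    if console.version != seen_version:
        raise RuntimeError("input changed during setup; restart setup")
    return snapshot
```

With an edit from "X" to "E" arriving mid-setup, `naive_setup` still returns "X" while the screen shows "E"; `checked_setup` raises instead of proceeding on stale input.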

hiccuphippo10mo ago

And we have a concept, optimistic updates, for making the ui look responsive while the updates happen in the background and reconcile later. I can only hope they know when not to use it.

kevincox9mo ago

Optimistic updates should almost always be paired with some sort of indicator showing if/when a value has actually been persisted. In practice this is rarely implemented. Even failures are often not shown and the value rolled back. (Making it a very optimistic update indeed)
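A minimal sketch of that pairing, assuming a hypothetical `OptimisticField` class (not any particular UI framework's API): the new value is shown immediately, but a status travels with it until the backend confirms or rejects the write.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    SAVED = "saved"
    FAILED = "failed"


class OptimisticField:
    """Hypothetical UI field that tracks whether its value was persisted."""

    def __init__(self, value):
        self.value = value       # what the user currently sees
        self.confirmed = value   # last value known to be persisted
        self.status = Status.SAVED

    def set(self, new_value):
        # Show the new value right away, but mark it as unconfirmed.
        self.value = new_value
        self.status = Status.PENDING

    def confirm(self):
        # The backend acknowledged the write.
        self.confirmed = self.value
        self.status = Status.SAVED

    def fail(self):
        # The write failed: roll back visibly instead of silently.
        self.value = self.confirmed
        self.status = Status.FAILED

    def label(self):
        # Text the UI would render, always including the persistence state.
        return f"{self.value} ({self.status.value})"
```

The point of `label()` is that the user never sees a bare value whose persistence state is unknown.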

ramses010mo ago

   Try quickly typing 1+ 2 + 3 into the iOS 11 Calculator (reddit.com)
   886 points by danso on Oct 24, 2017 | hide | past | favorite | 480 comments

https://news.ycombinator.com/item?id=15538666

...this _exact_ same failure mode in a "less" critical domain (eg: literally your most frequently used "pocket calculator"), unless you're using the calculator for Important Things(tm).

Tenemo10mo ago· 3 in thread

The full 1993 report linked in the article has an interesting statement regarding software developer certification in the "Lessons learned" chapter:

> Taking a couple of programming courses or programming a home computer does not qualify anyone to produce safety-critical software. Although certification of software engineers is not yet required, more events like those associated with the Therac-25 will make such certification inevitable. There is activity in Britain to specify required courses for those working on critical software. Any engineer is not automatically qualified to be a software engineer — an extensive program of study and experience is required. Safety-critical software engineering requires training and experience in addition to that required for noncritical software.

After 32 years, this didn't go the way the report's authors expected, right?

slavik8110mo ago

I am a licensed professional software engineer in Canada. It's been fifteen years since I first registered with my professional association, but I will probably not be renewing my license this year as it's not providing any real benefit to my career.

Two decades ago there was a lot of talk about turning software development into a structured engineering discipline, but that plan seems to have largely been abandoned.

mitthrowaway210mo ago

I've had some discussions with my engineering regulator in Canada. It's clear they have no idea what software engineering even is or who should be regulated or why. I tried to get them to provide some examples of what would and would not count as software engineering, but they couldn't.

firesteelrain10mo ago

To add. Safety-critical software is not something you pick up in a classroom, it is something built over years of disciplined practice. There are standards like DO-178 for avionics and IEC 61508 for industrial systems, but how rigorously they are applied often depends on cost and project constraints. That said, when failures happen, the audit trail will not matter to the people harmed. The history of safety engineering shows that almost every rule exists because someone was hurt first.

mdavid62610mo ago· 3 in thread

Some sanity checks are always a good idea before running such destructive action (IF beam_strength > REASONABLY_HIGH_NUMBER THEN error). Of course the UI bug is hard to catch, but the sanity check would have prevented this completely and the machine would just end up in an error, rather than killing patients.
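The pseudocode check above might look like this as a runnable sketch. `MAX_SAFE_DOSE_RADS`, `Beam`, and `deliver` are hypothetical names, and the limit is an arbitrary illustrative number, not a clinical value. The sketch also puts the beam into a safe state before raising, so a rejected request cannot leave it running.

```python
MAX_SAFE_DOSE_RADS = 500.0  # hypothetical limit, for illustration only


class Beam:
    """Stand-in for the beam hardware interface (an assumption)."""

    def __init__(self):
        self.on = False
        self.delivered = None

    def fire(self, dose_rads):
        self.on = True
        self.delivered = dose_rads

    def shut_off(self):
        self.on = False


def deliver(beam, requested_dose_rads):
    """Fire only if the requested dose passes the sanity check."""
    # Written as a positive range test so NaN is rejected as well.
    if not (0 < requested_dose_rads <= MAX_SAFE_DOSE_RADS):
        beam.shut_off()  # reach a safe state first, then report the error
        raise ValueError(f"dose {requested_dose_rads} rads outside safe range")
    beam.fire(requested_dose_rads)
```

A check like this lives in the same software it guards, so it complements rather than replaces the independent hardware interlock discussed elsewhere in the thread.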

b_e_n_t_o_n10mo ago

invariants are so useful to enforce even for toy projects. they should never be triggered outside of dev, but if they do sometimes it's better to just let it crash.

bzzzt10mo ago

Making sure the beam is off before crashing would be better though.

b_e_n_t_o_n9mo ago

For sure :P

vemv10mo ago· 3 in thread

My (tragically) favorite part is, from wikipedia:

> A commission attributed the primary cause to generally poor software design and development practices, rather than singling out specific coding errors.

Which to me reads as "this entire codebase was so awful that it was bound to fail in some or other way".

rgoulter10mo ago

Hmm. "poor software design" suggests a high risk that something might go wrong; "poor development practice" suggests that mistakes won't get caught/remedied.

By focusing on particular errors, there's the possibility you'll think "problem solved".

By focusing on process, you hope to catch mistakes as early as possible.

DamonHD10mo ago

When trying to make better systems in moderately-critical roles (investment banking, not medicine though) my approach was both try to understand and fix the immediate fault, but also find out if (and fix if so) any systemic issue that would make other related errors likely.

ycombobreaker10mo ago

Sibling reply notes the "process" is the problem, and I would second that. I would also like to add, it's perfectly possible to produce a high quality code base with poor practices. This can happen with very small, expert teams. However, certain qualities become high-variance, which becomes a hazard over time.

haddonist10mo ago· 2 in thread

Well There's Your Problem podcast, Episode 121: Therac-25

https://www.youtube.com/watch?v=7EQT1gVsE6I

dpacmittal10mo ago

There's also this video from Kyle Hill which is pretty good (I think it's a different incident though, not sure) - https://www.youtube.com/watch?v=Ap0orGCiou8

voidUpdate10mo ago

My go-tos are usually Fascinating Horror https://www.youtube.com/watch?v=nU5HbUOtyqk and Plainly Difficult https://www.youtube.com/watch?v=-7gVqBY52MY.

I've gone off Kyle Hill after a lot of people pointed out that he was promoting a scam (BetterHelp) on his video about fraud and his response was just to tell people to deal with it

rvz10mo ago· 2 in thread

We're likely to see a similar incident very soon if we continue with the cult of 'vibe-coding' and keep throwing basic software engineering principles out of the window, as I said before. [0]

Take this post-mortem here [1] as a great warning and which also highlights exactly what could go horribly wrong if the LLM misreads comments.

What's even scarier is that each time I stumble across a freshly minted project on GitHub with a considerable amount of attention, not only is it 99% vibe-coded (very easy to detect) but it completely lacks any tests written for it.

Makes me question the ability of the user prompting the code in the first place if they even understand how to write robust and battle-tested software.

[0] https://news.ycombinator.com/item?id=44764689

[1] https://sketch.dev/blog/our-first-outage-from-llm-written-co...

voxadam10mo ago

The idea of 'vibe-coding' safety critical software is beyond terrifying. Timing and safety critical software is hard enough to talk about intelligently, even harder to code, harder yet to audit, and damn near impossible to debug, and all that's without neophyte code monkeys introducing massive black boxes full of poorly understood voodoo to the process.

mrguyorama10mo ago

God that "post mortem" is such a portent of things to come. I've seen this exact problem path happen locally nearly any time I use claude. It very obviously just picks what it should put where based on weighted random chance, and that random chance is going to not go in your favor at some point, in a way that no amount of training or job experience can help with, because no, a human would not have made this mistake.

This is the kind of mistake that fails people out of CS101; it's obvious that the student is just manipulating symbols they don't really "get" rather than modifying code. Throwing the Chinese room thought experiment at your code base is bad engineering.

autonomousErwin10mo ago· 2 in thread

This reminds me of the Belgium 2003 election that was impossibly skewed by a supernova light years away sending charged particles which managed to get through our atmosphere (allegedly) and flip a bit. Not the only case where it's happened.

jve10mo ago

On the bright side, wow, those computers are really sturdy: takes a whole supernova to just flip a bit :)

kijin10mo ago

Well the thing is, millions of stars go supernova in the observable universe every single day. Throw in the daily gamma ray burst as well, and you've got bit flips all over the place.

Forgret10mo ago· 2 in thread

What surprised me most was that only one developer was working on such an unpredictable technology, whereas I think I need at least 5 developers to be able to discuss options.

throwaway026110mo ago

One of the benefits of regulations in these areas, is that they require proper tests and documentation. This often requires more than one person to handle the load. We don't want to go back to the 80s YOLO mode just because we need to "move faster".

BTW: Relevant XKCD: https://xkcd.com/2347/

mnw21cam10mo ago

Though this XKCD might be even more relevant:

https://xkcd.com/2030/

MerrimanInd10mo ago· 1 in thread

Every mechanical engineer educated in the USA knows the name of two famous collapses: the Tacoma Narrows Bridge and the Hyatt Regency balcony in Kansas City, MO. With an engineering ethics class being part of nearly every undergrad curriculum, these are two of the classic examples for us. I'm curious; do software engineers learn stories like the Therac-25 in their degrees?

scottLobster10mo ago

I was a Computer Engineer, so not quite the same, but we got taught about Therac-25 in our Engineering Ethics class when I took it over a decade ago.

Unfortunately Computer Science is still in its too-cool-for-school phase, see OpenAI being sued over recently encouraging a suicidal teenager to kill themself. You'd think it would be common sense for that to be a hard stop outside of the LLM processing the moment a conversation turns to subjects like that, but nope.

SirMaster10mo ago· 1 in thread

The question I have is why was the hardware capable of delivering a fatal dose like this. Is that actually ever even a usable output for some legitimate reason?

If not, why not hardware limit the power input to the machine, so even if the software completely failed, it would not be physically capable of delivering a fatal dose like this?

PokestarFan10mo ago

I believe that for X-ray mode, the radiation was indirect, so it needed a lot more power. Furthermore, older revisions had hardware locks, and the intent of the Therac-25 was to make it cheaper.

amelius10mo ago· 1 in thread

> The Therac-25 was the first entirely software-controlled radiotherapy device.

This says it all.

theglocksaint10mo ago

Don't know how to break this to you but they all are nowadays.

tedggh10mo ago· 1 in thread

TL;DR

The Therac-25 was a radiation therapy machine built by Atomic Energy Canada Limited in the 1980s. It was the first to rely entirely on software for safety controls, with no hardware interlocks. Between 1985 and 1987, at least six patients received massive overdoses of radiation, some fatally, due to software flaws.

One major case in March 1986 at the East Texas Cancer Center involved a technician who mistyped the treatment type, corrected it quickly, and started the beam. Because of a race condition, the correction didn’t fully register. Instead of the prescribed 180 rads, the patient was hit with up to 25,000 rads. The machine reported an underdose, so staff didn’t realize the harm until later.

Other hospitals reported similar incidents, but AECL denied overdoses were possible. Their safety analysis assumed software could not fail. When the FDA investigated, AECL couldn’t produce proper test plans and issued crude fixes like telling hospitals to disable the “up arrow” key.

The root problem was not a single bug but the absence of a rigorous process for safety-critical software. AECL relied on old code written by one developer and never built proper testing practices. The scandal eventually pushed regulators to tighten standards. The Therac-25 remains a case study of how poor software processes and organizational blind spots can kill—a warning echoed decades later by failures like the Boeing 737 MAX.

thijson10mo ago

I remember my computer science professor talking about this, how critical safety can be in software. Another example he gave was the refueling machine at a nuclear power plant: it had fallen off the tracks and broken the pipe that goes through the reactor due to a software bug. He also mentioned the software in his pacemaker.

Engineers in other fields need to sign off on designs, and can be held liable if something goes wrong. Software hasn't caught up to that yet.

auggierose10mo ago· 1 in thread

Wondering if that "one developer" is here on HN.

Forgret10mo ago

Hahaha, it would be interesting, maybe he just commented on the post here?

throwaway026110mo ago

One of the comments said this:

> That standard [IEC 62304] is surrounded by other technical reports and guidances recognized by the FDA, on software risk management, safety cases, software validation. And I can tell you that the FDA is very picky, when they review your software design and testing documentation. For the first version and for every design change.

> That’s good news for all of us. An adverse event like the Therac 25 is very unlikely today.

This is a case where regulation is a good thing. Unfortunately I see a trend lately where almost any regulation is seen as something stopping innovation and business growth. There is room for improvement and some areas are over-regulated, but we don't want a "DOGE" chainsaw taken to regulations without knowing what the consequences are.

rossant10mo ago

The first commenter on this site introduces himself as "a physician who did a computer science degree before medical school." He is now president of the Ray Helfer Society [1], "an honorary society of physicians seeking to provide medical leadership regarding the prevention, diagnosis, treatment and research concerning child abuse and neglect."

While the cause is noble, the medical detection of child abuse faces serious issues with undetected and unacknowledged false positives [2], since ground truth is almost never knowable. The prevailing idea is that certain medical findings are considered proof beyond reasonable doubt of violent abuse, even without witnesses or confessions (denials are extremely common). These beliefs rest on decades of medical literature regarded by many as low quality because of methodological flaws, especially circular reasoning (patients are classified as abuse victims because they show certain medical findings, and then the same findings are found in nearly all those patients—which hardly proves anything [3]).

I raise this point because, while not exactly software bugs, we are now seeing black-box AIs claiming to detect child abuse with supposedly very high accuracy, trained on decades of this flawed data [4, 5]. Flawed data can only produce flawed predictions (garbage in, garbage out). I am deeply concerned that misplaced confidence in medical software will reinforce wrongful determinations of child abuse, including both false positives (unjust allegations potentially leading to termination of parental rights, foster care placements, imprisonment of parents and caretakers) and false negatives (children who remain unprotected from ongoing abuse).

[1] https://hs.memberclicks.net/executive-committee

[2] https://news.ycombinator.com/item?id=37650402

[3] https://pubmed.ncbi.nlm.nih.gov/30146789/

[4] https://rdcu.be/eCE3l

[5] https://www.sciencedirect.com/science/article/pii/S002234682...

MarkusWandel10mo ago

In a quick skim of the comments so far, I don't see the real smoking gun.

The previous devices had hardware interlocks. So if the software glitched, it was just an annoying glitch - nobody got zapped. But mature software gets trusted, so they removed the hardware interlock as redundant. And then the annoying glitches became fatal. Total miscommunication. The people cost-reducing the hardware interlock only saw mature, trustworthy software. The people living with the glitches only saw them as annoying, but harmless. And then, disaster.

koverstreet10mo ago

A lot of people draw the wrong conclusions from Therac-25 today; becoming overly process driven can become a huge problem for software quality, because the processes have to be the right processes, and once processes are in place people have a natural tendency to defer to them and suspend their own judgement.

That gets actively dangerous; a lot of more recent safety mishaps are more of the variety of "processes were followed, but things went hilariously off the rails and no one noticed and spoke up".

Culture and expertise matter just as much if not more, especially today now that we all (in theory) should understand source control, testing, safer languages, etc.

I think Admiral Rickover's methods apply just as much today, and applying that kind of thinking would fill major gaps in a lot of organizations - he emphasized good communication, a sense of responsibility, and thinking on your feet, and his safety record is unmatched.

I think aviation also approaches process a bit better - by having much of it be more informal, less rigid checklists, it doesn't encourage people to suspend judgement so much.

There's also the Tankship Tromedy, which really emphasizes the engineering legwork of just chasing down, understanding and fixing every last failure mode you can find.

https://www.dieselduck.info/library/08%20policies/2006%20The...

cantrevealname10mo ago

> All of this software, from the individual processes to the OS itself, were the work of a single software developer. They left AECL in 1986, and no one has ever revealed their identity.

I bet some readers are thinking that the developer that caused this tragedy retired with the millions he earned, maybe sailed his yacht to his Caribbean mansion. But the $300K FAANG salaries and multi-million stock options for senior developers represent the last decade or two. In the 1980s, developers were paid poorly and commanded little respect. The heroes in tech companies that sold expensive devices were the salesmen back then. The commission on the sale of a single Therac-25 probably exceeded the developer's salary.

All of the following would indicate that this developer, no matter how senior or capable, was still a low-paid schlub:

- It's Canada, so automatically 20% lower salaries than in the U.S. (AECL is in Canada, so it's a good bet that the developer was Canadian.)

- It's the 1980s, so pre-web, pre-smartphones, pre-Google/Amazon, and developers had little recognition and low demand.

- It's government, known to pay poorly for developers. (AECL is a government-owned corporation.)

- It's mostly embedded software. Even though embedded software can be incredibly complex and life-critical, it's the least visible, so it's among the lower paid areas of software engineering (even today).

For 1986, I would put his salary at $30-50K Canadian, or converted to U.S. dollars at that time would be $26-43K U.S., and inflation adjusted would be $78-129K U.S. today. And no stock options.

w10-110mo ago

This is not the example readers need to understand, because the failures were so rudimentary and systemic that it seems "good process" is the answer.

Having written and validated both FDA and CLIA software, I'd suggest that process is never sufficient.

Plenty of well-meaning people will create and follow incomplete plans and hand-wave away issues when they sign off -- particularly people who gravitate towards rule-based, formulaic work in a hierarchy.

You need people both capable of and willing to seriously question whether proof is really proof, and who will stand up for some random patient in the distant future over their boss and colleagues on a deadline -- and yet they cannot be oppositional or egotistical, and must have deep insight into the subject matter.

It's really, really hard to find those people.

softwaredoug10mo ago

Safety problems are almost never about one evil / dumb person and frequently involve confusing lines of responsibility.

Which makes me very nervous about AI generated code and people who don’t claim human authorship. The bug that creeps in where we scapegoat the AI isn’t gonna cut it in a safety situation.

csours10mo ago

To me, the Therac incident is the poster child for a category I call 'context change error'.

Some of the controls were 'born' in a world of hardware interlocks, and so the engineers used the frame of mind where hardware interlocks exist.

Some time later, the interlocks were replaced with software controls. Since everything had worked before, all the software had to do was what worked before.

But it is VERY difficult to challenge all of your assumptions about what "working" means.

---

This is also a good reminder that work is done by people and teams, not corporations. That is - just because somebody knows the fine details, that does not mean that the corporation knows the fine details.

siva710mo ago

> With AECL's continued failure to explain how to test their device

They can't. There was a single developer, he left, no tests existed, no one understood the mess to confidently make changes. At this point you can either lie your way through the regulators or scrap the product altogether.

I've seen this kind of dev and company running their software in regulated industries, just like in the Therac incident, only now we are in the year 2025. I left because I understood that it's a criminal charge waiting to happen.

smarks10mo ago

I believe the definitive analysis of the Therac-25 incident was written by Nancy Leveson, first in IEEE Computer,[1] and later as an appendix of her book.[2] The appendix is freely available as a PDF on the web [3][4] and probably other places. Many people here are asking questions about what happened and how it came about. The answers to many of these questions can be found there. I strongly recommend that anyone who is serious about safety and wants to learn more about this incident read Leveson’s analysis.

[1]: N. G. Leveson and C. S. Turner, "An investigation of the Therac-25 accidents," in Computer, vol. 26, no. 7, pp. 18-41, July 1993.

[2]: Nancy Leveson. Safeware: System Safety and Computers. Addison-Wesley, 1995.

[3]: http://sunnyday.mit.edu/papers/therac.pdf

[4]: https://web.mit.edu/6.033/2014/wwwdocs/papers/therac.pdf

rokkamokka10mo ago

I was taught this incident in university many years ago. It's undeniably an important lesson that shouldn't be forgotten

rendaw10mo ago

So reading about this my current company sounds exactly the same. And the one before it, and the one before that.

Critical issues happen with customers, blame gets shifted, a useless fix is proposed in the post mortem and implemented (add another alert to the waterfall of useless alerts we get on call), and we continue to do ineffective testing. Procedural improvements are rejected by the original authors who were then promoted and want to keep feeling like they made something good and are now in a position to enforce that fiction.

So IMO the lesson here isn't that everyone should focus on culture and process, it's that you won't have the right culture and process and (apparently) laws and regulation can overcome the lack of culture and process.

linohh10mo ago

In my university this case was (and probably still is) subject of the first lecture in the first semester. A lot to learn here and one of the prime examples how the DEPOSE model [Perrow 1984] works for software engineering.

salynchnew10mo ago

Great WTYP episode on this: https://www.youtube.com/watch?v=7EQT1gVsE6I

0xDEAFBEAD10mo ago

>any bugs we see would have to be transient bugs caused by radiation or hardware errors.

Can't imagine that radiation might be a factor here...

onewheeltom10mo ago

The manufacturer of the Therac-25, AECL, did not share customer incident reports with other customers when patients were injured. So, the hospitals believed that their incidents were isolated. This may have been legal, but was highly unethical.

snkline10mo ago

I was kinda shocked by the results of his informal survey, because this was a big focus of my ethics course in college. I guess a lot of developers either didn't get a CS degree, or their degree program didn't involve an ethics course.

Duanemclemore10mo ago

There's an excellent episode of Well There's Your Problem about Therac-25.

https://youtu.be/7EQT1gVsE6I

NoSalt10mo ago

Is there a way to get the "gist" of the article, the lesson to be learned without reading the full article? I got to the screaming part and couldn't read any more.

armcat10mo ago

Therac-25 was part of the mandatory "computer ethics" course at my uni, as part of the Computer Science programme, circa early 2000s.

fogzen10mo ago

Almost. It’s a process problem. But the process is a step above the organization. It’s a socio-economic process that incentivizes these problems. It’s capitalism that’s the process problem. That’s the process that introduces the problem into the organization. Without the government regulators making them test nothing would have even been done at all. Because the problem is the organization exists within a framework that pits it against safety. Safety is at odds with what the organization is tasked to do within the process that it exists in.

voxadam10mo ago

(2021)

darepublic10mo ago

sad story. gotta blame canada for this crap. The elements of this story.. hospitals.. a janky attempt at innovation.. passive aggressive denials from otherwise timid demure canadians. Cold grey bureaucracy. It all reminds me of the not so great north. The technicians were sipping on their Tim Hortons slop at the time to make it perfect.



philjohn10mo ago

This.

I've recently entirely re-architected a critical component, and part of this was defense in depth. Stuff is going to go wrong, so having a second or even third line of defense is important.

technofiend10mo ago


layman5110mo ago

Cthulhu_9mo ago

Great point. Earlier in my career, and I think many can see it too, I was very diligent; thorough types, unit tests, defensive programming, assertions at one point, the works.

benrutter9mo ago

It's not that I don't think that's important, but I think with failure you always have an issue around needing N+1 checks (please don't take this as an argument against checks though).

jonahx9mo ago

fulafel9mo ago

graypegg10mo ago

vorgol10mo ago

I was going to recommend that exact podcast episode but you beat me to it. Totally worth listening, especially if you're interested in software bugs.

bell-cot10mo ago

>> the real failure in the story of the Therac-25 from my understanding, is that it took far too long for incidents to be reported, investigated and fixed.

> the earlier (manually operated) version of the machine did have the same fault. But it also had a failsafe fuse that blew so the fault never materialized.

snerbles10mo ago


WalterBright10mo ago

> And PHB's all know that picking up nickels in front of the steamroller is how you get to the C-suite.

BTW, this is why airliners, despite their advanced computerized cockpit, still have an old fashioned turn-and-bank indicator that is independent of all that software.


ipython10mo ago

Don’t worry, we are poised to relearn all these lessons once again with our fancy new agentic generative AI systems.

Good news is - it’s not hard to learn the new technology when it’s just a matter of rediscovering the same security issues with a new name!

ChrisMarshallNY10mo ago

> It's the end result of a process

In my experience, it's even more than that. It's a culture.

f1shy10mo ago

franktankbank10mo ago

A culture of high-quality engineering, no doubt. Made up of: high quality engineers!

kulahan10mo ago

edit: and imagine if any time you brought up this issue, bridge builders cockily responded with "well stuff seems to work fine so..."


ChrisMarshallNY10mo ago

Yes, but some of them were the most stubborn bastards I've ever worked with.


herval10mo ago

you don't need "high quality engineers" to have high-quality outputs. And vice-versa - lots of places with very high quality engineers produce terribly low-quality software


anonymars10mo ago

Isn't that exactly the opposite of the point being made?

> software quality doesn't appear because you have good developers


pjmlp10mo ago

The worst part is that many developers think that by not working with high integrity systems, such quality levels don't apply to them.

ozim10mo ago

Hey don’t blame developers.

It is business who requests features ASAP to cut costs, and then there are customers who don’t want to pay for „ideal software” but rather have every software for free.

Most devs and QA workers I know want to deliver best quality software and usually are gold plating stuff anyway.

pjmlp10mo ago

Being a real Software Engineer, those that actually have the proper title, eventually with the final examination, means being able to deliver the best product within the set of given constraints.

Also, speaking out when the train is visibly going against a wall.


Cthulhu_9mo ago

Business can request it, but it's your job as a software engineer to build quality software; don't shift the blame.

People want to be able to use a bridge for free too, doesn't mean there's no money in it.

As for gold plating, is that really improving software quality, or is that yak shaving / bike shedding?

chairmansteve10mo ago

I would say the real lesson is that the Therac machine should have had hardware interlocks (mentioned but not emphasised in the article).

sonicggg10mo ago

speed_spread10mo ago

AdamN10mo ago

Fr3dd110mo ago

quietbritishjim10mo ago

Right, but if everyone is low quality then there's no one to do that correction.


rcxdude10mo ago

varjag10mo ago


vjvjvjvjghv10mo ago

“This way, the work done by a "low quality" software developer (this includes almost all of us at some point in time), is always taken into account by the process”

franktankbank10mo ago

Cthulhu_9mo ago

> (this includes almost all of us at some point in time)

I'd say this includes all of us all the time; a good developer never trusts their own work blindly, and spends more time gathering requirements and verifying their and others' work than writing code.

rowanG07710mo ago

benrutter9mo ago

It's not that great developers aren't necessary for software quality, more that they aren't sufficient.


credit_guy10mo ago

0xDEAFBEAD10mo ago

roeles10mo ago

One instance that crosses my mind often is the Airbus A320 incident at Hamburg in 2008. Everything was done right there, but the requirements were wrong.

Despite all the procedures and tests, the software still managed to endanger the lives of the passengers.

Izkata9mo ago

https://en.wikipedia.org/wiki/Maneuvering_Characteristics_Au...

0xDEAFBEAD10mo ago

Interesting, do you happen to have a case study?


I_dream_of_Geni10mo ago

jopsen10mo ago

I'd agree, it's super easy to think such errors won't happen had they just used a fairly safe language and sane architecture. Or unit test, race detectors, etc.

I suspect that few organizations that do all that, have a process/culture of ignoring bugs in the wild -- and those that do have such complicated domains that explaining the error is hard.

Software best practices today would probably also involve sending metrics, logs, error reports, etc.

That said, it's still extremely easy to embrace a culture where unexplainable errors are ignored. Especially in a cloud environment.

hinkley10mo ago

I bring up Knight Capital every time people start acting like feature toggles will solve every problem we have with feature rollout.

KC lost over $400 million in less than an hour due to an old feature toggle and a problem with their deployment process.

jldugger10mo ago

Perhaps this is why the cover of my software correctness book in undergrad used a series of stills from the Ariane 5 disaster[1].

[1]: https://en.wikipedia.org/wiki/Ariane_5#Notable_launches

wat1000010mo ago

The lesson is not to write a mess like that. It might seem obvious, but it has to be learned.

kccqzy10mo ago

Software engineering has advanced so much in the past few decades that the kind of code considered a "mess" has expanded.

msy10mo ago

benrutter9mo ago

> It's the end result of a process, and that process informs both your software development practices, but also your testing. Your management. Even your sales and servicing.

isopede10mo ago· 18 in thread

Personally, I've found even the latest batch of agents fairly poor at embedded systems, and I shudder at the thought of giving them the keys to the kingdom to say... a radiation machine.

SCdF10mo ago

The Horizon (UK Royal Mail accounting software) incident killed multiple postmasters through suicide, and bankrupted and destroyed the lives of dozens or hundreds more.

hahn-kev10mo ago

V__10mo ago

> Engineers are legally obligated to report unsafe conduct, activities or behaviours of others that could pose a risk to the public or the environment. [1]

> Fujitsu was aware that Horizon contained software bugs as early as 1999 [2]

[1] https://engineerscanada.ca/news-and-events/news/the-duty-to-...

[2] https://en.wikipedia.org/wiki/British_Post_Office_scandal

siva710mo ago

sim7c0010mo ago

As you point out, this was a mess-up on a lot of levels. It's an interesting effect though, not to be dismissed: how your software works and how it's perceived and trusted can impact people psychologically.

codeulike10mo ago

SCdF10mo ago

The code being absolute dog shit was true regardless of that law's existence. There are plenty of things the developers could have done.

That law is irrelevant to this situation, except in that the lawyers for Fujitsu / Royal Mail used it to imply their code was infallible.

fuckaj10mo ago

Given whole truth testimony?

maweki10mo ago

But there is still a difference here. Provenance and proper traceability would have allowed the subpostmasters to show their innocence and prove the system fallible.

In the Therac-25 case, the killing was quite immediate and it would have happened even if the correct radiation dose was recorded.

scott_w10mo ago

SCdF10mo ago

I don't understand the distinction here.

> Provenance and proper traceability would have allowed

But there weren't those things, so they couldn't, so they were driven to suicide.

Bad software killed people. It being slow or fast doesn't seem to matter.

grues-dinner10mo ago

I also find AI copilot things do not work well with embedded software. Again, people YOLOing embedded isn't new, but it might be about to get worse.

the-grump10mo ago

The 737 MAX MCAS debacle was one such failure, albeit involving a wider system failure and not purely software.

Agreed on the future but I think we were headed there regardless.

jonplackett10mo ago

Yeah reading this reminded me a lot of MCAS. Though MCAS was intentionally implemented and intentionally kept secret.

throwawayoldie10mo ago

They killed "only" about 350 people combined, but the two fatal crashes of the Boeing 737 MAX in 2018 and 2019 were due to poor quality software:

https://en.wikipedia.org/wiki/Maneuvering_Characteristics_Au...

Maxion10mo ago

> Personally, I've found even the latest batch of agents fairly poor at embedded systems

I mean even simple crud web apps where the data models are more complex, and where the same data has multiple structures, the LLMs get confused after the second data transformation (at the most).

E.g. You take in data with field created_at, store it as created_on, and send it out to another system as last_modified.
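A minimal sketch of that renaming chain, assuming hypothetical function names (`ingest`, `export`) that are not from any real system. Writing each boundary as one explicit mapping keeps it obvious that the same timestamp travels under three different names:

```python
from datetime import datetime

# Field names come from the example above; the function names are
# hypothetical and only illustrate the idea of explicit boundary mappings.

def ingest(payload: dict) -> dict:
    """Inbound boundary: external 'created_at' becomes the stored 'created_on'."""
    return {"created_on": datetime.fromisoformat(payload["created_at"])}

def export(record: dict) -> dict:
    """Outbound boundary: stored 'created_on' is sent out as 'last_modified'."""
    return {"last_modified": record["created_on"].isoformat()}

incoming = {"created_at": "2024-01-02T03:04:05+00:00"}
stored = ingest(incoming)
outgoing = export(stored)
```

Each rename lives in exactly one function, so a reviewer (human or LLM) only has to check two small mappings instead of tracing the field through the whole codebase.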

sim7c0010mo ago

Talk to anyone in the industries about 'automation' on medical or critical infra devices and they will tell you NO. No touching our devices with your rubbish.

I am pretty confident they won't let Claude touch it if they don't even let deterministic automations run...

It's a different world than making apps, that's for sure.

throwaway026110mo ago

> but i dont think people designing medical devices will be going yolo mode on their dev cycle anytime soon

I don't have the same faith in corporate leadership as you, at least not when they see potentially huge savings by firing some of the expensive developers and using AI to write more of the code.

napolux10mo ago· 18 in thread

The most deadly bug in history. If you know any other deadly bug, please share! I love these stories!

kgwgk10mo ago

Several people killed themselves over this: https://www.wikipedia.org/wiki/British_Post_Office_scandal

https://www.theguardian.com/uk-news/2024/jan/09/how-the-post...

What sort of bugs resulted?

BoxOfRain10mo ago

It's an archetypal example of 'one law for the connected, another law for the proles'.

benrutter10mo ago

http://www0.cs.ucl.ac.uk/staff/a.finkelstein/papers/lascase....

napolux9mo ago

thanks for sharing!

A1kmm10mo ago

NitpickLawyer10mo ago

The MCAS related bugs @ Boeing led to 300+ deaths, so it's probably a contender.

solids10mo ago

Was that a bug or a failure to inform pilots about a new system?

thyristan10mo ago

It's almost never just software. It's almost never just one cause.

AdamN10mo ago

Both - and really MCAS was fine but the issue was the metering systems (Pitot tubes) and the handling of conflicting data. That part of the puzzle was definitely a bug in the logic/software.

NitpickLawyer10mo ago

Worryingly, e2e / full integration testing was also the main cause of other Boeing blunders, like the Starliner capsule.

fuckaj10mo ago

Not a bug. A non airworthy plane they tried to patch up with software.

bobmcnamara10mo ago

In Dhahran, Saudi Arabia, on February 25, 1991, a Patriot missile failed to intercept an Iraqi Scud causing the death of 28 American soldiers.

The fix was being deployed earlier that year but this unit hadn't been updated yet.

https://www.cs.unc.edu/~smp/COMP205/LECTURES/ERROR/lec23/nod...
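Published accounts of the Dhahran failure attribute it to accumulated rounding error in the system clock. A rough sketch of that arithmetic follows. The figures are assumptions taken from those accounts, not from this thread: time counted in tenths of a second, 0.1 truncated to 23 fractional bits in a 24-bit register, about 100 hours of uptime, and a Scud speed of roughly 1,676 m/s.

```python
from fractions import Fraction

# Assumed figures from commonly cited accounts of the incident.
FRACTION_BITS = 23          # fractional bits kept when storing 0.1
UPTIME_HOURS = 100          # approximate time since the last reboot
SCUD_SPEED_M_PER_S = 1676   # approximate Scud velocity

tenth = Fraction(1, 10)

# Truncate 0.1 to the available fractional bits (chopping, not rounding).
stored_tenth = Fraction(int(tenth * 2**FRACTION_BITS), 2**FRACTION_BITS)
error_per_tick = tenth - stored_tenth

# The clock ticks ten times per second.
ticks = UPTIME_HOURS * 3600 * 10

drift_seconds = float(error_per_tick * ticks)
tracking_error_m = drift_seconds * SCUD_SPEED_M_PER_S

print(f"error per tick: {float(error_per_tick):.3e} s")
print(f"clock drift after {UPTIME_HOURS} h: {drift_seconds:.3f} s")
print(f"resulting tracking error: {tracking_error_m:.0f} m")
```

Under these assumptions the per-tick error is under a ten-millionth of a second, yet it adds up to about a third of a second of drift, which at Scud speeds is more than half a kilometre.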

danadam10mo ago

Some Google Pixel phones couldn't dial emergency number (still can't?). I don't know if there were any deadly consequences of that.

https://www.androidauthority.com/psa-google-pixel-911-emerge...

echelon10mo ago

The 737 Max MCAS is arguably a bug. That killed 346 people.

Not a "bug" per se, but texting while driving kills ~400 people per year in the US. It's a bug at some level of granularity.

To be tongue in cheek a bit, buggy JIRA latency has probably wasted 10,000 human years. Those are many whole human lives if you count them up.

b_e_n_t_o_n10mo ago

> To be tongue in cheek a bit, buggy JIRA latency has probably wasted 10,000 human years. Those are many whole human lives if you count them up.

These kinds of calculations always make me wonder...say someone wasted one minute of everybody's life, is the cost ~250 lives? One minute? Somewhere in between?

throwaway026110mo ago

There was a news story from Norway last year where a car allegedly accelerated by itself, causing the car to fall off the second floor of a parking garage and kill the driver.

mnw21cam10mo ago

bobmcnamara10mo ago

Time and time again the introduction of electronic throttle control has spiked the number of reported unintended acceleration incidents.

There's a chart here that shows it clearly for Toyota's rollout:

https://www.embedded.com/unintended-acceleration-and-other-e...

elric10mo ago· 14 in thread

One of the commenters on the article wrote this:

I was briefly part of an experiment with electronic patient records in an ICU in the early 2000s. My job was to basically babysit the server processing the records in the ICU.

This was *not* just a case of doctors having unfounded "feelings" that computers were dangerous. Computers were very much more dangerous than pen and paper.

I haven't been involved in that industry since then, and I imagine things have gotten better since, but still worth keeping in mind.

jacquesm10mo ago

misja11110mo ago

skinwill10mo ago

Around here we have Epic. If you want a good scare, look up their corporate Willy Wonka-esque jail/campus and their policy of zero remote work.

Liquix10mo ago

I thought "Willy Wonka-esque" was tongue in cheek, but they have a yellow brick road leading up to the front door... absolutely bizarre

Those who want to escape the office altogether, can hop on one of the company’s 600 cow-print bikes to take meetings from a treehouse, slide down a rabbit hole or grab lunch in a train car.

https://www.cnbc.com/2024/09/01/inside-epic-systems-mythical...

simulator5g10mo ago

bongodongobob10mo ago

greazy10mo ago

It's still an issue. I've heard stories of EMR systems going down, forcing staff to use pen and paper. It boggles my mind that such systems don't have redundancy.

These are commercial products being deployed.

elric10mo ago

The other theory is there are so many bureaucratic hoops to jump through in order to make anything in the medical space that no one does it willingly.

siva710mo ago

It's not only the doctors, I have the gut feeling from my previous stint that people who like to work in the medical space are more often than not "difficult".

NortySpock10mo ago

The redundancy is pen and paper. The EMR just helps teams coordinate faster, pull up records faster, etc.

* Daylight Saving Time in the USA. And yes, everyone in the USA changes non-networked clocks twice a year because of some "brilliant idea" someone shoved through Congress in 1974.

EDIT: I wonder if Cerner finally fixed it?

chuckadams10mo ago

The idea of Daylight Savings Time goes back to Ben Franklin, and a lot of Europe does it too. I think it’s an obsolete notion nowadays, but it’s not exclusively a Dumb Yank Idea.

greazy9mo ago

Pen and paper is back up in case of extreme events, and not because the software crashed or a bad patch was applied.

Millions on software with no software redundancy built in is insane in the medical field.

superjan10mo ago

It's worthwhile to mention that in the US and EU EMRs are generally not considered Medical Devices and are therefore not subject to a lot of regulations.

https://www.medicaleconomics.com/view/what-if-emrs-were-clas...

elric9mo ago

That is disturbing, but it does explain a lot.

michaelt10mo ago· 12 in thread

I'd be interested in knowing how many of y'all are being taught about this sort of thing in college ethics/safety/reliability classes.

Is this sort of thing still taught to engineers and developers in college these days?

FuriouslyAdrift10mo ago

Analog systems do not behave like computers.

ramses010mo ago

The "IBM Black Team Debugs a Tape Drive" story comes to mind: https://www.penzba.co.uk/GreybeardStories/TheBlackTeam.html

wocram10mo ago

This was part of our Systems Engineering class, something like this: https://web.mit.edu/6.033/2014/wwwdocs/assignments/therac25....

smarks10mo ago

Thanks for posting this. The Leveson article is IMO the definitive study on this incident. I’m going to repost at top level for emphasis.

mrguyorama10mo ago

You can't teach people to care.

BoxOfRain10mo ago

I was taught about it in university as a computer science undergrad, thought about it often since I ended up working in medtech.

aDyslecticCrow10mo ago

I'm too curious, I made a poll. I for sure wasn't in computer science uni. I only heard about it vaguely online.

https://strawpoll.com/NMnQNX9aAg6

mlnhd10mo ago

This and Tacoma Narrows are literally the only topics covered in engineering ethics, which itself is literally only a one hour presentation.

InvisibleUp10mo ago

Don’t forget the Hyatt Regency walkway, too.

lgeek10mo ago

It was taught in a first year software ethics class on my Computer Science programme. Back in 2010. I'm wondering if they still do

firesteelrain10mo ago

I was taught Computer Ethics back in the early 2000s as part of my CS degree.

3D3049742010mo ago

I studied design and I wish we'd had a design ethics class, which would have covered instances like this.

mellosouls10mo ago· 9 in thread

TIL TheDailyWTF is still active. I'd thought it had settled to greatest hits only some years ago.

greatgib10mo ago

This story is kind of old. But also I'm suspicious that this was AI-generated content due to this weird paragraph (one becoming "they"):

   It's worth noting that there was one developer who wrote all of this code. They left AECL in 1986, and thankfully for them, no one has ever revealed their identity. And while it may be tempting to lay the blame at their feet—they made every technical choice, they coded every bug—it would be wildly unfair to do that.

pie_flavor10mo ago

semv3r10mo ago

Singular "they" has been used since at least the 14th century—was generative AI commonly available then? https://en.wikipedia.org/wiki/Singular_they

remyporter10mo ago

lopespm9mo ago

I am really amazed by the frequency and the quality of your output.

Would you have an article on your routines, how you structure your day / work? Essentially, what enables your consistency, and quality articles?

edot10mo ago

Isn’t that the pronoun to use when you’re unsure of gender? This article didn’t feel AI-y to me.

bigstrat200310mo ago

tbossanova10mo ago

That is 100% standard english, dude. I feel like I might have read that exact sentence 20 years ago...

HankStallone10mo ago

OskarS10mo ago· 7 in thread

jwr10mo ago

No, this is not a "hilariously silly thing" for developers. In fact, I'd say that most developers place way too much trust in software.

crazygringo10mo ago

> but also are happy to get in those cars. They are OK with trusting their life to new, complex, poorly understood and poorly tested software systems

pfdietz10mo ago

brazzy10mo ago

> there is arguably a root assumption in both cases that people made, which is that "the software can't be wrong"

I think in this case, the thought process was based on the experience with older, electro-mechanical machines where the most common failure mode was parts wearing out.

Since software can, indeed, not "wear out", someone made the assumption that it was therefore inherently more reliable.

balamatom10mo ago

ndsipa_pomu10mo ago

As far as I know, the Therac-25 incidents were reasonably honest mistakes.

OskarS10mo ago

zackmorris10mo ago· 6 in thread

Our power went off a couple of weeks ago due to wind probably knocking a branch into a power line. Now our Frigidaire microwave runs with the door open.

Supposedly there are mechanical switches that prevent that, but evidently "modern" microwaves can control the gun through the logic board.

Are smaller companies better at engineering safety? Evidently not.

bongodongobob10mo ago

simoncion10mo ago

> If I stick a pencil in the door latch I can get it to run with the door open as well.

bongodongobob9mo ago

dezgeg10mo ago

It's possible that it's just operating the lights and turntable without actually cooking - if you search HN you find that failure mode.

zackmorris9mo ago

_kb10mo ago

Stand next to it with a chocolate bar in your pocket. I believe that’s the canonical way to check if it’s active.

ChrisMarshallNY10mo ago· 4 in thread

I worked for hardware manufacturers for most of my career, as a software guy.

In my experience, hardware people really dis software. It's hard to get them to take it seriously.

When something like this happens, they tend to double down on shading software.

kccqzy10mo ago

ChrisMarshallNY10mo ago

That was the case for the company that I worked at.

The official QA organization was very powerful, and had no compunctions about stopping an entire product line, for one bug.

When that happened, the department responsible for the bug would find themselves against the wall.

As a result, all the software departments had pretty big teams of testers, who would validate the software, before it was released to the purview of the QA organization.

It could be pretty restricting, but we always felt confident that what we shipped, worked.

scottLobster10mo ago

ChrisMarshallNY10mo ago

I think that was the F-35 incident (in Alaska).

The Therac-25 incident was a radiation overdose in Texas.

haunter10mo ago· 3 in thread

My "favorite" part:

Kinda reminds me how everything is touchscreen nowadays from car interfaces to industry critical software

hiccuphippo10mo ago

And we have a concept, optimistic updates, for making the ui look responsive while the updates happen in the background and reconcile later. I can only hope they know when not to use it.
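A minimal sketch of the optimistic-update pattern, with a hypothetical `OptimisticValue` class that is not taken from any UI framework. It keeps the displayed value separate from the last confirmed one, which makes the risk visible: between the optimistic write and the reconcile step, the screen can disagree with the real state.

```python
class OptimisticValue:
    """Tracks what the UI shows versus what the backend has confirmed.

    Hypothetical illustration only; real frameworks differ in detail.
    """

    def __init__(self, confirmed):
        self.confirmed = confirmed  # last value the backend acknowledged
        self.displayed = confirmed  # what the user currently sees

    def set_optimistically(self, value):
        # Show the new value immediately, before the backend has answered.
        self.displayed = value

    def reconcile(self, success: bool):
        if success:
            # Backend accepted the change: the display was right all along.
            self.confirmed = self.displayed
        else:
            # Backend rejected it: roll the display back to the real state.
            self.displayed = self.confirmed

    @property
    def in_sync(self) -> bool:
        return self.displayed == self.confirmed


mode = OptimisticValue("A")
mode.set_optimistically("B")
shown_before_reply = mode.displayed   # "B", although the backend still has "A"
synced_before_reply = mode.in_sync    # False during the optimistic window
mode.reconcile(success=False)
shown_after_failure = mode.displayed  # rolled back to "A"
```

For a safety-relevant setting, the window where `in_sync` is false is exactly the period in which an operator could act on a value the machine never accepted.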

kevincox9mo ago

ramses010mo ago

   Try quickly typing 1+ 2 + 3 into the iOS 11 Calculator (reddit.com)
   886 points by danso on Oct 24, 2017 | hide | past | favorite | 480 comments

https://news.ycombinator.com/item?id=15538666

...this _exact_ same failure mode in a "less" critical domain (eg: literally your most frequently used "pocket calculator"), unless you're using the calculator for Important Things(tm).

Tenemo10mo ago· 3 in thread

The full 1993 report linked in the article has an interesting statement regarding software developer certification in the "Lessons learned" chapter:

After 32 years, this didn't go the way the report's authors expected, right?

slavik8110mo ago

Two decades ago there was a lot of talk about turning software development into a structured engineering discipline, but that plan seems to have largely been abandoned.

mitthrowaway210mo ago

firesteelrain10mo ago

mdavid62610mo ago· 3 in thread

b_e_n_t_o_n10mo ago

invariants are so useful to enforce even for toy projects. they should never be triggered outside of dev, but if they do sometimes it's better to just let it crash.

bzzzt10mo ago

Making sure the beam is off before crashing would be better though.
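A minimal sketch of "crash, but reach the safe state first", assuming a hypothetical `Beam` class standing in for a hardware actuator (not a real device API). The `finally` block switches the beam off on every exit path, including an invariant failure:

```python
class Beam:
    """Hypothetical stand-in for a hardware actuator; not a real device API."""

    def __init__(self):
        self.on = False

    def enable(self):
        self.on = True

    def disable(self):
        self.on = False


class InvariantViolation(RuntimeError):
    """Raised when a condition that must always hold turns out to be false."""


def deliver(beam: Beam, readings, limit: float) -> None:
    """Run the beam while checking each measured reading against a limit."""
    beam.enable()
    try:
        for measured in readings:
            # Invariant: no measured value may ever exceed the limit.
            if measured > limit:
                raise InvariantViolation(
                    f"measured {measured} exceeds limit {limit}"
                )
    finally:
        # Safe state first: the beam goes off whether we finish or crash.
        beam.disable()


beam = Beam()
crashed = False
try:
    deliver(beam, readings=[1.0, 5.0], limit=2.0)
except InvariantViolation:
    crashed = True
```

This is still a software-only safeguard; it does not replace the independent hardware interlock argued for elsewhere in the thread.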

b_e_n_t_o_n9mo ago

For sure :P

vemv10mo ago· 3 in thread

My (tragically) favorite part is, from wikipedia:

> A commission attributed the primary cause to generally poor software design and development practices, rather than singling out specific coding errors.

Which to me reads as "this entire codebase was so awful that it was bound to fail in some or other way".

rgoulter10mo ago

Hmm. "poor software design" suggests a high risk that something might go wrong; "poor development practice" suggests that mistakes won't get caught/remedied.

By focusing on particular errors, there's the possibility you'll think "problem solved".

By focusing on process, you hope to catch mistakes as early as possible.

DamonHD10mo ago

ycombobreaker10mo ago

haddonist10mo ago· 2 in thread

Well There's Your Problem podcast, Episode 121: Therac-25

https://www.youtube.com/watch?v=7EQT1gVsE6I

dpacmittal10mo ago

There's also this video from Kyle Hill which is pretty good (I think it's a different incident though, not sure) - https://www.youtube.com/watch?v=Ap0orGCiou8

voidUpdate10mo ago

My go-tos are usually Fascinating Horror https://www.youtube.com/watch?v=nU5HbUOtyqk and Plainly Difficult https://www.youtube.com/watch?v=-7gVqBY52MY.

I've gone off Kyle Hill after a lot of people pointed out that he was promoting a scam (BetterHelp) on his video about fraud and his response was just to tell people to deal with it

rvz10mo ago· 2 in thread

Take this post-mortem [1] as a great warning; it also highlights exactly what could go horribly wrong if the LLM misreads comments.

Makes me question the ability of the user prompting the code in the first place if they even understand how to write robust and battle-tested software.

[0] https://news.ycombinator.com/item?id=44764689

[1] https://sketch.dev/blog/our-first-outage-from-llm-written-co...

voxadam10mo ago

mrguyorama10mo ago

autonomousErwin10mo ago· 2 in thread

jve10mo ago

On the bright side, wow, those computers are really sturdy: takes a whole supernova to just flip a bit :)

kijin10mo ago

Well the thing is, millions of stars go supernova in the observable universe every single day. Throw in the daily gamma ray burst as well, and you've got bit flips all over the place.

Forgret10mo ago· 2 in thread

What surprised me most was that only one developer was working on such an unpredictable technology, whereas I think I need at least 5 developers to be able to discuss options.

throwaway026110mo ago

BTW: Relevant XKCD: https://xkcd.com/2347/

mnw21cam10mo ago

Though this XKCD might be even more relevant:

https://xkcd.com/2030/

MerrimanInd10mo ago· 1 in thread

scottLobster10mo ago

I was a Computer Engineer, so not quite the same, but we got taught about Therac-25 in our Engineering Ethics class when I took it over a decade ago.

SirMaster10mo ago· 1 in thread

The question I have is why was the hardware capable of delivering a fatal dose like this. Is that actually ever even a usable output for some legitimate reason?

If not, why not hardware limit the power input to the machine, so even if the software completely failed, it would not be physically capable of delivering a fatal dose like this?

PokestarFan10mo ago

I believe that for X-ray mode, the radiation was indirect, so it needed a lot more power. Furthermore, older revisions had hardware locks, and the intent of the Therac-25 was to make it cheaper.

amelius10mo ago· 1 in thread

> The Therac-25 was the first entirely software-controlled radiotherapy device.

This says it all.

theglocksaint10mo ago

Don't know how to break this to you but they all are nowadays.

tedggh10mo ago· 1 in thread

TL;DR

thijson10mo ago

Engineers in other fields need to sign off on designs, and can be held liable if something goes wrong. Software hasn't caught up to that yet.

auggierose10mo ago· 1 in thread

Wondering if that "one developer" is here on HN.

Forgret10mo ago

Hahaha, it would be interesting, maybe he just commented on the post here?

throwaway026110mo ago

One of the comments said this:

> That’s good news for all of us. An adverse event like the Therac 25 is very unlikely today.

rossant10mo ago

[1] https://hs.memberclicks.net/executive-committee

[2] https://news.ycombinator.com/item?id=37650402

[3] https://pubmed.ncbi.nlm.nih.gov/30146789/

[4] https://rdcu.be/eCE3l

[5] https://www.sciencedirect.com/science/article/pii/S002234682...

MarkusWandel10mo ago

In a quick skim of the comments so far, I don't see the real smoking gun.

koverstreet10mo ago

That gets actively dangerous; a lot of more recent safety mishaps are more of the variety of "processes were followed, but things went hilariously off the rails and no one noticed and spoke up".

Culture and expertise matter just as much if not more, especially today now that we all (in theory) should understand source control, testing, safer languages, etc.

I think aviation also approaches process a bit better - by having much of it be more informal, less rigid checklists, it doesn't encourage people to suspend judgement so much.

There's also the Tankship Tromedy, which really emphasizes the engineering legwork of just chasing down, understanding and fixing every last failure mode you can find.

https://www.dieselduck.info/library/08%20policies/2006%20The...

cantrevealname10mo ago

> All of this software, from the individual processes to the OS itself, were the work of a single software developer. They left AECL in 1986, and no one has ever revealed their identity.

All of the following would indicate that this developer, no matter how senior or capable, was still a low-paid schlub:

- It's Canada, so automatically 20% lower salaries than in the U.S. (AECL is in Canada, so it's a good bet that the developer was Canadian.)

- It's the 1980s, so pre-web, pre-smartphones, pre-Google/Amazon, and developers had little recognition and low demand.

- It's government, known to pay poorly for developers. (AECL is a government-owned corporation.)

For 1986, I would put his salary at $30-50K Canadian, or converted to U.S. dollars at that time would be $26-43K U.S., and inflation adjusted would be $78-129K U.S. today. And no stock options.

w10-110mo ago

This is not the example readers need to understand, because the failures were so rudimentary and systemic that it seems "good process" is the answer.

Having written and validated both FDA and CLIA software, I'd suggest that process is never sufficient.

It's really, really hard to find those people.

softwaredoug10mo ago

Safety problems are almost never about one evil / dumb person and frequently involve confusing lines of responsibility.

Which makes me very nervous about AI generated code and people who don’t claim human authorship. The bug that creeps in where we scapegoat the AI isn’t gonna cut it in a safety situation.

csours10mo ago

To me, the Therac incident is the poster child for a category I call 'context change error'.

Some of the controls were 'born' in a world of hardware interlocks, and so the engineers used the frame of mind where hardware interlocks exist.

Some time later, the interlocks were replaced with software controls. Since everything had worked before, all the software had to do was what worked before.

But it is VERY difficult to challenge all of your assumptions about what "working" means.

---

siva710mo ago

> With AECL's continued failure to explain how to test their device

smarks10mo ago

[1]: N. G. Leveson and C. S. Turner, "An investigation of the Therac-25 accidents," in Computer, vol. 26, no. 7, pp. 18-41, July 1993.

[2]: Nancy Leveson. Safeware: System Safety and Computers. Addison-Wesley, 1995.

[3]: http://sunnyday.mit.edu/papers/therac.pdf

[4]: https://web.mit.edu/6.033/2014/wwwdocs/papers/therac.pdf

rokkamokka10mo ago

I was taught this incident in university many years ago. It's undeniably an important lesson that shouldn't be forgotten

rendaw10mo ago

So reading about this my current company sounds exactly the same. And the one before it, and the one before that.

linohh10mo ago

salynchnew10mo ago

Great WTYP episode on this: https://www.youtube.com/watch?v=7EQT1gVsE6I

0xDEAFBEAD10mo ago

>any bugs we see would have to be transient bugs caused by radiation or hardware errors.

Can't imagine that radiation might be a factor here...

onewheeltom10mo ago

snkline10mo ago

Duanemclemore10mo ago

There's an excellent episode of Well There's Your Problem about Therac-25.

https://youtu.be/7EQT1gVsE6I

NoSalt10mo ago

Is there a way to get the "gist" of the article, the lesson to be learned without reading the full article? I got to the screaming part and couldn't read any more.

armcat10mo ago

Therac-25 was part of the mandatory "computer ethics" course at my uni, as part of the Computer Science programme, circa early 2000s.

fogzen10mo ago

voxadam10mo ago

(2021)

darepublic10mo ago
