If you only take one thing away from this article, it should be this one! The Therac-25 incident is a horrifying and important part of software history. It's really easy to think type systems, unit testing and defensive coding can solve all software problems. They definitely can help a lot, but the real failure in the story of the Therac-25, from my understanding, is that it took far too long for incidents to be reported, investigated and fixed.
There was a great Cautionary Tales podcast about the device recently[0]. One thing mentioned was that, even aside from the catastrophic accidents, Therac-25 machines were routinely seen by users to show unexplained errors, but these issues never made it to the desk of someone who might fix them.
[0] https://timharford.com/2025/07/cautionary-tales-captain-kirk...
I have years of experience at Boeing designing aircraft parts. The guiding principle is that no single failure should cause an accident.
The way to accomplish this is not "write quality software", nor is it "test the software thoroughly". The idea is "assume the software does the worst possible thing. Then make sure that there's an independent system that will prevent that worst case."
For the Therac-25, that means a detector of the amount of radiation being generated, which will cut it off if it exceeds a safe value. I'd also add that the radiation generator be physically incapable of generating excessive radiation.
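A minimal sketch of the "independent system" idea, assuming a hypothetical sensor that reports a dose rate at fixed intervals. The class name, thresholds and units are all made up for illustration; a real interlock of this kind would live in hardware or on a separate controller, not in the same program as the control software.

```python
from dataclasses import dataclass


@dataclass
class IndependentDoseInterlock:
    """Hypothetical watchdog that never trusts the control software."""

    max_dose_rate: float   # highest rate considered safe (arbitrary units)
    max_total_dose: float  # highest cumulative dose considered safe
    total_dose: float = 0.0
    tripped: bool = False

    def observe(self, measured_rate: float, dt: float) -> bool:
        """Feed one sensor reading; return True while the beam may stay on."""
        if self.tripped:
            return False  # latched: stays off until a human investigates
        self.total_dose += measured_rate * dt
        if measured_rate > self.max_dose_rate or self.total_dose > self.max_total_dose:
            self.tripped = True
        return not self.tripped


# Assumed readings: the third one is what a misbehaving controller might produce.
interlock = IndependentDoseInterlock(max_dose_rate=2.0, max_total_dose=10.0)
beam_allowed = [interlock.observe(rate, dt=1.0) for rate in [1.0, 1.5, 250.0, 1.0]]
```

The latch is the important design choice here: once tripped, the interlock stays off even if later readings look normal, so the event cannot be silently retried away.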
I agree with the previous poster that the feedback from the field is lacking a lot. A lot of doctors don’t report problems back because they are used to bad interfaces. And then the feedback gets filtered through several layers of sales reps and product management. So a lot of info gets lost and fixes that could be simple won’t get done.
In general when you work in medical you are so overwhelmed by documentation and regulation that there isn’t much time left to do proper engineering. The FDA mostly looks at documentation done right and less at product done right.
One of the biggest things I see in junior engineers that I mentor (working in backend high throughput, low latency, distributed systems) is not working out all of the various failure modes your system will likely encounter.
Network partitions, primary database outage, caching layer outage, increased latency ... all of these things can throw a spanner in the works, but until you've experienced them (or had a strong mentor guide you) it's all abstract and difficult to see when the happy path is right there.
I've recently entirely re-architected a critical component, and part of this was defense in depth. Stuff is going to go wrong, so having a second or even third line of defense is important.
But this opens up a can of worms, as suddenly you have to deal with every edge case, test for every possible input, etc. This was before fuzz testing, too. Each line of defensive coding, every carefully crafted comment, etc all added to the maintenance burden; I'd even go as far as claim it increased uncertainty, because what if I forgot something?
15 years later and it feels like I'm doing far less advanced stuff (although in hindsight what I did then wasn't all that, but I made it advanced). One issue came up recently; a generic button component would render really tall if no label was given, which happened when a CMS editor did not fill in a label in an attempt to hide it. The knee-jerk response would be to add a check that disallows empty labels, or to not render the button if no label is given, or to use a default button label.
But now I think I'll look at the rendering bug and just... leave the rest. A button with an empty label isn't catastrophic. Writing rules for every possible edge case (empty label, whitespaces, UTF-8 characters escaping the bounds, too long text, too short text, non-text, the list goes on) just adds maintenance and complexity. And it's just a button.
The Therac-25 was meant to have a detector of radiation levels to cut things off if a safe value was exceeded, but it didn't work. It could obviously have been improved, but you always have the question of "what if our check doesn't work?".
In the case of the Therac-25, if the first initial failures had been reported and investigated, my understanding is (I should make clear I'm not an expert here) it would have made the issues apparent, and it could have been recalled before any of the fatal incidents happened.
In a Swiss cheese model of risk, you always want as many layers as possible, so your point about a detector fits in there, but the final layer should always be this: if an incident does happen, and something gets past all our checks, how can we make it likely that it gets investigated fully by the right person?
I will say that me pretending to know how to best design medical equipment as a web developer is pretty full of myself haha. Highly doubt whatever I'm spouting is a new idea. The idea of working on this sort of high-reliability + high-recoverability systems seems really interesting though!
Another interesting fact mentioned in the podcast is that the earlier (manually operated) version of the machine did have the same fault. But it also had a failsafe fuse that blew so the fault never materialized. Excellent demonstration of the Swiss Cheese Model: https://en.wikipedia.org/wiki/Swiss_cheese_model
> the earlier (manually operated) version of the machine did have the same fault. But it also had a failsafe fuse that blew so the fault never materialized.
#1 virtue of electromechanical failsafes is that their conception, design, implementation, and failure modes tend to be orthogonal to those of the software. One of the biggest shortcomings of Swiss Cheese safety thinking is that you too-often end up using "neighbor slices from the same wheel of cheese".
#2 virtue of electromechanical failsafes is that running into them (the fuse blew, or whatever) is usually more difficult for humans to ignore. Or at least it's easier to create processes and do training that actually gets the errors reported up the chain. (Compared to software - where the worker bees all know you gotta "ignore, click 'OK', retry, reboot" all the time, if you actually want to get anything done.)
But, sadly, electromechanical failsafes are far more expensive than "we'll just add some code to check that" optimism. And PHBs all know that picking up nickels in front of the steamroller is how you get to the C-suite.
The mechanical interlock essentially functioned as a limit outside of the control system. So you should build an AI system the same way: enforce the security restrictions from outside the control of the AI itself. Of course that doesn't happen, and devs naively trust that the AI can make its own security decisions.
Another lesson from that era we are relearning: in-band signaling. Our 2025 version of the "blue box" is in full swing. Prompt injection is just a side effect of the fact that there is no out-of-band instruction mechanism for LLMs.
Good news is - it’s not hard to learn the new technology when it’s just a matter of rediscovering the same security issues with a new name!
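To make the "limit outside the control system" point concrete, here is a rough sketch, assuming a hypothetical agent that requests actions by name. The allow-list and function names are invented for illustration and do not correspond to any real agent framework. The point is only that the check runs in ordinary code the model cannot rewrite.

```python
# Hypothetical allow-list, maintained by humans outside the model's reach.
ALLOWED_ACTIONS = {"read_file", "list_directory"}


def execute_tool_call(action: str, argument: str) -> str:
    """Run a model-requested action only if the outer policy permits it."""
    if action not in ALLOWED_ACTIONS:
        # The model's own reasoning about whether this is "safe" is never consulted.
        return f"denied: {action}"
    return f"executed: {action}({argument})"


allowed_result = execute_tool_call("read_file", "notes.txt")
denied_result = execute_tool_call("delete_file", "notes.txt")
```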
> It's the end result of a process
In my experience, it's even more than that. It's a culture.
Wrong, any software failure can have huge consequences in someone's life, or company, by preventing some critical flow from taking place, corrupting data related to someone's life, professional or medical record, or preventing a payment on some specific goods that had to be acquired at that moment or never.
It is business who requests features ASAP to cut costs, and then there are customers who don't want to pay for "ideal software" but would rather have every software for free.
Most devs and QA workers I know want to deliver best quality software and usually are gold plating stuff anyway.
It's not that great developers aren't necessary for software quality, more that they aren't sufficient.
Despite all the procedures and tests, the software still managed to endanger the lives of the passengers.
I suspect that few organizations that do all that, have a process/culture of ignoring bugs in the wild -- and those that do have such complicated domains that explaining the error is hard.
Software best practices today would probably also involve sending metrics, logs, error reports, etc.
That said, it's still extremely easy to embrace a culture where unexplainable errors are ignored. Especially in a cloud environment.
KC lost over $400 million in less than an hour due to an old feature toggle and a problem with their deployment process.
[1]: https://en.wikipedia.org/wiki/Ariane_5#Notable_launches
Good product cultures are ones where natural communication between the field and engineering would mean issues get reported back up and make their way to the right people. No process will compensate for people not giving a shit.
I think the bit I quoted, especially if you read in the context of the article, is talking about culture. I.e. it's talking about a process that informs software development, management and sales. Things like formal proofing and type systems are the exact kind of processes that aren't what it's talking about.
I kind of agree with you though about the process/culture distinction - ultimately, if you don't have a culture where people actively care about improving reliability, any process is just gonna become a tick-box exercise to appease management.
> Throughout the 80s and 90s there was just a feeling in medicine that computers were dangerous <snip> This is why, when I was a resident in 2002-2006 we still were writing all of our orders and notes on paper.
I was briefly part of an experiment with electronic patient records in an ICU in the early 2000s. My job was to basically babysit the server processing the records in the ICU.
The entire staff hated the system. They hated having to switch to computers (this was many years pre-iPad and similarly sleek tablets) to check and update records. They were very much used to writing medications (what, when, which dose, etc) onto bedside charts, which were very easy to consult and very easy to update. Any kind of data loss in those records could have fatal consequences. Any delay in getting to the information could be bad.
This was *not* just a case of doctors having unfounded "feelings" that computers were dangerous. Computers were very much more dangerous than pen and paper.
I haven't been involved in that industry since then, and I imagine things have gotten better since, but still worth keeping in mind.
These are commercial products being deployed.
The other theory is there are so many bureaucratic hoops to jump through in order to make anything in the medical space, that no one does it willingly.
When I worked at Cerner years ago (now owned by Oracle), there were rumors that the Cerner EMR still could barely handle DST* spring forward, but could not handle DST fall back (where the 01:00 hour is repeated) -- you had to preemptively switch to pen-and-paper for the hours around the switch. I assume this was because someone back in the initial database design used local time instead of UTC for some critical patient-care timestamp fields in the database, and then had a bear of a time getting reliable times out of the database during the witching hour.
* Daylight Saving Time in the USA. And yes, everyone in the USA changes non-networked clocks twice a year because of some "brilliant idea" someone shoved through Congress in 1974.
EDIT: I wonder if Cerner finally fixed it?
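For anyone who hasn't hit the fall-back problem, here is a small illustration of why local-time columns are ambiguous. It is a sketch under assumptions: the date, the US Central offsets and the storage format are picked for the example and say nothing about how any real EMR stores timestamps. Fixed offsets are used instead of a timezone database so the snippet runs anywhere.

```python
from datetime import datetime, timedelta, timezone

CDT = timezone(timedelta(hours=-5))  # daylight time, before the clocks go back
CST = timezone(timedelta(hours=-6))  # standard time, after the clocks go back

# 01:30 local time occurs twice on the night US clocks fall back (2024-11-03).
first_dose = datetime(2024, 11, 3, 1, 30, tzinfo=CDT)
second_dose = datetime(2024, 11, 3, 1, 30, tzinfo=CST)

# What a local-time column would hold: two identical strings.
local_format = "%Y-%m-%d %H:%M"
stored_local_matches = first_dose.strftime(local_format) == second_dose.strftime(local_format)

# What a UTC column would hold: two distinct instants in the right order.
first_utc = first_dose.astimezone(timezone.utc)
second_utc = second_dose.astimezone(timezone.utc)
gap = second_utc - first_utc
```

Stored as local time, the two events cannot be told apart or ordered; stored as UTC, they are exactly one hour apart.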
https://www.medicaleconomics.com/view/what-if-emrs-were-clas...
I am a developer and whatever software system I touch breaks horribly. When my family wants to use an ATM, they tell me to stand at a distance, so that my aura doesn't break things. This is why I will not get into a self-driving car in the foreseeable future — I think we place far too much confidence in these complex software systems. And yet I see that the overwhelming majority of HN readers are not only happy to be beta-testers for this software as participants in road traffic, but also are happy to get in those cars. They are OK with trusting their life to new, complex, poorly understood and poorly tested software systems, in spite of every other software system breaking and falling apart around them.
[anticipating immediate common responses: 1) yes, I know that self-driving car companies claim that their cars are statistically safer than human drivers, this is beyond the point here. One, they are "safer" largely because they drive so badly that other road participants pay extra attention and accommodate their weirdness, and two, they are still new, complex and poorly understood systems. 2) "you already trust your life to software systems" — again, beyond the point, not quite true as many software systems are built to have human supervision and override capability (think airplanes), and others are built to strict engineering requirements (think brakes in cars) while self-driving cars are not built that way.]
Because the alternative isn't bug-free driving -- it's a human being. Who maybe didn't sleep last night, who might have a heart attack while their foot is on the accelerator, who might pull over and try to sexually assault you.
You don't need to "place confidence in these complex software systems" -- you just need to look at their safety stats vs e.g. regular Uber. It's not a matter of trust; it's literally just a matter of statistics, and choosing the less risky option.
I think in this case, the thought process was based on the experience with older, electro-mechanical machines where the most common failure mode was parts wearing out.
Since software can, indeed, not "wear out", someone made the assumption that it was therefore inherently more reliable.
Bureaucracy being (per Graeber 2006) something like the ritual where by means of a set of pre-fashioned artifacts for each other's sake we all operate at 2% of our normal mental capacities and that's how modern data-driven, conflict-averse societies organize work and distribute resources without anyone being able to have any complaints listened to.
>Bureaucracies public and private appear—for whatever historical reasons—to be organized in such a way as to guarantee that a significant proportion of actors will not be able to perform their tasks as expected. It also exemplifies what I have come to think of the defining feature of a utopian form of practice, in that, on discovering this, those maintaining the system conclude that the problem is not with the system itself but with the inadequacy of the human beings involved.
Most places where a computer system is involved in the administration of a public service or something of the caliber, has that been a grassroots effort, hey computers are cool and awesome let's see what they change? No, it's something that's been imposed in the definitive top-down manner of XX century bureaucracies. Remember the cohort of people who used to become stupid the moment a "thinking machine" was powered within line of sight (before the last uncomputed generation retired and got their excuse to act dumb for the rest of it)? Consider them in view of the literally incomprehensible number of layers that any "serious" piece of software consists of; layers which we're stuck producing more of, when any software professional knows the best kind of software is less of it.
But at least it saves time and the forest, right? Ironically, getting things done in a bureaucratic context with less overhead than filling out paper forms or speaking to human beings, makes them even easier to fuck up. And then there's the useful fiction of "the software did it" that e.g. "AI agents" thing is trying to productize. How about they just give people a liability slider in the spinup form, eh, but nah.
Wanna see a miracle? A miracle is when people hype each other into pretending something impossible happened. To the extent user-operated software is involved in most big-time human activities, the daily miracle is how it seems to work well enough, for people to be able to pretend it works any good at all. Many more than 3 such cases. But of course remembering the catastrophic mistakes of the past can be turned into a quaint fun-time activity. Building things that empower people to make fewer mistakes, meanwhile, is a little different from building artifacts for non-stop "2% time".
As far as I know, the Therac-25 incidents were reasonably honest mistakes.
Personally, I've found even the latest batch of agents fairly poor at embedded systems, and I shudder at the thought of giving them the keys to the kingdom to, say, a radiation machine.
The core takeaway developers should have from Therac-25 is not that this happens just on "really important" software, but that all software is important, and all software can kill, and you need to always care.
In the Therac-25 case, the killing was quite immediate and it would have happened even if the correct radiation dose was recorded.
Which is not to say that software hasn't killed people before (Horizon, Boeing, probably loads of industrial accidents and indirect process control failures leading to dangerous products, etc, etc). Hell, there's a suspicion that austerity is at least partly predicated on a buggy Excel spreadsheet, and with about 200k excess deaths in a decade (a decade not including Covid) in one country, even a small fraction of those being laid at the door of software is a lot of Theracs.
AI will probably often skate away from responsibility in the same way that Horizon does: by being far enough removed and with enough murky causality that they can say "well, sure, it was a bug, but them killing themselves isn't our fault"
I also find AI copilot things do not work well with embedded software. Again, people YOLOing embedded isn't new, but it might be about to get worse.
Agreed on the future but I think we were headed there regardless.
https://en.wikipedia.org/wiki/Maneuvering_Characteristics_Au...
I mean even simple crud web apps where the data models are more complex, and where the same data has multiple structures, the LLMs get confused after the second data transformation (at the most).
E.g. You take in data with field created_at, store it as created_on, and send it out to another system as last_modified.
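One way to keep a chain of renames like that traceable, for humans and LLMs alike, is to write each hop down as explicit data. This is only a sketch of the idea using the field names from the example above; the mapping tables and the helper are hypothetical.

```python
# Each boundary gets its own explicit rename table (hypothetical names).
INBOUND_TO_STORAGE = {"created_at": "created_on"}
STORAGE_TO_OUTBOUND = {"created_on": "last_modified"}


def rename_fields(record: dict, mapping: dict) -> dict:
    """Return a copy of record with keys renamed per mapping; other keys pass through."""
    return {mapping.get(key, key): value for key, value in record.items()}


incoming = {"id": 7, "created_at": "2025-01-01T00:00:00Z"}
stored = rename_fields(incoming, INBOUND_TO_STORAGE)
outgoing = rename_fields(stored, STORAGE_TO_OUTBOUND)
```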
i am pretty confident they wont let claude touch it if they dont even let deterministic automations run...
that being said, maybe there are places. but this is always the sentiment i got. no automating, no scanning, no patching. device is delivered certified and any modifications will invalidate that. any changes need to be validated and certified.
its a different world than makin apps thats for sure.
not to say mistakes arent made and change doesnt happen, but i dont think people designing medical devices will be going yolo mode on their dev cycle anytime soon... give the folks in safety critical system engineering some credit..
I don't have the same faith in corporate leadership as you, at least not when they see potentially huge savings by firing some of the expensive developers and using AI to write more of the code.
>One failure occurred when a particular sequence of keystrokes was entered on the VT100 terminal that controlled the PDP-11 computer: If the operator were to press "X" to (erroneously) select 25 MeV photon mode, then use "cursor up" to edit the input to "E" to (correctly) select 25 MeV Electron mode, then "Enter", all within eight seconds of the first keypress and well within the capability of an experienced user of the machine, the edit would not be processed and an overdose could be administered. These edits were not noticed as it would take 8 seconds for startup, so it would go with the default setup
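A heavily simplified model of that class of bug, for illustration only. This is not a reconstruction of the actual Therac-25 code; the class, method names and mode letters are assumptions. It shows a setup routine that snapshots the operator's first entry and ignores edits made while setup is still running, plus the kind of consistency check that would refuse to proceed.

```python
class SetupModel:
    """Toy model: the screen and the hardware can silently disagree."""

    def __init__(self) -> None:
        self.screen_mode = None    # what the operator sees
        self.hardware_mode = None  # what the machine is configured for
        self.setup_running = False

    def enter_mode(self, mode: str) -> None:
        self.screen_mode = mode
        if not self.setup_running:
            # Setup starts with a snapshot of the first entry.
            self.hardware_mode = mode
            self.setup_running = True
        # Bug being modeled: edits during setup update only the screen.

    def finish_setup(self) -> None:
        self.setup_running = False

    def may_fire(self) -> bool:
        """Guard: refuse unless setup is done and screen matches hardware."""
        return not self.setup_running and self.screen_mode == self.hardware_mode


machine = SetupModel()
machine.enter_mode("X")   # wrong mode typed first
machine.enter_mode("E")   # quick correction while setup is still running
machine.finish_setup()
mismatch = (machine.screen_mode, machine.hardware_mode)
fire_permitted = machine.may_fire()
```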
Kinda reminds me how everything is touchscreen nowadays from car interfaces to industry critical software
Try quickly typing 1+ 2 + 3 into the iOS 11 Calculator (reddit.com)
886 points by danso on Oct 24, 2017 | 480 comments
https://news.ycombinator.com/item?id=15538666
...this _exact_ same failure mode in a "less" critical domain (eg: literally your most frequently used "pocket calculator"), unless you're using the calculator for Important Things(tm).
I was taught about this in engineering school, as part of a general engineering course also covering things like bathtub reliability curves and how to calculate the number of redundant cooling pumps a nuclear power plant needs. But it's a long time since I was in college.
Is this sort of thing still taught to engineers and developers in college these days?
Analog systems do not behave like computers.
You can't teach people to care.
> Taking a couple of programming courses or programming a home computer does not qualify anyone to produce safety-critical software. Although certification of software engineers is not yet required, more events like those associated with the Therac-25 will make such certification inevitable. There is activity in Britain to specify required courses for those working on critical software. Any engineer is not automatically qualified to be a software engineer — an extensive program of study and experience is required. Safety-critical software engineering requires training and experience in addition to that required for noncritical software.
After 32 years, this didn't go the way the report's authors expected, right?
Two decades ago there was a lot of talk about turning software development into a structured engineering discipline, but that plan seems to have largely been abandoned.
> That standard [IEC 62304] is surrounded by other technical reports and guidances recognized by the FDA, on software risk management, safety cases, software validation. And I can tell you that the FDA is very picky, when they review your software design and testing documentation. For the first version and for every design change.
> That’s good news for all of us. An adverse event like the Therac 25 is very unlikely today.
This is a case where regulation is a good thing. Unfortunately I see a trend lately where almost any regulation is seen as something stopping innovation and business growth. There is room for improvement and some areas are over-regulated, but we don't want to take a "DOGE" chainsaw to regulations without knowing what the consequences are.
While the cause is noble, the medical detection of child abuse faces serious issues with undetected and unacknowledged false positives [2], since ground truth is almost never knowable. The prevailing idea is that certain medical findings are considered proof beyond reasonable doubt of violent abuse, even without witnesses or confessions (denials are extremely common). These beliefs rest on decades of medical literature regarded by many as low quality because of methodological flaws, especially circular reasoning (patients are classified as abuse victims because they show certain medical findings, and then the same findings are found in nearly all those patients—which hardly proves anything [3]).
I raise this point because, while not exactly software bugs, we are now seeing black-box AIs claiming to detect child abuse with supposedly very high accuracy, trained on decades of this flawed data [4, 5]. Flawed data can only produce flawed predictions (garbage in, garbage out). I am deeply concerned that misplaced confidence in medical software will reinforce wrongful determinations of child abuse, including both false positives (unjust allegations potentially leading to termination of parental rights, foster care placements, imprisonment of parents and caretakers) and false negatives (children who remain unprotected from ongoing abuse).
[1] https://hs.memberclicks.net/executive-committee
[2] https://news.ycombinator.com/item?id=37650402
[3] https://pubmed.ncbi.nlm.nih.gov/30146789/
[5] https://www.sciencedirect.com/science/article/pii/S002234682...
Supposedly there are mechanical switches that prevent that, but evidently "modern" microwaves can control the gun through the logic board.
The engineering failures that led to this, from conceptual to design to internal control, boggle my mind. I'm not even sure where to send a complaint or if it would result in any kind of compensation. Because billion dollar corporations know that they'll never have to face any kind of corporate death penalty because they're protected by limited liability. So we'll just buy another $150 microwave instead.
Are smaller companies better at engineering safety? Evidently not.
"The safety interlocks don't work when the operator intentionally goes out of his way to defeat them." isn't a concern. There's only so much you can do to prevent someone who's dedicated to disabling them.
"The safety interlocks fail dangerous because of an unexpected power cut." is a huge concern. What else did the manufacturer skimp on, or -worse- simply fail to understand was important to do for the safety of the operator of the device?
The previous devices had hardware interlocks. So if the software glitched, it was just an annoying glitch - nobody got zapped. But mature software gets trusted, so they removed the hardware interlock as redundant. And then the annoying glitches became fatal. Total miscommunication. The people cost-reducing the hardware interlock only saw mature, trustworthy software. The people living with the glitches only saw them as annoying, but harmless. And then, disaster.
I've gone off Kyle Hill after a lot of people pointed out that he was promoting a scam (BetterHelp) on his video about fraud and his response was just to tell people to deal with it
> A commission attributed the primary cause to generally poor software design and development practices, rather than singling out specific coding errors.
Which to me reads as "this entire codebase was so awful that it was bound to fail in some or other way".
By focusing on particular errors, there's the possibility you'll think "problem solved".
By focusing on process, you hope to catch mistakes as early as possible.
That gets actively dangerous; a lot of more recent safety mishaps are more of the variety of "processes were followed, but things went hilariously off the rails and no one noticed and spoke up".
Culture and expertise matter just as much if not more, especially today now that we all (in theory) should understand source control, testing, safer languages, etc.
I think Admiral Rickover's methods apply just as much today, and applying that kind of thinking would fill major gaps in a lot of organizations - he emphasized good communication, a sense of responsibility, and thinking on your feet, and his safety record is unmatched.
I think aviation also approaches process a bit better - by having much of it be more informal, less rigid checklists, it doesn't encourage people to suspend judgement so much.
There's also the Tankship Tromedy, which really emphasizes the engineering legwork of just chasing down, understanding and fixing every last failure mode you can find.
https://www.dieselduck.info/library/08%20policies/2006%20The...
I bet some readers are thinking that the developer that caused this tragedy retired with the millions he earned, maybe sailed his yacht to his Caribbean mansion. But the $300K FAANG salaries and multi-million stock options for senior developers represent the last decade or two. In the 1980s, developers were paid poorly and commanded little respect. The heroes in tech companies that sold expensive devices were the salesmen back then. The commission on the sale of a single Therac-25 probably exceeded the developer's salary.
All of the following would indicate that this developer, no matter how senior or capable, was still a low-paid schlub:
- It's Canada, so automatically 20% lower salaries than in the U.S. (AECL is in Canada, so it's a good bet that the developer was Canadian.)
- It's the 1980s, so pre-web, pre-smartphones, pre-Google/Amazon, and developers had little recognition and low demand.
- It's government, known to pay poorly for developers. (AECL is a government-owned corporation.)
- It's mostly embedded software. Even though embedded software can be incredibly complex and life-critical, it's the least visible, so it's among the lower paid areas of software engineering (even today).
For 1986, I would put his salary at $30-50K Canadian, or converted to U.S. dollars at that time would be $26-43K U.S., and inflation adjusted would be $78-129K U.S. today. And no stock options.
Having written and validated both FDA and CLIA software, I'd suggest that process is never sufficient.
Plenty of well-meaning people will create and follow incomplete plans and hand-wave away issues when they sign off -- particularly people who gravitate towards rule-based, formulaic work in a hierarchy.
You need people both capable of and willing to seriously question whether proof is really proof, and who will stand up for some random patient in the distant future over their boss and colleagues on a deadline -- and yet they cannot be oppositional or egotistical, and must have deep insight into the subject matter.
It's really, really hard to find those people.
Which makes me very nervous about AI generated code and people who don't claim human authorship. The bug that creeps in where we scapegoat the AI isn't gonna cut it in a safety situation.
Some of the controls were 'born' in a world of hardware interlocks, and so the engineers used the frame of mind where hardware interlocks exist.
Some time later, the interlocks were replaced with software controls. Since everything had worked before, all the software had to do was what worked before.
But it is VERY difficult to challenge all of your assumptions about what "working" means.
---
This is also a good reminder that work is done by people and teams, not corporations. That is - just because somebody knows the fine details, that does not mean that the corporation knows the fine details.
They can't. There was a single developer, he left, no tests existed, no one understood the mess to confidently make changes. At this point you can either lie your way through the regulators or scrap the product altogether.
I've seen this kind of devs and companies running their software in regulated industries like in the Therac incident, just now we are in the year 2025. I left because I understood that it's a criminal charge waiting to happen.
In my experience, hardware people really dis software. It's hard to get them to take it seriously.
When something like this happens, they tend to double down on shading software.
I have found it very, very difficult to get hardware people to understand that software has a different ruleset and workflow, from hardware. They interpret this as "cowboy software," and think we're trying to weasel out of structure.
The official QA organization was very powerful, and had no compunctions about stopping an entire product line, for one bug.
When that happened, the department responsible for the bug would find themselves against the wall.
As a result, all the software departments had pretty big teams of testers, who would validate the software, before it was released to the purview of the QA organization.
It could be pretty restricting, but we always felt confident that what we shipped, worked.
One of the Therac-25 incidents was a radiation overdose in Texas.
[1]: N. G. Leveson and C. S. Turner, "An investigation of the Therac-25 accidents," in Computer, vol. 26, no. 7, pp. 18-41, July 1993.
[2]: Nancy Leveson. Safeware: System Safety and Computers. Addison-Wesley, 1995.
[3]: http://sunnyday.mit.edu/papers/therac.pdf
[4]: https://web.mit.edu/6.033/2014/wwwdocs/papers/therac.pdf
Unfortunately Computer Science is still in its too-cool-for-school phase, see OpenAI being sued over recently encouraging a suicidal teenager to kill themself. You'd think it would be common sense for that to be a hard stop outside of the LLM processing the moment a conversation turns to subjects like that, but nope.
Take this post-mortem [1] as a warning: it highlights exactly what can go horribly wrong if the LLM misreads comments.
What's even scarier is that each time I stumble across a freshly minted project on GitHub with a considerable amount of attention, not only is it 99% vibe-coded (very easy to detect), it also completely lacks any tests.
It makes me question whether the user prompting the code even understands how to write robust, battle-tested software.
[0] https://news.ycombinator.com/item?id=44764689
[1] https://sketch.dev/blog/our-first-outage-from-llm-written-co...
This is the kind of mistake that fails people out of CS101; it's obvious that the student is just manipulating symbols they don't really "get" rather than modifying code. Throwing the Chinese room thought experiment at your code base is bad engineering.
Critical issues happen with customers, blame gets shifted, a useless fix is proposed in the post mortem and implemented (add another alert to the waterfall of useless alerts we get on call), and we continue to do ineffective testing. Procedural improvements are rejected by the original authors who were then promoted and want to keep feeling like they made something good and are now in a position to enforce that fiction.
So IMO the lesson here isn't that everyone should focus on culture and process; it's that you won't have the right culture and process, and (apparently) laws and regulation can overcome the lack of culture and process.
It's worth noting that there was one developer who wrote all of this code. They left AECL in 1986, and thankfully for them, no one has ever revealed their identity. And while it may be tempting to lay the blame at their feet—they made every technical choice, they coded every bug—it would be wildly unfair to do that.
It's interesting, because all the older works in its training data will default to the masculine singular, and that has to be a massive number of books too. But maybe the modern writing, including lots of online sources, simply overwhelms that. Or it's one of the guardrails written into the AIs to avoid offending people.
BTW: Relevant XKCD: https://xkcd.com/2347/
Can't imagine that radiation might be a factor here...
If not, why not put a hardware limit on the power input to the machine, so that even if the software failed completely, it would not be physically capable of delivering a fatal dose like this?
This says it all.
The Therac-25 was a radiation therapy machine built by Atomic Energy of Canada Limited (AECL) in the 1980s. It was the first to rely entirely on software for safety controls, with no hardware interlocks. Between 1985 and 1987, at least six patients received massive overdoses of radiation, some fatally, due to software flaws.
One major case in March 1986 at the East Texas Cancer Center involved a technician who mistyped the treatment type, corrected it quickly, and started the beam. Because of a race condition, the correction didn’t fully register. Instead of the prescribed 180 rads, the patient was hit with up to 25,000 rads. The machine reported an underdose, so staff didn’t realize the harm until later.
Other hospitals reported similar incidents, but AECL denied overdoses were possible. Their safety analysis assumed software could not fail. When the FDA investigated, AECL couldn’t produce proper test plans and issued crude fixes like telling hospitals to disable the “up arrow” key.
The root problem was not a single bug but the absence of a rigorous process for safety-critical software. AECL relied on old code written by one developer and never built proper testing practices. The scandal eventually pushed regulators to tighten standards. The Therac-25 remains a case study of how poor software processes and organizational blind spots can kill—a warning echoed decades later by failures like the Boeing 737 MAX.
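To make the race condition in the East Texas case a bit more concrete, here is a toy model in Python. It is not AECL's code and does not reproduce the real control flow; the `Console` class, its method names, and the 8-second setup window are all assumptions chosen for illustration. The point is only that an edit landing while a slow setup step is running can update the screen without ever reaching the hardware, and that a final cross-check before firing would catch the mismatch.

```python
SETUP_SECONDS = 8  # assumption: how long the hardware needs to reconfigure


class Console:
    """Hypothetical operator console; a simplified illustration only."""

    def __init__(self):
        self.screen_mode = None    # what the operator sees and edits
        self.hardware_mode = None  # what the beam hardware is set up for
        self.setup_done_at = None  # simulated time at which setup finishes

    def enter_mode(self, mode, now):
        """Operator types or edits the treatment mode at simulated time `now`."""
        self.screen_mode = mode
        if self.setup_done_at is None or now >= self.setup_done_at:
            # First entry, or an edit after setup finished: (re)start setup.
            self.hardware_mode = mode
            self.setup_done_at = now + SETUP_SECONDS
        # BUG: an edit that lands during setup only changes the screen.

    def fire_unchecked(self):
        """Trusts that setup saw the latest edit."""
        return self.hardware_mode

    def fire_checked(self):
        """Refuses to fire when screen and hardware disagree."""
        if self.hardware_mode != self.screen_mode:
            raise RuntimeError("screen and hardware disagree; refusing to fire")
        return self.hardware_mode


fast = Console()
fast.enter_mode("xray", now=0)      # mistyped mode
fast.enter_mode("electron", now=3)  # quick correction, inside the window
print(fast.screen_mode, fast.fire_unchecked())  # screen and beam differ
```

A slow operator never hits the window, which is one reason a bug like this can survive ordinary testing and only show up once staff get fast at data entry.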
Engineers in other fields need to sign off on designs, and can be held liable if something goes wrong. Software hasn't caught up to that yet.
https://www.theguardian.com/uk-news/2024/jan/09/how-the-post...
One member of the development team, David McDonnell, who had worked on the Epos system side of the project, told the inquiry that “of eight [people] in the development team, two were very good, another two were mediocre but we could work with them, and then there were probably three or four who just weren’t up to it and weren’t capable of producing professional code”.
What sort of bugs resulted?
As early as 2001, McDonnell’s team had found “hundreds” of bugs. A full list has never been produced, but successive vindications of post office operators have revealed the sort of problems that arose. One, named the “Dalmellington Bug”, after the village in Scotland where a post office operator first fell prey to it, would see the screen freeze as the user was attempting to confirm receipt of cash. Each time the user pressed “enter” on the frozen screen, it would silently update the record. In Dalmellington, that bug created a £24,000 discrepancy, which the Post Office tried to hold the post office operator responsible for.
Another bug, called the Callendar Square bug – again named after the first branch found to have been affected by it – created duplicate transactions due to an error in the database underpinning the system: despite being clear duplicates, the post office operator was again held responsible for the errors.
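Both bugs described above boil down to the same submission being counted more than once. A minimal Python sketch of that failure mode, and of one standard defence (an idempotency key per transaction), might look like the following. This is not how Horizon worked and not the fix that was applied; the `Ledger` class, the transaction ID, and the amounts are hypothetical, with the numbers picked only so that the discrepancy comes out at £24,000.

```python
class Ledger:
    """Hypothetical branch ledger; illustrative only."""

    def __init__(self):
        self.cash_received = 0
        self.seen_ids = set()

    def confirm_naive(self, amount):
        # Every key press is booked, even if the screen looked frozen.
        self.cash_received += amount

    def confirm_idempotent(self, transaction_id, amount):
        # A repeated confirmation of the same transaction is ignored.
        if transaction_id in self.seen_ids:
            return False
        self.seen_ids.add(transaction_id)
        self.cash_received += amount
        return True


naive = Ledger()
safe = Ledger()
for _ in range(4):  # user presses "enter" four times on a frozen screen
    naive.confirm_naive(8_000)
    safe.confirm_idempotent("delivery-001", 8_000)

print(naive.cash_received - 8_000)  # discrepancy booked against the branch
print(safe.cash_received - 8_000)   # no discrepancy
```

The design choice here is that the system, not the user, is responsible for recognising a retry.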
It's an archetypal example of 'one law for the connected, another law for the proles'.
http://www0.cs.ucl.ac.uk/staff/a.finkelstein/papers/lascase....
Killing 20 innocents and one Hamas member is not a bug: it is callous, but that's a policy decision and the software working as intended. But when it is a false positive (10% of the time), due to inadequate or outdated data and inadequate models, that could reasonably be classified as a bug, so all 21 deaths for each of those bombings would count as deaths caused by a bug. Apparently, at least earlier versions of Gospel were trained on positive examples that mean someone is a member of Hamas, but not on negative examples; other problems could be due to, for example, insufficient data and extrapolation outside the valid range (e.g. using pre-war data about how quickly cell phones are traded, or how people move, when behaviour is different post-war).
I'd therefore estimate that deaths due to classification errors from those systems are likely in the thousands (out of the 60k+ Palestinian deaths in the conflict). For comparison, Therac-25's bugs caused six known overdoses, several of them fatal.
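The Patriot missile system kept time as a count of tenths of a second and converted it to seconds using a truncated 24-bit binary approximation of 0.1, so the clock error grew as uptime extended, eventually to the point where the computed time was far enough off that the incoming missile fell outside the range gate.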
The fix was being deployed earlier that year but this unit hadn't been updated yet.
https://www.cs.unc.edu/~smp/COMP205/LECTURES/ERROR/lec23/nod...
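For a feel of how a tiny per-tick error turns into a miss, here is a rough back-of-the-envelope version of the Patriot drift in Python. It follows the commonly cited analysis and rests on several assumptions that are not from this thread: the clock ticks every 0.1 s, 0.1 is stored chopped to 23 bits after the binary point, and the target moves at about 1676 m/s. The function names are made up for the sketch.

```python
from fractions import Fraction

FRACTION_BITS = 23  # assumption: bits kept after the binary point

TENTH = Fraction(1, 10)
# 0.1 has no finite binary expansion, so the stored value is chopped short.
STORED_TENTH = Fraction(int(TENTH * 2**FRACTION_BITS), 2**FRACTION_BITS)
ERROR_PER_TICK = TENTH - STORED_TENTH  # about 9.5e-8 seconds per tick


def clock_drift_seconds(uptime_hours):
    """Accumulated clock error after running for `uptime_hours`."""
    ticks = uptime_hours * 3600 * 10  # one tick per tenth of a second
    return float(ticks * ERROR_PER_TICK)


def tracking_error_metres(uptime_hours, target_speed_m_s=1676.0):
    """How far a target at the assumed speed travels during the drift."""
    return clock_drift_seconds(uptime_hours) * target_speed_m_s


for hours in (8, 20, 100):
    print(hours, clock_drift_seconds(hours), tracking_error_metres(hours))
```

Under these assumptions, 100 hours of uptime gives roughly a third of a second of drift, which at that speed is over half a kilometre.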
https://www.androidauthority.com/psa-google-pixel-911-emerge...
Not a "bug" per se, but texting while driving kills ~400 people per year in the US. It's a bug at some level of granularity.
To be tongue in cheek a bit, buggy JIRA latency has probably wasted 10,000 human years. Those are many whole human lives if you count them up.
These kinds of calculations always make me wonder... say someone wasted one minute of everybody's life: is the cost ~250 lives? One minute? Somewhere in between?
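For what it's worth, the "lives" figure is just total minutes divided by minutes in a lifetime. A quick Python sketch, where the 8 billion population and the lifespans are my assumptions rather than numbers from the comments above:

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def lifetimes_lost(people, minutes_each, lifespan_years):
    """Total wasted time expressed as a number of whole lifespans."""
    total_minutes = people * minutes_each
    return total_minutes / (lifespan_years * MINUTES_PER_YEAR)


# One minute from each of an assumed 8 billion people:
print(lifetimes_lost(8_000_000_000, 1, 73))  # assumed 73-year lifespan
print(lifetimes_lost(8_000_000_000, 1, 61))  # lifespan that gives ~250
```

With a 73-year lifespan this comes out near 208 lifetimes; the ~250 figure corresponds to a lifespan of about 61 years. Whether summed minutes are morally comparable to whole lives is, of course, the actual question.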