
Good example.

---

The system distinguished between errors that halted the machine, requiring a restart, and errors which merely paused the machine (which allowed operators to continue with the same settings using a keypress). However, some errors which endangered the patient merely paused the machine, and the frequent occurrence of minor errors caused operators to become accustomed to habitually unpausing the machine.

One failure occurred when a particular sequence of keystrokes was entered on the VT-100 terminal which controlled the PDP-11 computer: the operator pressed "X" to (erroneously) select 25 MeV photon mode, then used "cursor up" to edit the input to "E" to (correctly) select 25 MeV electron mode, then pressed "Enter", all within eight seconds of the first keypress, which was well within the capability of an experienced user of the machine. The edit went unnoticed because startup took 8 seconds, so the machine went ahead with the default setup.[3]

---

... which allowed the electron beam to be set for X-ray mode without the X-ray target being in place. A second fault allowed the electron beam to activate during field-light mode, during which no beam scanner was active or target was in place.

Previous models had hardware interlocks to prevent such faults, but the Therac-25 had removed them, depending instead on software checks for safety.

The high-current electron beam struck the patients with approximately 100 times the intended dose of radiation, and over a narrower area, delivering a potentially lethal dose of beta radiation. The feeling was described by patient Ray Cox as "an intense electric shock", causing him to scream and run out of the treatment room.[4] Several days later, radiation burns appeared, and the patients showed the symptoms of radiation poisoning; in three cases, the injured patients later died as a result of the overdose.[5]

---

In response to incidents like those associated with Therac-25, the IEC 62304 standard was created, which introduces development life cycle standards for medical device software and specific guidance on using software of unknown pedigree.[7]

https://en.wikipedia.org/wiki/Therac-25



This sounds like poor consideration for edge cases - not really a problem with the UI or people clicking through it too fast. Anything that could be interpreted as remotely fatal should've shut the machine down.


The control software should not be physically able to command the hardware to enter an invalid state. You can do that by only exposing the 3 valid modes to the software or only enabling power to the emitter if every piece of hardware is in the correct place when the software request arrives.
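To make that concrete, here is a rough sketch of the idea, not how the real machine worked. The mode names come from the quote above; `HardwareState`, `REQUIRED_STATE`, and `beam_permitted` are hypothetical names, and the required sensor values are my assumption. The point is that software can only ask for one of three modes, and the beam is permitted only when the sensed hardware matches that mode.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """The only three configurations the control software may request."""
    ELECTRON = "electron"
    XRAY = "xray"
    FIELD_LIGHT = "field_light"


@dataclass(frozen=True)
class HardwareState:
    """Positions as reported by (hypothetical) sensors, not by software."""
    target_in_place: bool
    scanner_active: bool


# Hypothetical table: what the sensors must report before the beam may fire.
# FIELD_LIGHT has no entry because the beam must never fire in that mode.
REQUIRED_STATE = {
    Mode.ELECTRON: HardwareState(target_in_place=False, scanner_active=True),
    Mode.XRAY: HardwareState(target_in_place=True, scanner_active=False),
}


def beam_permitted(mode: Mode, sensed: HardwareState) -> bool:
    """Return True only if the sensed hardware matches the requested mode."""
    required = REQUIRED_STATE.get(mode)
    return required is not None and sensed == required
```

In a real design this check would sit in hardware or in a tiny isolated layer, so the rest of the software has no path to the emitter that bypasses it.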

You also have a hardware lock on the power - this can be as simple as a hardware timer (an RC circuit suffices) which limits how long the emitter can be on within a given window to be safe.
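The real thing belongs in hardware, but the behavior of such a limiter can be modeled in a few lines. This is only an illustrative sketch: `OnTimeBudget` and its parameters are made-up names, and it conservatively counts the full duration of any pulse that has not yet aged out of the window.

```python
from collections import deque


class OnTimeBudget:
    """Models a limiter that caps emitter on-time within a sliding window."""

    def __init__(self, max_on_s: float, window_s: float) -> None:
        self.max_on_s = max_on_s    # total on-time allowed per window
        self.window_s = window_s    # length of the sliding window
        self._pulses = deque()      # (start_time, duration) of granted pulses

    def request(self, now: float, duration_s: float) -> bool:
        """Grant the pulse only if it keeps on-time within the budget."""
        window_start = now - self.window_s
        # Forget pulses that ended before the current window began.
        while self._pulses and sum(self._pulses[0]) <= window_start:
            self._pulses.popleft()
        used = sum(duration for _, duration in self._pulses)
        if used + duration_s > self.max_on_s:
            return False            # would exceed the budget: refuse
        self._pulses.append((now, duration_s))
        return True
```

With a budget of 1 second per 10-second window, a 0.6 s pulse at t=0 is granted, a second 0.6 s pulse at t=1 is refused, and the same request at t=11 is granted again because the first pulse has aged out.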

Never trust the software. If you must trust some software, create a minimal set you CAN trust which isolates the rest of the software from the hardware.

You are correct: the discussion about how to exercise this bug (fast UI, blah blah) is interesting to hear but totally irrelevant to the lesson (don't trust software).


They basically did no testing at all on that machine, and reused the previous software which relied on hardware safety interlocks which had been removed from the newer model. It's literally a textbook case of how not to do mission-critical software.



