The problem is that they fail differently than humans do, in ways that lead humans to distrust the results.
It turns out that there are parts of the breast that are easy to spot tumors in, and parts that are hard. A human scans quickly over the easy areas, and focuses on the hard. The result is that humans make careless errors on the easy areas, and catch hard tumors. Computers make no careless errors, but can't catch the hard ones. Thus when a human sees what the computer caught that the human did not, the mistake is easily dismissed. But when the human sees the ones that the computer missed, it becomes, "It doesn't know how to do the real work."
Ideally the two would be used together for better results than either alone. But humans wind up resenting the computer...
Not only that, but human fallibility is accepted where machine fallibility is not. There's something about being a "person" which makes it acceptable for you to just take the blame for something. A senior radiologist makes a glaring error and "it happens, people make mistakes". A computer makes the same error and it's a problem which must be fixed before the computer can be trusted.
Ultimately I believe this is a cognitive bias that we're just going to have to learn to let go of.
Unfortunately I don't think this is merely cognitive bias. It's actually built into our legal system at a pretty fundamental level: machines are held to a higher standard than humans when it comes to failures with grave consequences.
And keep in mind, this system only achieves the same accuracy as doctors. What is the end benefit, other than shifting where money flows?
Do you really think this benefit is substantial enough that we will see major overhauls of tort law in every US state and in every other country in the world?
There are a number of issues, but it's true that raw algorithmic performance is a small part of the whole picture.
The real work is building systems around people who do the interpretation and labeling to make their jobs easier.
That is a great argument for giving such a model to a human doctor as an aid. Together they will be better than either one alone.
The big picture is that this patchy performance is the writing on the wall. It's over for radiologists, for the most part.
The nature of this problem is a great fit for ML and it will in short order (10 years) be superior in the vast majority of scenarios to expert level humans.
People say, but psychology, fear, unknowns, will require human supervision indefinitely. Of course that's true.
The problem is, radiologists will effectively be relegated to proofreaders. The number of minutes required of them per patient will plummet and so will their job market (unless changes allow many more untreated people to get imaged).
What about the researchers? Even they will take a hit as the imaging analysis part of radiology research moves more along the spectrum toward yet another computer science problem.
And yet humans have not yet been replaced.
Tell me what you expect to be different about the next 10 years that wasn't true of the last 20. I'm open to being convinced. But you can't just say that computers are going to be better; you have to explain why the switch hasn't already happened.
Call me back when your machine can produce a comprehensive analysis of an anatomic configuration in relation to every element in the patient's file.
The real problem with those ML systems is that people don't understand what a radiologist is. Let's perhaps solve that problem first?
The humans who supervise that will become the new radiologists. The "best" of those humans will have cross-disciplinary expertise in ML model development and traditional radiology. My capitalist side sees a huge opportunity here in consulting: helping existing radiology departments (those that are interested) bridge the gap from current practice to a hybrid approach.
Adapt or die, etc.
You mean dismissed as in not believing the result, or as in the result not being impressive? (It's unclear whether that's what you mean later by "resenting", and whether it is tied to this statement.)
And if you use the software as a backup, people don't respond well to having their dumb mistakes pointed out. The result is that people put effort into not making dumb mistakes... and therefore either slow down or have less time for what humans do better than computers.
In theory it should work a lot better than it actually does.
basis for this assertion?
I also am not sure saying computers can't catch the hard ones is true in light of this. It seems like a deep neural network would be useful to catch the hard ones.
I agree with using the two together, but I don't see why the two can't be different AI subsystems. That seems to be what they're going for at IBM. Some of the power comes from a scientific model, while more power comes from well-trained deep learning networks.
Before AlphaZero, and before the recent advances in self-driving cars and face detection, I'd have agreed with you.
She was quite sceptical about her own work. One of the biggest showstoppers, she said, was the unpredictability of errors.
An algo can catch 99% of tumors, including tiny ones, but can randomly pass over very obvious ones that a human radiologist would spot with his eyes closed.
They had a demo day with radiologists, who threw tricky edge-case X-rays at the computer. The edge cases were all fine, but one radiologist pulled his own X-ray from his bag, showing a 100% obvious, terminal-stage tumor, and, to the company's embarrassment, the algo failed to detect it no matter how they twisted and scaled the X-ray. The guy then just walked out.
This is one of the (many) reasons that practical classification systems, as against research systems, tend to become Frankenstein's monsterish over time. It's naive to think that a single approach and pipeline will cover your domain well.
It'll be the best of both.
And in the scans where the algo is wrong, have the scan added to the machine learning database of the algo.
1. 99% > 95%, or whatever the radiologist's accuracy is.
2. Combine both systems for obvious gains.
To date, we have allowed humans a 70% error level without complaint, but we ask machines to be 100% effective.
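A rough way to see what combining a human and a model can buy, and what it costs: the sketch below assumes the two readers' errors are independent given the true label (a big assumption; per the easy-vs-hard-region point earlier in the thread, real human and machine errors may well be correlated or anti-correlated). The function names are hypothetical, and the inputs are the rough figures quoted later in the thread (radiologists around 90% sensitivity and specificity, the model at 87% sensitivity and 77.3% specificity).

```python
def combine_or(sens_a, spec_a, sens_b, spec_b):
    """Flag a case if EITHER reader flags it.

    Assumes (hypothetically) independent errors given the true label.
    """
    # A real tumor is missed only if both readers miss it.
    sens = 1 - (1 - sens_a) * (1 - sens_b)
    # A healthy case is cleared only if both readers clear it.
    spec = spec_a * spec_b
    return sens, spec


def combine_and(sens_a, spec_a, sens_b, spec_b):
    """Flag a case only if BOTH readers flag it (same independence assumption)."""
    # Both readers must catch the tumor for it to be flagged.
    sens = sens_a * sens_b
    # A healthy case is wrongly flagged only if both readers raise a false alarm.
    spec = 1 - (1 - spec_a) * (1 - spec_b)
    return sens, spec


radiologist = (0.90, 0.90)  # (sensitivity, specificity), rough figure from the thread
model = (0.87, 0.773)       # (sensitivity, specificity), rough figure from the thread

either = combine_or(*radiologist, *model)
both = combine_and(*radiologist, *model)
print("OR rule: ", either)
print("AND rule:", both)
```

Under these assumptions the OR rule lifts sensitivity to about 0.987 but drops specificity to about 0.696, while the AND rule does roughly the reverse (about 0.783 and 0.977). The gains are real, but each rule buys one metric with the other.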
The very same happens with autopilot, the big numbers say they drive better than humans but...
Having a confusion matrix of which predictions the model gets right or wrong is not the same as having a CI for the model's prediction.
TP = ... flags as true and it is true
FP = ... flags as true but it is false
FN = ... flags as false but it is true
TN = ... flags as false and it is false
To turn these into rates, you normalize them.
e.g. TPR = TP/P = TP/(TP + FN) = 1 - FNR
etc.
These are characteristics of a classification system.
You will also hear sensitivity (TPR) vs. specificity (TNR) often, particularly in medical contexts. In other contexts you'll hear Type I (FP) vs. Type II (FN) error.
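The definitions above can be sketched in a few lines of Python. The labels and predictions are made-up toy data (1 means "tumor present"), and the helper names are just illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FP, FN, TN) for binary labels, where 1 = tumor present."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1  # Type I error: false alarm
        elif pred == 0 and truth == 1:
            fn += 1  # Type II error: missed tumor
        else:
            tn += 1
    return tp, fp, fn, tn


def rates(tp, fp, fn, tn):
    """Normalize counts by the number of actual positives (P) and negatives (N)."""
    tpr = tp / (tp + fn)  # sensitivity = TP / P
    tnr = tn / (tn + fp)  # specificity = TN / N
    return {"TPR": tpr, "FNR": 1 - tpr, "TNR": tnr, "FPR": 1 - tnr}


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts, rates(*counts))
```

Here the counts come out as (3, 1, 1, 4), giving a sensitivity of 0.75 and a specificity of 0.8.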
In most cases you have a set of trade-offs in your algorithm and will need to pick a balance between sensitivity and specificity.
cf. ROC: https://en.wikipedia.org/wiki/Receiver_operating_characteris...
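A minimal sketch of that trade-off: if the model outputs a score per scan, every choice of threshold gives one (FPR, TPR) point, and sweeping the threshold traces the ROC curve. The area under it (AUC) is computed here with the trapezoidal rule. The scores and labels are invented toy data, not from any study, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """Sweep the threshold over every distinct score, strictest first,
    and return the resulting (FPR, TPR) pairs."""
    pos = sum(labels)           # actual positives
    neg = len(labels) - pos     # actual negatives
    points = [(0.0, 0.0)]       # threshold above every score: nothing flagged
    for t in sorted(set(scores), reverse=True):
        # Flag every case whose score is at least the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(pts)
print(auc(pts))
```

For this toy data the AUC is 0.8125, which equals the fraction of (tumor, healthy) pairs in which the tumor case gets the higher score. Picking an operating point means picking one of the listed (FPR, TPR) pairs.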
AUC is 0.78
Sensitivity-Specificity Graph is here: http://images.rsna.org/index.html?doi=10.1148/radiol.2019182...
This is their comparison point for actual radiologists (citation number 6). It doesn't look comparable, though: radiologists achieve around 90% specificity and sensitivity, which differs a good amount from the model's 77.3% specificity and 87% sensitivity.
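Whether 87% and 90% really differ depends on how many cases each figure is estimated from. A minimal sketch using the Wilson score interval for a proportion (one common choice, not necessarily what the paper used); the case counts below are hypothetical, not taken from the study.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical: 87% sensitivity measured on 100 tumors vs. on 1000 tumors.
small = wilson_interval(87, 100)
large = wilson_interval(870, 1000)
print(small)
print(large)
```

With 100 hypothetical tumors the interval runs from roughly 0.79 to 0.92, so it comfortably includes 90%; with 1000 it narrows to roughly 0.85 to 0.89 and excludes 90%. This is an interval on the sensitivity estimate, not on any single prediction.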
Watson for IBM is like the i-prefix for Apple.
There is no "true" intuition in medicine, just years of study and practice leading to quick recognition of common problems like any other field.
Here are some previous results: https://med.stanford.edu/news/all-news/2018/11/ai-outperform...