_In the UK, each mammogram is interpreted by two readers, and in cases of disagreement, an arbitration process may invoke a third opinion. These interpretations occur serially, such that each reader has access to the opinions of previous readers._
I wonder whether it's sensible to condition the second radiologist's opinion this way. My first impression is that the second reading should be done blindly; otherwise the temptation to simply confirm the previous reading, whether out of fatigue, laziness, or a wish to avoid confrontation, could severely affect the results.
Similarly, a future in which the radiologist gets too much support could simply impair their skills and turn them into a sort of mechanical worker who just accepts whatever the AI overlord has decided.
What they’re doing is interesting, but it’s still very academic. I have little doubt that some sort of AI system will eventually benefit clinical practice, but given the sheer number of studies that fail to make it over the line, I’m not sure I have high hopes for this one. What they’ve done so far is the equivalent of “It works in vitro...”
All the same, it feels sort of beside the point to me. It just doesn’t feel right to take a medical diagnostic tool - whose intended purpose is for communication among doctors - and treat it as a test score. That’s just... not what it was designed for.
This is probably due to invasive cancers being more common (~80%) than in situ cancers. I am not sure why this natural explanation was not suggested.
> In an independent study of six radiologists, the AI system outperformed all of the human readers: the area under the receiver operating characteristic curve (AUC-ROC) for the AI system was greater than the AUC-ROC for the average radiologist by an absolute margin of 11.5%. We ran a simulation in which the AI system participated in the double-reading process that is used in the UK, and found that the AI system maintained non-inferior performance and reduced the workload of the second reader by 88%. This robust assessment of the AI system paves the way for clinical trials to improve the accuracy and efficiency of breast cancer screening.
So, there you have it: not AI "either/or" humans, but both in conjunction, combining the best of both worlds.
At the very least, that's how civilization will massively and intimately introduce true assistant AI.
It's also somewhat counter-intuitive that the most specialized tasks are the low-hanging fruit; i.e., that what is "difficult" for us, the culmination of years of human training and experience (e.g. learning to read a medical scan), may be "easy" for the machine, given its natural advantages (like speed and parallelism).
That space (where machine expertise is cheaper than human expertise) roughly maps to the immense value attributed to the rise of industrial-age narrow AI. Therein lies not a way to replace humans (we never did that in history; we merely destroyed jobs to create ever more) but rather a way to augment ourselves, once more, to whole new levels of performance.
Anything more than this is AGI-level science fiction so far, and there's not even a shred of evidence that it's theoretically a sure thing, possible in the first place. Which is not to say that AI safety research isn't extremely important even for the narrow kind (manipulation comes to mind), but we shouldn't go as far as betting future economic growth on its existence. Like fusion or interstellar travel, we just don't know; not yet, and not for the foreseeable future, given the scale required.
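The two headline numbers in the quoted abstract (the AUC-ROC margin and the 88% second-reader workload reduction) can be sketched in miniature. Everything below is a hypothetical toy illustration with invented function names and data, not the paper's actual method: a rank-based AUC, and a pipeline where the AI stands in as second reader and only disagreements escalate to a human.

```python
# Toy sketch (hypothetical, not the paper's code) of the two headline metrics:
# AUC-ROC for a scorer, and the second-reader workload avoided when an AI
# replaces the human second read except on disagreement.

def auc_roc(scores, labels):
    """Rank-based AUC: probability that a randomly chosen positive case is
    scored higher than a randomly chosen negative one (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def simulate_double_reading(first_reads, ai_reads):
    """UK-style double reading with the AI as second reader: cases where the
    AI agrees with the first (human) reader are signed off; disagreements
    escalate to a human second read. Returns the escalated case indices and
    the fraction of second-reader workload avoided."""
    escalated = [i for i, (h, a) in enumerate(zip(first_reads, ai_reads))
                 if h != a]
    reduction = 1 - len(escalated) / len(first_reads)
    return escalated, reduction
```

On toy data, an AI that matches the first reader on 3 of 4 cases yields a 75% workload reduction; the paper's reported 88% corresponds to much higher agreement over a far larger test set.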
However, given that this research used a deep learning (DL) based AI system in medicine, there are still open questions about the system explaining itself and its internal decision process for the sake of transparency. These questions will almost certainly be ignored by other news outlets, which will focus only on the accuracy. DL-based AI systems will remain a concern for both patients and clinicians, and I would expect transparency to be a focus point in the future, despite these welcome and very interesting results.
Transparency issues aside, I'd say this is a great start to the new decade for AI.
Another thing I find interesting is that Google was able to train a neural network on retinal images that can reliably distinguish sex from the image alone, something ophthalmologists basically can't do. So not only are these systems approaching human capability at tasks we can do, they can do things we can't. As medical data becomes more freely flowing (presumably) over the next couple of decades, I think we'll find that 'AI' can become even more reliable.
It's well known that every major PACS vendor is looking into assisted findings (some have been available for ages, e.g. iCAD's BI-RADS findings that tell the radiologist where to check) or even just case prioritization for radiologists (e.g. Aidoc has an algorithm for brain bleeds used for case prioritization, not diagnosis).
They're all employing machine learning, really (Zebra Medical claims 30 million scans processed).
Medical "ai" "algorithms" companies are vastly growing in the past few years
Eh... well, it's more complicated than that. These systems CAN diagnose, but their regulatory approval is only for use as an aid, not as a diagnostic tool.
How far has Nature fallen these days? How long before Nature is merely a PR agency for big tech?
It is interesting to note the differences. For example, DeepMind notes "In our reader study, all of the radiologists were eligible to interpret screening mammograms in the USA, but did not uniformly receive fellowship training in breast imaging." whereas DeepHealth notes "All readers were fellowship trained in breast imaging", so +1 to DeepHealth.
On the other hand, DeepMind says "Where data were available, readers were equipped with contextual information typically available in the clinical setting, including the patient’s age, breast cancer history, and previous screening mammograms." while DeepHealth says "Radiologists did not have any information about the patients (such as previous medical history, radiology reports, and other patient records)", so +1 to DeepMind. And so on. These differences make direct comparison between studies very difficult.
Depending on the context the model ends up being used in, something that appears good may not be. Take the fellowship-training point, for example: non-fellowship-trained radiologists are doing this task now, so it is absolutely reasonable to assess against them to test real-world performance.
It would be interesting to see whether the fellowship-trained radiologists actually performed better in all circumstances (in some fields, the better-trained radiologists end up not using their skills on as broad a range of patients, so their performance is actually worse on some subsets of data).