hyperpape · 1mo ago

> we must assume that the best AI models (especially ones focusing solely on the medical field) would largely beat the large majority of humans (aka doctors), if we already have this assumption for software engineers, we should have it for this field as well,

This is a pretty wild leap. Code has a lot of hooks for hill-climbing during post-training: you can literally set up arbitrary scenarios and give the bot more or less real feedback (actual programs, actual tests, actual compiler errors).
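To make the "real feedback" point concrete, here is a minimal sketch of the kind of automatically checkable signal code offers. It is a toy, not how any lab actually trains: `score_candidate`, the 0/1 reward, and the sample `add` function are all made up for illustration, and running untrusted code with `exec` like this would need a sandbox in practice.

```python
from typing import Tuple


def score_candidate(candidate_source: str, test_source: str) -> Tuple[float, str]:
    """Return a (reward, feedback) pair for one model-written candidate.

    Hypothetical reward scheme: 1.0 if the candidate compiles and passes
    the tests, 0.0 otherwise. The feedback string says which check failed.
    """
    # "Actual compiler errors": a syntax error is caught before anything runs.
    try:
        code = compile(candidate_source, "<candidate>", "exec")
    except SyntaxError as exc:
        return 0.0, f"compile error: {exc.msg}"

    namespace: dict = {}
    try:
        exec(code, namespace)         # "actual programs": define the candidate
        exec(test_source, namespace)  # "actual tests": plain assert statements
    except AssertionError:
        return 0.0, "test failed"
    except Exception as exc:
        return 0.0, f"runtime error: {type(exc).__name__}"
    return 1.0, "all tests passed"


good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
tests = "assert add(2, 3) == 5\n"

print(score_candidate(good, tests))  # (1.0, 'all tests passed')
print(score_candidate(bad, tests))   # (0.0, 'test failed')
```

The point of the sketch is that the grader needs no human in the loop, so it can be run over as many generated candidates as you like. It's hard to see what would play the role of `test_source` for a diagnosis.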

It's not impossible we'll get a training regime that does the "same thing" for medicine that we're doing for code, but I don't know that we've envisioned what it looks like.

9 comments · 2 top-level

sdwr · 1mo ago

Emergency medicine is the coding of medicine. Fast feedback loop, requires broad rather than deep judgement, concrete next steps.

The AI coding improvement should be partially transferable to other disciplines without recreating the training environment that made it possible in the first place. The model itself has learned what correct solutions "feel like", and the training process and meta-knowledge must have improved a huge amount.

dghlsakjg · 1mo ago

I would argue that the ED is the least similar to code. You have the most unknowns, unreliable data and history, non-deterministic options, and time constraints.

ER staff are frequently making inferences based on a variety of things like the weather, what the pt is wearing, what smells are present, and a whole lot of other intangibles. Frequently the patients are just outright lying to the doctor. An AI will not pick up on any of that.

TurdF3rguson · 1mo ago

> An AI will not pick up on any of that.

It will if it trains on data like that. It's all about the training data.

n8henrie · 1mo ago

Unfortunately the training data is absolute garbage.

Diagnostic standards in medicine (at least in emergency medicine, but I think in other specialties too) are largely a joke -- ultimately it's often either autopsy or "expert consensus."

We get to bill more for more serious diagnoses. The number of patients I see with a "stroke" or "heart attack" diagnosis who clearly had no such thing is truly wild.

We can be sued for tens of millions of dollars for missing a serious diagnosis, even if we know an alternative explanation is more likely.

If AI is able to beat an average doctor, it will be due to alleviating perverse incentives. But I can't imagine where we could get training data that would let it be any less of a fountain of garbage than many doctors.

Without a large amount of good training data, how could AI possibly be good at doctoring IRL?

mrbungie · 1mo ago

The user will be adversarial and will probably learn new tricks to fool the machine; this is not solvable (only) via training data.

zbentley · 1mo ago

To give this more credit than it perhaps deserves: training aside, getting the situational data into the context is a more significant problem here.

Pt's chart is complex/wrong? Gotta ingest that into context.

Chart contains images or scanned text that hasn't been OCR'd? Gotta do an image recognition pass.

Diagnosis needs to know what the pt's wearing (e.g. a radiation badge)? Gotta do an image recognition pass.

Diagnosis needs to know what the weather's like? Internet API access of some kind. Hope the WAN/API are all working! If they're not, do you fail open or closed?

Patient might be lying? Gotta do video/audio analysis to assess that likelihood--oh, and train a model that fully solves one of the holy grails of computer vision/audio analysis reliably and with a super low false-positive rate before you do. And if it guesses wrong, enjoy the incredibly easy-to-prosecute lawsuit.

Patient might be lying, but the biggest clue is e.g. smell of alcohol on their breath? Now you need some sort of olfactory sensor kit and training for it--a lot more than just "low quality body cam and a mic".

Patient's ODing on a street drug that became abundant in the last few months? Gotta somehow learn about recent local medical/police history that post-dates the training set, or else you might be pouring gas on a fire if you give them Narcan. And that's assuming you know enough to search for information about that drug, and that they didn't lie to you about what they took. Addicts never do that.
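The fail-open-or-closed question from the weather example can be sketched like this. Everything here is hypothetical: `FailureMode`, `ContextUnavailable`, and `gather_weather` are illustrative names, and `fetch` stands in for whatever weather API client would really be used.

```python
from enum import Enum
from typing import Callable, Optional


class FailureMode(Enum):
    FAIL_OPEN = "open"      # carry on without the missing input
    FAIL_CLOSED = "closed"  # stop and hand off to a human


class ContextUnavailable(Exception):
    """Raised when a required input is missing and we refuse to guess."""


def gather_weather(fetch: Callable[[], str], mode: FailureMode) -> Optional[str]:
    """Try to fetch the weather; on a network error, apply the chosen policy."""
    try:
        return fetch()
    except OSError as exc:  # ConnectionError and TimeoutError are OSError subclasses
        if mode is FailureMode.FAIL_CLOSED:
            raise ContextUnavailable("weather lookup failed; escalate to a human") from exc
        return None  # fail open: the diagnosis proceeds with one fewer clue


def broken_fetch() -> str:
    # Stand-in for a real API call while the WAN is down.
    raise ConnectionError("WAN is down")


print(gather_weather(broken_fetch, FailureMode.FAIL_OPEN))  # prints None
```

Neither branch is free: failing open silently drops a clue a human would have had, and failing closed means the human has to be on standby anyway.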

Failures in each of those systems bring down the chance of an effective diagnosis, so they need a fairly obsessive amount of model introspection/thinking/double-checking, and humans on standby as a fallback if the AI's less than confident (assuming that LLMs can be given a sense of a confidence level in the future, versus the current state of the art of "text-predict a guess about what your confidence level might be").
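A back-of-the-envelope sketch of how those failures compound, assuming (unrealistically) that the stages fail independently. The stage names and the 0.95 figures are invented purely for illustration, not measurements of any real system, and `pipeline_reliability` and `needs_human_fallback` are hypothetical helpers.

```python
import math
from typing import Mapping


def pipeline_reliability(stage_success: Mapping[str, float]) -> float:
    """Chance that every stage succeeds, assuming independent failures."""
    return math.prod(stage_success.values())


def needs_human_fallback(stage_success: Mapping[str, float], threshold: float) -> bool:
    """Hand off to a human when end-to-end reliability drops below threshold."""
    return pipeline_reliability(stage_success) < threshold


# Invented numbers: each stage is right 95% of the time.
stages = {
    "chart_ingest": 0.95,
    "image_recognition": 0.95,
    "weather_lookup": 0.95,
    "deception_detection": 0.95,
    "recent_drug_lookup": 0.95,
}

print(round(pipeline_reliability(stages), 3))         # 0.774
print(needs_human_fallback(stages, threshold=0.9))    # True
```

Five stages that each look pretty good on their own already leave roughly a one-in-four chance that something upstream of the diagnosis went wrong.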

Put that all together, and even with the AI compute speed available years from now and a perfectly trained futuristic model that's preternaturally good at this stuff, I'm not sure that the reliability and, more importantly, the turnaround time of that diagnostic pass is going to be any good compared to a human ER doc.

DrewADesign · 1mo ago

Code is pretty much the perfect use case for LLMs… text-based, very pattern-oriented, extremely limited complexity compared to biological systems, etc.

I suspect even prose is largely considered acceptable in professional uses because we haven’t developed a sensitivity to the artifice, and we probably won’t catch up to the LLMs in that arms race for a bit. However, we always manage to develop a distaste for cheap imitations and relegate them to somewhere between the ‘utilitarian ick’ and ‘trashy guilty pleasure’ bins of our cultures, and I predict this will be the same. The cultural response is already bending in that direction, and AI writing in the wild— the only part that culturally matters— sounds the same to me as it did a year and a half ago. I think they’re prairie dogging, but when(/if) they drop that bomb is entirely a matter of product development. You can’t un-drop a bomb and it will take a long time to regain status as a serious tool once society deems it gauche.

The assumption that LLMs figuring out coding means they can figure out anything is a classic case of Engineer’s Disease. Unfortunately, this hubris seems damn near invisible to folks in the tech industry, these days.

SirHumphrey · 1mo ago

And with code, the closer you come to the physical world the worse LLMs fare.

Claude can’t really write OpenSCAD, and when I was debugging some map projections code last week it struggled a lot more than usual.

prplxd_nihilist · 1mo ago

Until Anthropic hires or steals code from acquired companies and trains on it.
