Better HN

0 points · cubano · 4y ago · 0 comments

Huh?

What do you mean, not working? That the AI was randomly choosing the correct race 82% of the time by luck?

I'm confused by what you're implying, because it seems to me that the authors went through many steps to pinpoint how the AI was making this identification, and that everyone was baffled that even with much of the X-ray information removed (8x8 pixels compared to, say, 4K), it was still correctly picking the race.

What would this "something else entirely" that you are implying actually be?
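For a sense of how little survives at 8x8, here is a minimal sketch of one common way to shrink an image: averaging each block of source pixels into one output pixel. It assumes a grayscale image stored as a list of rows whose dimensions are multiples of 8. `block_downsample` is a hypothetical helper, and the paper may have resized its images differently.

```python
def block_downsample(image, out_h=8, out_w=8):
    """Shrink a grayscale image (a list of equal-length rows) to out_h x out_w
    by replacing each rectangular block of source pixels with its mean.

    Hypothetical helper for illustration; not the paper's preprocessing code.
    """
    in_h, in_w = len(image), len(image[0])
    if in_h % out_h or in_w % out_w:
        raise ValueError("input size must be a multiple of the output size")
    bh, bw = in_h // out_h, in_w // out_w  # source pixels per output pixel
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            block = [image[y][x]
                     for y in range(i * bh, (i + 1) * bh)
                     for x in range(j * bw, (j + 1) * bw)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# A 64x64 horizontal gradient (a stand-in for a real X-ray) shrunk to 8x8.
img = [[float(x) for x in range(64)] for _ in range(64)]
small = block_downsample(img)
print(len(small), len(small[0]))  # 8 8
print(small[0][0], small[0][7])   # 3.5 59.5
```

At 8x8, an entire image is reduced to 64 averaged values.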


11 comments · 3 top-level

ceejayoz · 4y ago · 6 in thread

> That the AI was randomly choosing the correct race 82% of the time by luck?

No. As with the article I linked elsewhere in the thread (https://techcrunch.com/2018/12/31/this-clever-ai-hid-data-fr...), I mean that the AI might have found some other indicator: filenames in the data set, metadata in the images that included the patient's name, differences in the length of patient names (often redacted with black rectangles in the training X-rays), or any number of other factors.

This happens all the time in science. As another recent example of "whoops, turned out we were measuring the wrong thing", https://en.wikipedia.org/wiki/Faster-than-light_neutrino_ano...

Another example around AI: https://www.vox.com/recode/2019/12/12/20993665/artificial-in...

> One such résumé-screening tool identified being named Jared and having played lacrosse in high school as the best predictors of job performance, as Quartz reported.

Are lacrosse players naturally better workers? Probably not. Are they probably whiter, wealthier, better networked, etc. than the average population? Probably. These sorts of things - as with the 8x8 pixel example - start to point to confounding variables that need to be worked out and accounted for.
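The kind of shortcut described here can be checked with a metadata-only baseline: if a trivial lookup on some non-image field (for example, the width of a redaction box) already predicts the label well above chance on held-out data, that field is a plausible leak. This is a minimal sketch; `metadata_probe` and the toy data are hypothetical and not from the paper.

```python
from collections import Counter, defaultdict


def metadata_probe(train_rows, test_rows):
    """Leakage check using a single non-image field.

    train_rows / test_rows: lists of (metadata_value, label) pairs.
    Learns the majority label for each metadata value on the training rows,
    then returns accuracy on the held-out rows. Values never seen in training
    fall back to the overall majority label. Hypothetical helper.
    """
    per_value = defaultdict(Counter)
    for value, label in train_rows:
        per_value[value][label] += 1
    table = {v: counts.most_common(1)[0][0] for v, counts in per_value.items()}
    fallback = Counter(label for _, label in train_rows).most_common(1)[0][0]
    hits = sum(table.get(value, fallback) == label for value, label in test_rows)
    return hits / len(test_rows)


# Made-up data: width (in characters) of a redacted name box, and a group label.
train_rows = [(5, "A"), (5, "A"), (9, "B"), (9, "B"), (5, "B")]
test_rows = [(5, "A"), (9, "B"), (7, "A")]
acc = metadata_probe(train_rows, test_rows)
print(round(acc, 3))  # 0.667
```

On real data, the comparison that matters is this probe's held-out accuracy against the majority-class rate; a large gap suggests the field leaks the label.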

dexen · 4y ago

>the AI might have found some other indicator, like filenames in the data set

The paper quite explicitly goes into testing and dissecting what exactly the AI detects. Two observations:

- the classification clearly was primarily based on the visual content rather than spurious metadata, because various transformations of the visual content had the expected impact on classification correctness

- the classification clearly wasn't based on one specific feature of the visual content but rather on multiple factors in the visuals, because various transformations to features (including masking out specific features like bone density) produced results matching expectations (usually a gradual decrease in accuracy, with some thresholds).

Conversely, if the classification were primarily based on factors other than the visual content, the visual transformations would have had a negligible effect, possibly up to some threshold beyond which they would throw the AI off completely.
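The argument above can be illustrated with a toy perturbation sweep: measure accuracy while the image content is degraded with increasing strength. A model that reads the content loses accuracy gradually, while one that reads a leaked metadata field is unaffected. This is a hypothetical sketch with made-up one-number "images"; `accuracy_sweep`, the models, and the data are assumptions, not the paper's code.

```python
def accuracy_sweep(model, samples, perturb, strengths):
    """Return (strength, accuracy) pairs: for each strength s, every input x is
    replaced by perturb(x, s) before being passed to model."""
    results = []
    for s in strengths:
        correct = sum(model(perturb(x, s)) == y for x, y in samples)
        results.append((s, correct / len(samples)))
    return results


def perturb(x, s):
    """Degrade only the 'image content' part of the input; leave metadata alone."""
    content, tag = x
    return (content - s, tag)


def content_model(x):
    return x[0] > 0  # hypothetical model that decides from the content


def shortcut_model(x):
    return x[1]  # hypothetical model that decides from a leaked metadata tag


# Toy inputs: (content, tag) with label content > 0; the tag equals the label.
samples = [((v, v > 0), v > 0) for v in (1, 2, 3, 4, -1, -2, -3, -4)]
strengths = [0, 1, 2, 4]
print(accuracy_sweep(content_model, samples, perturb, strengths))
# [(0, 1.0), (1, 0.875), (2, 0.75), (4, 0.5)]
print(accuracy_sweep(shortcut_model, samples, perturb, strengths))
# [(0, 1.0), (1, 1.0), (2, 1.0), (4, 1.0)]
```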

ceejayoz · 4y ago

The faster-than-light neutrino experiment similarly went "we've tried to account for everything we can think of and still can't figure it out" when they published. It turned out to be a measurement error.

The same may be true here, and I think it's the most likely explanation.

I'd be interested in whether the same model can be trained to predict patient wealth, hair color, style of clothing, religion, etc. from the same x-ray data sets.


vilhelm_s · 4y ago

In principle yes, but did you read the paper? They do a lot of completely crazy things, like blurring the image until it's just fuzzy blobs, or applying a high-pass filter until it just looks like noise (they comment that a human could not even guess that it's an X-ray picture), and they still get very high accuracy. Basically, no matter what they try, they can still get the race out, with only slightly lower accuracy. When reading it, I also thought this was too good to be true and that they might have some kind of bug in their code...
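For readers unfamiliar with the two manipulations mentioned here, this is a minimal sketch: a box blur as the low-pass filter and "original minus blur" as the high-pass filter. The paper's actual filters and parameters may differ; `box_blur` and `high_pass` are illustrative helpers operating on a grayscale image stored as a list of rows.

```python
def box_blur(image, radius=1):
    """Low-pass filter: replace each pixel with the mean of its
    (2*radius+1) x (2*radius+1) neighbourhood, clipped at the borders."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [image[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def high_pass(image, radius=1):
    """High-pass filter: subtract the blurred image, keeping only fine detail."""
    blurred = box_blur(image, radius)
    return [[p - b for p, b in zip(prow, brow)]
            for prow, brow in zip(image, blurred)]


# A single bright pixel: blurring spreads it out, high-pass keeps the spike.
spot = [[0.0, 0.0, 0.0],
        [0.0, 9.0, 0.0],
        [0.0, 0.0, 0.0]]
print(box_blur(spot)[1][1], box_blur(spot)[0][0])  # 1.0 2.25
print(high_pass(spot)[1][1])                       # 8.0
```

Larger `radius` values push the blur toward the "fuzzy blobs" case, and push the high-pass output toward something that looks like noise.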


shadowgovt · 4y ago

As an additional comment on this point:

The fact that trained neural networks cannot tell us why they give an answer, and that the best tool we have for exploring that is to wiggle the inputs and see how the black box responds, is a major concern for the whole space. Figuring out how to tag data with enough information to generate a "why" was an active area of research ten years ago and still is.
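One common form of "wiggling the inputs" is occlusion sensitivity: hide one patch at a time and record how much the model's output drops. This is a minimal sketch, assuming `score` is any function from an image (a list of rows) to a number, such as a model's probability for one class. `occlusion_map` and the toy score below are hypothetical.

```python
def occlusion_map(score, image, patch=2, fill=0.0):
    """Slide a patch x patch mask over the image and record how much the
    score drops when each region is hidden. Larger drops mark regions the
    model relies on more."""
    h, w = len(image), len(image[0])
    base = score(image)
    heat = []
    for y0 in range(0, h, patch):
        row = []
        for x0 in range(0, w, patch):
            masked = [r[:] for r in image]  # copy so the original is untouched
            for y in range(y0, min(h, y0 + patch)):
                for x in range(x0, min(w, x0 + patch)):
                    masked[y][x] = fill
            row.append(base - score(masked))
        heat.append(row)
    return heat


def top_left_score(img):
    # Hypothetical "model" that only looks at the top-left 2x2 region.
    return sum(img[y][x] for y in range(2) for x in range(2))


ones = [[1.0] * 4 for _ in range(4)]
print(occlusion_map(top_left_score, ones))  # [[4.0, 0.0], [0.0, 0.0]]
```

The map only shows where the model is sensitive, not why, which is part of the concern raised above.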

YeGoblynQueenne · 4y ago

Yep. "Explainable AI" is an active area of research with huge amounts of funding and interest from institutions in the US, EU and China. For example, this is the DARPA programme:

https://www.darpa.mil/program/explainable-artificial-intelli...

0-_-0 · 4y ago

Can you explain to me how you recognize your mother's voice?


CWuestefeld · 4y ago · 2 in thread

I'm just making this up, but...

Perhaps hospitals that treat a disproportionate share of poor people (who are themselves disproportionately not white) tend to use a different brand of X-ray film, and that brand has different contrast ratios than the brand preferred by rich hospitals. In that case, the model would be detecting the brand of X-ray film rather than anything about the patients themselves.

Of course, at this level it's still hard to imagine generating that 82% hit rate. But maybe there are multiple factors along these lines.
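One way to probe this hypothesis is to remove global contrast differences before training or evaluation and see whether accuracy drops. This is a minimal sketch of per-image standardization; `standardize` is a hypothetical helper, and the two "vendor" images are made up and differ only by a gain and an offset.

```python
from statistics import fmean, pstdev


def standardize(image):
    """Rescale a grayscale image (list of rows) to zero mean and unit variance.

    This cancels any global gain/offset difference between acquisition systems;
    it does not undo nonlinear processing or differences in noise texture.
    """
    pixels = [p for row in image for p in row]
    mu, sigma = fmean(pixels), pstdev(pixels)
    if sigma == 0:  # flat image: nothing to rescale
        return [[0.0 for _ in row] for row in image]
    return [[(p - mu) / sigma for p in row] for row in image]


# Hypothetical "vendor B" image: vendor A's pixels with gain 3 and offset 10.
vendor_a = [[0.0, 1.0], [2.0, 3.0]]
vendor_b = [[10.0 + 3.0 * p for p in row] for row in vendor_a]
std_a, std_b = standardize(vendor_a), standardize(vendor_b)
max_diff = max(abs(p - q)
               for ra, rb in zip(std_a, std_b)
               for p, q in zip(ra, rb))
print(max_diff < 1e-9)  # True
```

Because this only removes a global linear difference, vendor-specific nonlinear processing would survive it, so it is a partial check at best.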

lostlogin · 4y ago

> tend to use a different brand of X-ray film

Most of us radiology folk abandoned film 20 years ago and went to digital systems (CR or DR). This doesn’t negate your query though, as vendors do have different technologies and their images do not look the same.

kovek · 4y ago

That sounds like a great idea, and they can test for it! Classify the scans by “type of film,” then alter the scans and see whether the model still recognizes it.

FeepingCreature · 4y ago

I think the idea is that it's picking up on a coincidental correlational bias in the source data.
