>Having dual nationality was marked as a big risk indicator, as was a low income.
Again, even if these cases were all impartially investigated, you're still explicitly targeting low-income people and dual nationals, who are probably significantly more likely to be ethnic minorities. I think the algorithm is a worthwhile story in itself, not just the trigger-happy authorities.
Is it? This is the big question, as far as I can see. If my country has green people and blue people, and green people are overwhelmingly more likely to commit fraud, is it wrong to require additional fraud checks on the basis of being green?
Note that this isn't a conviction. It's accepted that the standard isn't so exacting that no innocent person may ever be subject to investigation - that would be impossible under realistic circumstances. (Hell, even convictions don't meet that standard.)
On the other hand, I do accept the argument that you would have to be very, very cautious not to end up with a feedback loop if you did this - in other words, a system where you keep convicting green people because you investigate them, so the data keep suggesting that green people have a greater propensity for crime, and so on. I'm sure that's already happening in many countries.
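To make that loop concrete, here's a toy simulation (entirely made up, not modelled on any real system): two groups with identical true offence rates, where each year's investigation budget is allocated according to last year's conviction counts.

    import random

    random.seed(1)

    TRUE_RATE = 0.01   # identical real offence rate in both groups
    BUDGET = 2000      # investigations per year, split by current suspicion

    suspicion = {"green": 0.5, "blue": 0.5}   # start out perfectly unbiased

    for year in range(15):
        convictions = {}
        for group, share in suspicion.items():
            checks = int(BUDGET * share)
            # investigations here only ever convict the genuinely guilty -
            # any unfairness comes purely from where we choose to look
            convictions[group] = sum(random.random() < TRUE_RATE
                                     for _ in range(checks))
        total = sum(convictions.values())
        if total:
            # naive update: next year's attention follows conviction counts
            suspicion = {g: c / total for g, c in convictions.items()}
        print(year, convictions, suspicion)

Because conviction counts reflect where you looked rather than who offends, one lucky or unlucky year tilts the next year's attention, and the tilt compounds - the split can drift far from 50/50 even though the underlying rates never differ.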
I'll stipulate that in this scenario the data themselves are not actually biased, whereas in reality such data are often skewed by things like disproportionate policing and heavier punishment.
Even then, government discrimination based on race is bad even when a statistical basis for it exists. Part of what makes "race" so problematic to discriminate on is how easy it is to see - or rather, how easily most people classify and distinguish between ethnicities based on how they look.
For an example, let's start with a small difference between blue and green people: blue people are twice as likely to commit crime as green people, with a criminality rate of 0.2% vs 0.1%. If that 2x difference starts guiding policing decisions, a lot of innocent people start being disadvantaged - 99.8% of blue people are still innocent, so you're burdening hundreds of innocents for every criminal you catch. The extra problem is that it is very easy to see whether someone is blue or green, so it becomes really easy for a lot of people to start acting on this 2x difference. That harms all blue people, which is disproportionate, and it makes the feedback loop you talked about much easier to fall into.
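To put rough numbers on how many innocents that sweeps up (same made-up rates, quick back-of-envelope):

    blue_rate, green_rate = 0.002, 0.001   # the hypothetical 0.2% vs 0.1%

    # if being blue alone triggers extra checks, how many innocent blue
    # people get checked for every actual criminal found?
    print((1 - blue_rate) / blue_rate)   # -> 499.0

    # the 2x ratio sounds dramatic, but the absolute risk that any given
    # blue person is a criminal is still only 0.2%
    print(blue_rate / green_rate)        # -> 2.0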
And it made me think: I suspect lots of our ML/NN models function like that. They pick up on race, or on a proxy for race. In situations where the 'ground truth' metric genuinely is racially skewed, it can be hard to tell, and it's just not realistic to demand that people make their models inaccurate for the sake of racial equity.
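As a sketch of the proxy problem - everything here is synthetic, and the feature names and correlations are my own invention, not from any real dataset - drop the protected attribute from the training data and a correlated feature like postcode carries the signal anyway:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 20_000

    group = rng.integers(0, 2, n)              # protected attribute (0 or 1)
    # postcode correlates heavily with group (think segregated housing)
    postcode = np.where(rng.random(n) < 0.9, group, 1 - group)
    income = rng.normal(50, 15, n)             # identical in both groups

    # historical labels are biased: group 1 was flagged more at equal income
    logit = -4 + 0.04 * (50 - income) + 1.5 * group
    label = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

    # train WITHOUT the group column - the supposedly 'blind' model
    X = np.column_stack([postcode, income])
    model = LogisticRegression(max_iter=1000).fit(X, label)

    # predicted risk still splits sharply by group, via the proxy
    risk = model.predict_proba(X)[:, 1]
    for g in (0, 1):
        print(f"group {g}: mean predicted risk {risk[group == g].mean():.3f}")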
But it highlights, for me, the unavoidable danger of black-box models. I don't mean some logistic regression or decision tree - those, while not literally explaining themselves, can be figured out if you have some domain knowledge of the parameters. But with the overfitting machines that we call neural nets, I suspect this is happening everywhere, at a cost to both equity and accuracy/reliability. (The probably-apocryphal story of the computer vision model that was supposed to estimate the density of people in a train station but ended up just reading the clock on the wall comes to mind.)
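Continuing the synthetic sketch above (same model and feature names), the saving grace of the simple model is that the proxy sits in plain view:

    # the suspiciously large postcode weight is visible on inspection
    for name, coef in zip(["postcode", "income"], model.coef_[0]):
        print(f"{name:>8}: {coef:+.2f}")

A big coefficient on postcode is a red flag a domain expert can catch and question; the same signal smeared across a neural net's weights gives you no such warning.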
[0] I remember it exactly because it also would have captured me, incidentally, with my plummy double-barrelled surname - though that's beside the point here.