>Having dual nationality was marked as a big risk indicator, as was a low income.
Again, even if these cases were all impartially investigated, you're still explicitly targeting low-income people and dual nationals, who are probably significantly more likely to be ethnic minorities. I think the algorithm is a worthwhile story in itself, not just the trigger-happy authorities.
Is it? This is the big question, as far as I can see. If my country has green people and blue people, and green people are overwhelmingly more likely to commit fraud, is it wrong to require additional fraud checks on the basis of being green?
Note that this isn't a conviction. It's accepted that the standard isn't so exacting that no innocent person may ever be subject to investigation - that would be impossible under realistic circumstances. (Hell, even convictions don't meet that standard.)
On the other hand, I do accept the argument that you would have to be very, very cautious not to end up with a feedback loop if you did this - in other words, a system where you keep convicting green people because you investigate them, so the data keep suggesting that green people have a greater propensity for crime, and so on. I'm sure that's already happening in many countries.
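To make that loop concrete, here's a toy simulation (entirely made up, not modelled on any real system): two groups with identical true offence rates, where each year's investigation budget is allocated according to last year's conviction counts.

    import random

    random.seed(1)

    TRUE_RATE = 0.01   # identical real offence rate in both groups
    BUDGET = 2000      # investigations per year, split by current suspicion

    suspicion = {"green": 0.5, "blue": 0.5}   # start out perfectly unbiased

    for year in range(15):
        convictions = {}
        for group, share in suspicion.items():
            checks = int(BUDGET * share)
            # investigations here only ever convict the genuinely guilty -
            # any unfairness comes purely from where we choose to look
            convictions[group] = sum(random.random() < TRUE_RATE
                                     for _ in range(checks))
        total = sum(convictions.values())
        if total:
            # naive update: next year's attention follows conviction counts
            suspicion = {g: c / total for g, c in convictions.items()}
        print(year, convictions, suspicion)

Because conviction counts reflect where you looked rather than who offends, one lucky or unlucky year tilts the next year's attention, and the tilt compounds - the split can drift far from 50/50 even though the underlying rates never differ.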
I'll stipulate that in this scenario the data themselves are not actually biased, whereas in reality such data are often skewed by things like disproportionate policing and heavier punishment.
Even then, government discrimination based on race is bad even when a statistical basis for it exists. Part of what makes "race" so problematic to discriminate on is how easy it is to see - or rather, how easily most people classify and distinguish between ethnicities based on how they look.
For an example, let's start with a small difference between blue and green people: blue people are twice as likely to commit crime as green people, with a criminality rate of 0.2% vs 0.1%. If that 2x difference starts guiding policing decisions, a lot of innocent people start being disadvantaged - 99.8% of blue people are still innocent, so you're burdening hundreds of innocents for every criminal you catch. The extra problem is that it is very easy to see whether someone is blue or green, so it becomes really easy for a lot of people to start acting on this 2x difference. That harms all blue people, which is disproportionate, and it makes the feedback loop you talked about much easier to fall into.
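To put rough numbers on how many innocents that sweeps up (same made-up rates, quick back-of-envelope):

    blue_rate, green_rate = 0.002, 0.001   # the hypothetical 0.2% vs 0.1%

    # if being blue alone triggers extra checks, how many innocent blue
    # people get checked for every actual criminal found?
    print((1 - blue_rate) / blue_rate)   # -> 499.0

    # the 2x ratio sounds dramatic, but the absolute risk that any given
    # blue person is a criminal is still only 0.2%
    print(blue_rate / green_rate)        # -> 2.0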
And it made me think: I suspect lots of our ML/NN models function like that. They pick up on race, or on a proxy for race. In situations where the 'ground truth' metric genuinely is racially skewed, it can be hard to tell, and it's just not realistic to demand that people make their models inaccurate for the sake of racial equity.
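As a sketch of the proxy problem - everything here is synthetic, and the feature names and correlations are my own invention, not from any real dataset - drop the protected attribute from the training data and a correlated feature like postcode carries the signal anyway:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 20_000

    group = rng.integers(0, 2, n)              # protected attribute (0 or 1)
    # postcode correlates heavily with group (think segregated housing)
    postcode = np.where(rng.random(n) < 0.9, group, 1 - group)
    income = rng.normal(50, 15, n)             # identical in both groups

    # historical labels are biased: group 1 was flagged more at equal income
    logit = -4 + 0.04 * (50 - income) + 1.5 * group
    label = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

    # train WITHOUT the group column - the supposedly 'blind' model
    X = np.column_stack([postcode, income])
    model = LogisticRegression(max_iter=1000).fit(X, label)

    # predicted risk still splits sharply by group, via the proxy
    risk = model.predict_proba(X)[:, 1]
    for g in (0, 1):
        print(f"group {g}: mean predicted risk {risk[group == g].mean():.3f}")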
But it highlights, for me, the unavoidable danger of black-box models. I don't mean some logistic regression or decision tree - those, while not literally explaining themselves, can be figured out if you have some domain knowledge of the parameters. But with the overfitting machines that we call neural nets, I suspect this is happening everywhere, at a cost to both equity and accuracy/reliability. (The probably-apocryphal story of the computer vision model that was supposed to estimate the density of people in a train station but ended up just reading the clock on the wall comes to mind.)
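Continuing the synthetic sketch above (same model and feature names), the saving grace of the simple model is that the proxy sits in plain view:

    # the suspiciously large postcode weight is visible on inspection
    for name, coef in zip(["postcode", "income"], model.coef_[0]):
        print(f"{name:>8}: {coef:+.2f}")

A big coefficient on postcode is a red flag a domain expert can catch and question; the same signal smeared across a neural net's weights gives you no such warning.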
[0] I remember it exactly because it also would have captured me, incidentally, with my plummy double-barrelled surname - though that's beside the point here.