If members of my nation get drunk more often than members of some other nation, it may be offensive to say that I am "34% drunkard," but on average the inference might hold; instead of forbidding this type of inference, I'd rather rely on more signals to figure out what kind of person I specifically am before individualized decisions are made. The authors sidestep the problem by introducing a "risky behavior" variable that isn't in the input dataset and modeling it as a hidden state in Bayesian inference, yet "risky behavior" can still be correlated with ethnicity and red cars anyway; the correlation is just not visible from the outside. So if my nation is 34% drunkard and the neighboring one is only 11%, the conditional probability will still likely come out higher for my nation, merely obfuscated by the use of a Bayesian hidden state. I'm not sure why that would improve fairness.
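To make the point concrete, here is a toy sketch (all numbers made up, not from the paper) of why a latent "risky behavior" variable with a group-dependent base rate still produces different posteriors per group:

```python
# Hypothetical illustration: a hidden "risky" variable inferred from an
# observable signal (red car), where the prior on "risky" differs by group.
# All probabilities below are invented for demonstration.

def posterior_risky(p_risky_given_group, p_red_given_risky, p_red_given_safe):
    """P(risky | red car, group) via Bayes' rule with a group-dependent prior."""
    p = p_risky_given_group
    num = p_red_given_risky * p
    den = num + p_red_given_safe * (1 - p)
    return num / den

# Suppose 34% of group A and 11% of group B are "risky" (the hidden variable),
# and risky drivers are assumed somewhat more likely to pick a red car.
for group, base_rate in [("A", 0.34), ("B", 0.11)]:
    post = posterior_risky(base_rate, p_red_given_risky=0.30, p_red_given_safe=0.15)
    print(group, round(post, 3))
```

Even though "group" never appears as an explicit input feature of the final score, the hidden variable inherits its base rate, so group A's posterior comes out well above group B's for the same observed red car.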
It would only paralyze those who paid attention to the Toronto Declaration. You’re right that you can’t make ML fair, because the universe isn’t fair; fairness is a property of human judgements about facts, and the facts remain the same regardless of one’s feelings.
https://www.chrisstucchio.com/pubs/slides/crunchconf_2018/sl...
AI Ethics, Impossibility Theorems and Tradeoffs
This is the crux of the issue and as always, most people seem to miss it. Often “fair” is used as shorthand for “does what I think is right”.
Isn't this just a misleading way to say "holding a certain causal belief"? Why exactly would that be a bad thing? If you reject one set of causal beliefs, you necessarily hold a different set.
It's a much broader problem than that, because the direction of causation can be extraordinarily difficult to establish in general.
Changing the color of your car shouldn't change your ethnicity, but what if it does? Suppose you're white with Spanish ancestry and Hispanics are the group who like red cars. Paint your car red and some red-car-preferring Hispanics may be more inclined to associate with you and thereby cause you to be more immersed in Hispanic culture and start to identify as Hispanic rather than white.
And that's a silly one just to show that even the exemplar could be wrong. More plausibly, what if the causation between "risky behavior" and "red car" is reversed? We know that colors can affect human behavior. If getting into a red car makes you drive more aggressively then you have a direct causal chain between being more likely to buy a red car (for any reason) and being more likely to drive aggressively and get into a car crash.
That means that in order to use this you would first need to establish the direction of causation between the two behaviors. But that's a tall hill to climb when one of the factors you're trying to establish causation for is exactly the one you don't have good data on.
There is also a straightforward way to tell when a method like this is definitely getting the math wrong: does it make the prediction rate for that class of people worse? If your assumptions are correct it shouldn't, so if it does, you've unambiguously failed.
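That sanity check is easy to automate. A minimal sketch (function names and toy data are my own, not from any particular fairness toolkit): compare each group's prediction accuracy before and after the adjustment and flag any group that got worse.

```python
# Sanity check: did the "fairness" adjustment make predictions worse
# for any group? If yes, the causal assumptions are suspect.
# All names and data here are hypothetical.

def accuracy(preds, labels):
    """Fraction of predictions matching the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def check_no_worse(group_labels, baseline_preds, adjusted_preds, true_labels):
    """Return the list of groups whose accuracy dropped after adjustment."""
    worse = []
    for g in set(group_labels):
        idx = [i for i, gl in enumerate(group_labels) if gl == g]
        base = accuracy([baseline_preds[i] for i in idx],
                        [true_labels[i] for i in idx])
        adj = accuracy([adjusted_preds[i] for i in idx],
                       [true_labels[i] for i in idx])
        if adj < base:
            worse.append(g)
    return worse

# Toy example: group B's predictions degrade after the adjustment.
flagged = check_no_worse(group_labels=["A", "A", "B", "B"],
                         baseline_preds=[1, 0, 1, 1],
                         adjusted_preds=[1, 0, 0, 1],
                         true_labels=[1, 0, 1, 0])
print(flagged)  # ['B']
```

A non-empty result is the unambiguous-failure signal the comment describes: the method claimed to model the group better but predicts it worse.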
And every time I express my desire for autobahns without speed restrictions to crisscross North America, whoever I'm talking to has generally been quick to inform me that Germans can have nice things like that because they are careful/skilled drivers, while Americans are reckless (wreckful) drivers and cannot be trusted at high speeds.
Now you may argue that correlation reflects causation in a particular case, sure. But in general they are not the same, so it seems perfectly logical to me that you can start building your model with certain causal assumptions and without others, without in any way disregarding your statistics.
Because, as people give lip service to constantly, but never seem to really adhere to, correlation is different from causation.
Making a system fair at the very least requires people designing the system to be fair. It's pretty clear that still does not happen, so I'm pretty skeptical of those that claim it's just around the corner.
It used to be considered fair to let people smoke when they wanted. Then it was considered fair to have smoking sections and non-smoking sections in restaurants. Now it’s considered fair to ban smoking entirely in restaurants and most public places.
You have a choice of whether or not you believe being male causes car insurance claims. That is independent of the statistical correlations. Ten times a day people say correlation is not causation, but a hundred times a day, I see people implicitly insisting that it necessarily is.
If I'm running an insurance agency rather than a public policy advocacy group, and my data keeps showing that men have a higher accident rate than women, I can just ignore causation and build my actuarial tables on that correlation. I don't need a causal model here, at least not until the point where I want to optimize my models further still, and there are diminishing returns on that.
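The purely correlational approach really is this simple. A toy sketch (the claim data and pricing rule are invented for illustration): compute per-group claim rates straight from history and load the premium accordingly, with no causal story anywhere.

```python
# Actuarial-table sketch: per-group claim rates from historical records,
# then a premium loaded by the observed rate. No causal model involved.
# Data and the pricing formula are made up for illustration.
from collections import defaultdict

# (group, had_claim) records from past policies.
history = [("M", 1), ("M", 0), ("M", 1), ("F", 0), ("F", 1), ("F", 0)]

totals = defaultdict(lambda: [0, 0])  # group -> [claims, policies]
for group, had_claim in history:
    totals[group][0] += had_claim
    totals[group][1] += 1

rates = {g: claims / n for g, (claims, n) in totals.items()}

# Simple loading: base price scaled by (1 + observed claim rate).
base_price = 500
premiums = {g: base_price * (1 + r) for g, r in rates.items()}
print(rates, premiums)
```

Whether the rate difference is causal or a proxy for something else never enters the calculation, which is exactly the commenter's point: the table predicts fine either way, and that is also why this approach sits at the center of the fairness debate.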
The reason fairness has so much headway among engineers isn't just an aversion to discrimination among educated people. It's that we all know this stuff is way jankier than we care to ever admit, and that none of us would want to be the data sausage going through the algorithm grinder.