You are saying that algorithms are accurately measuring a reality you wish were different. I don't disagree with this.
The right thing to do is to actually answer unpleasant moral questions like "if blacks are 4x more likely to be dangerous criminals, what should we do about it?" But I guess overloading the word "bias" is a nice substitute for clearly thinking things through.
(1) Defining the proper goals, and
(2) Measuring the right things (such as the real goals of interest rather than biased proxies).
With police deployments, you are assuming the solution (rather than letting your algorithm optimize it) by saying "I want to put more police where more arrests occur". What you really want is probably something more like "I want to deploy police resources where they will most effectively reduce the incidence of crime, weighted by some assigned measure of severity" (the exact goal may differ, of course). Then let your ML algorithm crunch the various measurable factors and produce an optimal deployment to do that.
(But then, with that goal -- and similar problems exist with many likely real goals -- you run into the other problem: measuring the incidence of crime. Counting crime reports may be the obvious approach, but there's plenty of evidence that lots of factors can bias crime reports, including communities with bad experiences of police being less likely to report crimes.)
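To make that objective concrete, here's a minimal sketch -- the districts, severity weights, baseline rates, and the diminishing-returns response model are all invented for illustration, not real deployment logic:

```python
import numpy as np

rng = np.random.default_rng(1)
n_districts, budget = 5, 20  # hypothetical districts and officers to allocate

severity = rng.uniform(1, 5, n_districts)     # assigned severity weights per district
base_rate = rng.uniform(10, 50, n_districts)  # expected incidents with zero officers

def expected_crime(officers, district):
    # Assumed diminishing-returns response: each added officer
    # removes 10% of the remaining expected incidents.
    return base_rate[district] * 0.9 ** officers

alloc = np.zeros(n_districts, dtype=int)
for _ in range(budget):
    # Greedy: send the next officer where the marginal
    # severity-weighted crime reduction is largest.
    gains = [severity[d] * (expected_crime(alloc[d], d) - expected_crime(alloc[d] + 1, d))
             for d in range(n_districts)]
    alloc[int(np.argmax(gains))] += 1

print(alloc, alloc.sum())
```

The point is just that the thing being maximized is the stated goal (severity-weighted reduction), not a proxy like arrest counts; the greedy step is exact here only because of the assumed diminishing-returns form.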
As you say, proper goals and measurement can fix a lot of these problems, and I don't think it's obvious that ML algorithms solve either of those.
Otherwise one would have to make assumptions about (in other words, model) the corruption process. The bias-compensation machinery then has to be deliberate; it won't happen on its own.
Some sampling processes do not modify the conditional. In those cases no special machinery would be required.
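A quick toy check of that claim (the distribution is made up): if samples are dropped based only on X, the fitted conditional E[Y|X] is unchanged, but if they are dropped based on Y, it is distorted:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 200_000)
y = 2 * x + rng.normal(0, 1, 200_000)  # true conditional: E[Y|X] = 2X

def slope(xs, ys):
    # OLS slope of y on x, our stand-in for the conditional E[Y|X].
    return np.cov(xs, ys)[0, 1] / np.var(xs)

keep_x = x > 0  # sampling depends only on X (covariate shift)
keep_y = y > 0  # sampling depends on Y (outcome-dependent selection)

s_full = slope(x, y)                  # ≈ 2.0
s_x = slope(x[keep_x], y[keep_x])     # still ≈ 2.0: conditional preserved
s_y = slope(x[keep_y], y[keep_y])     # attenuated, noticeably below 2
print(s_full, s_x, s_y)
```

Selecting on X doesn't touch the noise distribution given X, so no special machinery is needed; selecting on Y does, which is the case where you'd need to model the corruption.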
But if your model is sufficiently expressive you don't need to explicitly build or model the corruption process. In the example in my linked blog post, test scores might be biased against blacks. But race is also redundantly encoded, so the algorithm has enough information to fix the bias completely by accident.
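Here's a toy version of that redundant-encoding point (all numbers invented): the test score is biased downward for one group, but because group membership is also available as a feature, plain least squares learns an offset that undoes the bias without anyone modeling the corruption explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data: true ability, a group label, and a test score
# that is biased downward by 2 points for group 1.
ability = rng.normal(0, 1, n)
group = rng.integers(0, 2, n)              # redundantly encoded group label
score = ability - 2.0 * group              # biased proxy for ability
outcome = ability + rng.normal(0, 0.1, n)  # what we actually care about

# Fit outcome ~ score + group + intercept by ordinary least squares.
X = np.column_stack([score, group, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)

# Because group is in the model, OLS learns a +2 offset for group 1,
# fixing the biased score "by accident".
print(coef)  # ≈ [1.0, 2.0, 0.0]
```

Drop the group column and the score coefficient is badly biased instead -- the correction only falls out for free when the model is expressive enough to see the redundant encoding.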
Fundamentally what I'm saying here is that bias is a statistics problem and has a statistics solution. Insofar as your complaint is that algorithms find the wrong answer, the solution is better stats.
And nothing whatsoever that I've said here would be remotely controversial if the topic were remote sensing.
This is the claim that I am having trouble with.
Say I have two random variables X and Y with some joint distribution. If a corruption process can mess with the samples drawn from it, I cannot see how an algorithm could possibly recover either the joint or the conditional.
Are you saying that the corruption is benign, like missing at random or missing completely at random? Then it's much more believable.