On the other hand, these biases (most notably the racial ones) exist in the process anyway, and now they're simply being codified and exposed. If these algorithms were published we could see exactly how much more punishment you get for being black in America versus being white.
Thanks again to ProPublica for an important piece of reporting; hopefully changes get made for the better.
http://www.scientificamerican.com/article/lunchtime-leniency...
And of course it goes without saying that judges will be affected by their biases, racial and otherwise.
I'm not sure what to do about it, though. Handing down the exact same punishment for every single person who commits a particular crime seems too blind. But any variation is going to be problematic.
Even where a variable has absolutely no real effect, roughly one out of every twenty combinations with the other variables will show up as a statistically significant predictor of future crime at the conventional p = 0.05 cutoff.
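That one-in-twenty intuition is easy to demonstrate with a quick simulation. Everything below is made-up data; the 1.96 cutoff is the usual large-sample approximation for p = 0.05:

```python
import random
import statistics

random.seed(0)

def false_positive_rate(trials=2000, n=50):
    """Fraction of null comparisons that look 'significant' at p = 0.05.

    Both groups are drawn from the same distribution, so any
    'significant' difference is a false positive by construction.
    """
    hits = 0
    for _ in range(trials):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        # Two-sample z statistic; |z| > 1.96 corresponds to p < 0.05
        se = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
        z = (statistics.mean(a) - statistics.mean(b)) / se
        if abs(z) > 1.96:
            hits += 1
    return hits / trials

print(false_positive_rate())  # close to 0.05, i.e. about one in twenty
```

Run enough variables through enough combinations and "significant" findings are guaranteed, bias or no bias.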
Furthermore, the algorithm would simply extend existing biases in arrest and sentencing, because it simply can't account for crimes that are uncaught and unpunished. Groups that are stopped, searched, arrested, and convicted at greater rates would without fail be sentenced to more time. Just another benefit of being white in America.
You end up using the fact that some groups are punished more often to justify punishing them more harshly.
Even worse, I bet that the fact that it thinks that women are at a higher risk for recidivism means that somewhere within the algorithm it's using the fact that women in general are less criminal than men to decide that women who do commit crime are more exceptional (within women), and therefore more deviant. It's disgusting. If you can't legally discriminate against a person on particular grounds, you certainly can't feed those grounds into an algorithm to let it discriminate for you while you shrug and feign innocence.
The algorithm is the innocent one - it's just attempting to reflect the system as it is. It's like an algorithm you would write to predict the winners of horse races, or to run a sports book. And just like one of those algorithms, if you stuff it with garbage (the kind of garbage that makes it wrong 77% of the time), it will produce garbage. If you use the results for something internal to the system, bad variables will feed back into themselves and make the results progressively worse - what's the effect of a longer sentence on recidivism? How profitable is the arbitrage on your sports book algorithm if people use the results to bet, and the distribution of bets shifts the odds?
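The feedback effect can be caricatured in a few lines. Every number below is invented (the coefficients, the update rule); the point is only that feeding a model's outputs back into its own training data moves its estimates, here monotonically upward:

```python
def simulate(rounds=20):
    """Crude feedback-loop sketch: the model's risk estimate sets the
    sentence, longer sentences worsen measured outcomes, and the worse
    outcomes are fed back in as training data. All constants are
    invented purely to illustrate the loop, not to model sentencing.
    """
    measured_risk = 0.3
    history = []
    for _ in range(rounds):
        sentence_years = measured_risk * 10          # model output drives sentence
        observed_rate = 0.3 + 0.02 * sentence_years  # longer sentence, worse outcome
        # "retraining": blend the old estimate with the outcomes it caused
        measured_risk = 0.5 * measured_risk + 0.5 * observed_rate
        history.append(round(measured_risk, 3))
    return history

print(simulate())  # drifts upward from 0.3 toward a worse fixed point
```

The true base rate never changed; only the loop did.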
One thing is certain -- the federal government needs to shut these sentencing analysis companies down, or at the very least subject them to heavy public audits. I'd say even libertarians would agree this is the definition of something that should be regulated.
"On Sunday, Northpointe gave ProPublica the basics of its future-crime formula — which includes factors such as education levels, and whether a defendant has a job. It did not share the specific calculations, which it said are proprietary."
How on earth can you lock people up based on secret information? That is Kafka meets Minority Report.
Variables used in the formula include details of the case, race/appearance of the defendant, and how recently the judge had eaten at the time of sentencing. Unlike ProPublica's claims of racial bias (which are merely "almost statistically significant" at the p = 0.05 level), the lunch bias is statistically significant at the p < 0.01 level.
http://www.pnas.org/content/108/17/6889.full
This system sounds like a huge improvement.
In particular, the cases are heard in a particular order. For each prison, the prisoners with counsel go before those who are representing themselves. As in the US, those representing themselves typically fare worse. The judges try to finish an entire prison's worth of hearings before a meal, so the least-likely-to-succeed cases tend to be assigned to the slots right before a break.
There are some other bits of weirdness in the original data too. They found a statistically significant association between the ordinal position (e.g., 1st, 2nd, ..., last) and the parole board's decision, but failed to find any effect of actual time elapsed (e.g., in minutes), even though the latter is much more compatible with a physiological hypothesis like running out of glucose.
Now if they were using decision trees - e.g., "if the person has 3 or more felonies, they get a 5 rating" - that could be presented.
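For comparison, a transparent rule of that kind is trivially presentable. The thresholds below are invented for illustration; the point is that every branch can be read and audited, unlike a proprietary black-box formula:

```python
def risk_rating(prior_felonies: int, age: int) -> int:
    """Toy example of a fully auditable scoring rule.

    The cutoffs are hypothetical; a real instrument would publish its
    actual thresholds so defendants could see exactly why they got
    the rating they did.
    """
    if prior_felonies >= 3:
        return 5
    if prior_felonies >= 1:
        return 3 if age < 25 else 2
    return 1

print(risk_rating(prior_felonies=4, age=30))  # 5
print(risk_rating(prior_felonies=0, age=40))  # 1
```

Nothing about accuracy requires secrecy; the secrecy is a business decision.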
I'm curious about how much of a feedback loop this process has. The model was probably trained on old data and never updated. Also, how does it take into account features it doesn't know about (the article mentions one man turning to Christianity)? I doubt there is a mechanism for people to be asked why they did or did not reoffend, and even if there were, how much should it be trusted?
I also worry greatly about diagnostic predictive models that maximise overall prediction success but don't balance the relative consequences of false positives and false negatives.
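One standard remedy is to pick the decision threshold by expected cost rather than raw accuracy. A minimal sketch with invented scores and hypothetical cost weights:

```python
def best_threshold(scores_labels, fp_cost=1.0, fn_cost=5.0):
    """Pick the score threshold that minimises expected cost, not error count.

    scores_labels: list of (score, reoffended) pairs. The cost weights
    are hypothetical policy choices, e.g. saying a missed reoffender is
    five times worse than a wrongly flagged person, or the reverse.
    """
    thresholds = sorted({s for s, _ in scores_labels})

    def cost(t):
        fp = sum(1 for s, y in scores_labels if s >= t and not y)
        fn = sum(1 for s, y in scores_labels if s < t and y)
        return fp * fp_cost + fn * fn_cost

    return min(thresholds, key=cost)

data = [(0.9, True), (0.8, True), (0.7, False), (0.4, True),
        (0.3, False), (0.2, False), (0.1, False)]
print(best_threshold(data))                             # 0.4
print(best_threshold(data, fp_cost=5.0, fn_cost=1.0))   # 0.8
```

Same model, same data, opposite thresholds: the choice of cost weights is a moral judgment that a raw accuracy number silently hides.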
http://boingboing.net/2016/01/06/weapons-of-math-destruction...
It's easy to hide agenda behind an algorithm; especially when the details of the algorithm are not publicly visible.
https://github.com/propublica/compas-analysis/blob/master/Co...
In the statistical analysis (unlike the verbiage) she is completely unable to hide the lack of bias and the accuracy of the algorithm, both of which are clearly on display at line [36]. In contrast, her verbiage somehow conveys the exact opposite impression.
> Black defendants are 45% more likely than white defendants to receive a higher score correcting for the seriousness of their crime, previous arrests, and future criminal behavior.
> Women are 19.4% more likely than men to get a higher score.
> Most surprisingly, people under 25 are 2.5 times as likely to get a higher score as middle aged defendants.
> The violent score overpredicts recidivism for black defendants by 77.3% compared to white defendants.
> Defendants under 25 are 7.4 times as likely to get a higher score as middle aged defendants.
> [U]nder COMPAS black defendants are 91% more likely to get a higher score and not go on to commit more crimes than white defendants after two years.
> COMPAS scores misclassify white reoffenders as low risk at 70.4% more often than black reoffenders.
> Black defendants are twice as likely to be false positives for a Higher violent score than white defendants.
> White defendants are 63% more likely to get a lower score and commit another crime than Black defendants.
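Rates like those can be reproduced from a per-group confusion table. The toy counts below are invented, though roughly shaped like ProPublica's published false positive/negative rates:

```python
def rates(group):
    """group: list of (scored_high, reoffended) booleans.
    Returns (false_positive_rate, false_negative_rate).
    """
    fp = sum(1 for hi, re in group if hi and not re)
    neg = sum(1 for hi, re in group if not re)
    fn = sum(1 for hi, re in group if not hi and re)
    pos = sum(1 for hi, re in group if re)
    return fp / neg, fn / pos

# Invented counts per 200 defendants, shaped like the reported finding:
black = ([(True, False)] * 45 + [(False, False)] * 55
         + [(True, True)] * 72 + [(False, True)] * 28)
white = ([(True, False)] * 23 + [(False, False)] * 77
         + [(True, True)] * 52 + [(False, True)] * 48)

print(rates(black))  # (0.45, 0.28): more non-reoffenders flagged high
print(rates(white))  # (0.23, 0.48): more reoffenders flagged low
```

Whether those asymmetric error rates count as "bias" is exactly what the two sides of this thread are arguing about.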
Calling out one specific section that doesn't show bias doesn't magically exonerate the rest.
Larson is still the second author, so it is certainly a big question how he can present data showing no statistically significant correlation between race and score, then put his name on an article saying the exact opposite, one that is clearly pushing an agenda. And as noted, the owners of the publication are also involved in a competing risk assessment product.
This article is terrible data journalism and probably deliberately misleading.
Step 1: write down conclusion.
Step 2: do analysis.
Step 3: if analysis doesn't support conclusion, write down a bunch of anecdotes.
Really, here's her R script: https://github.com/propublica/compas-analysis/blob/master/Co...
Just read that. It's vastly better than this nonsensical article.
> We obtained the risk scores assigned to more than 7,000 people arrested in Broward County, Florida, in 2013 and 2014 and checked to see how many were charged with new crimes over the next two years, the same benchmark used by the creators of the algorithm.
> The score proved remarkably unreliable in forecasting violent crime: Only 20 percent of the people predicted to commit violent crimes actually went on to do so.
> The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants. White defendants were mislabeled as low risk more often than black defendants.
> Could this disparity be explained by defendants’ prior crimes or the type of crimes they were arrested for? No. We ran a statistical test that isolated the effect of race from criminal history and recidivism, as well as from defendants’ age and gender.
> Black defendants were still 77 percent more likely to be pegged as at higher risk of committing a future violent crime and 45 percent more likely to be predicted to commit a future crime of any kind.
https://github.com/propublica/compas-analysis/blob/master/Co...
Their own analysis shows (p ~= 0) that the high and medium risk factors are predictive. It also shows that the racial bias terms (race_factorAfrican-American:score_factorHigh, etc.) are probably not predictive (p > 0.05).
Your quotes are not evidence of bias, though I see how they might confuse an innumerate reader. It's interesting how good a job this article is doing confusing the innumerate - it's almost as if it was written to mislead without technically lying.
For example, black defendants being pegged as more likely to commit crimes can be caused by one of two things: bias, or black defendants actually being more likely to commit crimes. According to ProPublica's own analysis (see race_factorAfrican-American), the latter is actually the case, with p = 4.52e-06 - see line [36].
Let's be clear -- if the null hypothesis in this case is true (that there is no bias), and all other assumptions made are true, there is a slightly greater than 5.7% chance of obtaining this result (or something even more skewed). That's a great bar for publication of SCIENCE. It's not a great bar for hiding behind a proprietary algorithm used in sentencing. People talk about misuse of p-values, but this takes the cake.
I'm confused though; the mood affiliation of your post somehow suggests that her less than perfect choice of a statistical methodology somehow supports her claims. Could you explain that? Or am I simply misunderstanding what you are trying to say?
Also, let's suppose we just take her own analysis at face value, and don't view it through the p-value lens. The maximum likelihood estimate suggests that even if this effect is not random chance, it's not very big. I.e., the "score factor high" estimate is >8x larger than the "score factor high, race = black" estimate. Isn't this really good? Do you really think the human biases that this algorithm mitigates are smaller than this?
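For anyone following along: on the odds scale that logit coefficients live on, a small interaction term next to a large main effect really is a small multiplier. The coefficient values below are invented stand-ins for scale, not the actual fitted numbers from the R script:

```python
import math

# Hypothetical stand-ins for fitted logistic-regression coefficients
# (the real values live in ProPublica's script; these are for scale):
beta_score_high = 1.2       # main effect: a High COMPAS score
beta_high_x_black = 0.14    # interaction: High score AND black defendant

# exp() converts log-odds coefficients into odds ratios:
print(round(math.exp(beta_score_high), 2))    # 3.32: score multiplies odds ~3.3x
print(round(math.exp(beta_high_x_black), 2))  # 1.15: interaction adds only ~15%
```

So even granting the interaction is real, on these illustrative numbers it is a modest adjustment riding on top of a much larger, race-neutral score effect.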
Lastly, what specific analysis would convince you that this algorithm is predictive and non-biased (or more realistically, not very biased)?
When I look at the two Kaplan-Meier (KM) plots for whites and blacks, they are mostly the same. It's pretty clear the model is not prejudiced against blacks; in fact it's somewhat prejudiced against whites. [1]
Your main editorial claim is that whites tend to be misclassified as "good" and blacks as "bad."
But I think what's actually happening is that the algorithm is more likely to misclassify low_risk as "good" and high_risk as "bad".[2] Combine that with vastly more whites than blacks being low_risk (as you show earlier) and you get the observed "injustice".
I'll also note that the KM curve for whites flattens out at 2 years, unlike the one for blacks. This is actually a big deal if statistically significant. But that's a separate conversation.
Footnotes:
1 - this is acknowledged in methodology page "black defendants who scored higher did recidivate slightly more often than white defendants (63 percent vs. 59 percent)."
2 - why that is I don't yet fully understand (and I'd like to), but it looks to be simple math that follows from the low-risk group mostly not recidivating and the high-risk group mostly recidivating
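That footnote-2 intuition can be made precise. If a score has the same PPV (chance that a high score really reoffends) and the same sensitivity in two groups that differ only in base rate, the false positive rate mathematically must differ: rearranging PPV = TP/(TP+FP) gives FPR = p/(1-p) * (1-PPV)/PPV * sensitivity. A sketch with invented numbers:

```python
def fpr(prevalence, ppv=0.6, sensitivity=0.65):
    """False positive rate forced by calibration.

    Assumes the score is equally calibrated (same PPV) and equally
    sensitive in both groups; the groups differ only in base rate p.
    All three numbers are invented for illustration.
    """
    p = prevalence
    return p / (1 - p) * (1 - ppv) / ppv * sensitivity

print(round(fpr(0.5), 3))  # 0.433: higher base rate -> higher FPR
print(round(fpr(0.3), 3))  # 0.186: lower base rate -> lower FPR
```

In other words, a score can be unbiased in the calibration sense and still produce unequal false positive rates whenever base rates differ; you can't have both properties at once, which is why the two sides of this thread keep talking past each other.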
Does this sentence, "Northpointe does offer a custom test for women, but it is not in use in Broward County.", imply that the base COMPAS model does not take gender into account?
In other words, if you are a repeat offender, in some cases you think you know what your lawyer can do for you. But a system replacing that, one that is overly harsh, may deter you. All things being equal in a system of punishment, I think I want the one that's got some deterrence in it. So this is worth exploring.
If every criminal knew that getting caught meant being put into a meat grinder of sorts, I wonder how that would change their thinking about how to navigate the world and problem solve.
https://github.com/propublica/compas-analysis/blob/master/Co...