matthewmacleod · 6y ago

Note that an ML box doesn’t even need gender as an input. Name alone probably gets you 90% of the way there.

xtacy · 6y ago

That points to the core of the issue. "Fairness" in ML algorithms can be hard to define and assess.

It's easy to say "omit gender from the model", but the real issue here has to do with the _causal_ pathways between your input variables and the output variable.

Since ML mostly works by exploiting correlations between the input and output variables, omitting gender doesn't mean gender's influence is removed. You'll have to omit all the causal pathways from gender -> the output, effectively "d-separating" [1] gender from the output. Whether that's practical or not depends on how well we understand the data generating process.

[1] http://bayes.cs.ucla.edu/BOOK-2K/d-sep.html
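
A minimal sketch of the point about correlations, under invented assumptions: the data generating process below (gender drives both a `proxy` feature and the historical `outcome`) and all of its coefficients are made up for illustration. The model is fitted on the proxy alone and never sees gender, yet its average predictions still differ between the two groups.

```python
import random


def simulate(n=5000, seed=0):
    """Toy data: gender influences both a proxy feature and the outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        gender = rng.randint(0, 1)
        proxy = gender + rng.gauss(0, 0.5)          # e.g. a name-like signal
        outcome = 2.0 * gender + rng.gauss(0, 1.0)  # biased historical outcome
        rows.append((gender, proxy, outcome))
    return rows


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return mean_y - slope * mean_x, slope


def prediction_gap(rows):
    """Fit on the proxy only, then compare mean predictions by gender."""
    intercept, slope = fit_line([r[1] for r in rows], [r[2] for r in rows])
    preds = {0: [], 1: []}
    for gender, proxy, _ in rows:
        preds[gender].append(intercept + slope * proxy)
    return sum(preds[1]) / len(preds[1]) - sum(preds[0]) / len(preds[0])


gap = prediction_gap(simulate())
print(f"mean prediction gap between groups: {gap:.2f}")
```

In this toy setup the gap only closes if the proxy is dropped as well, which is the d-separation requirement in miniature.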

rightbyte · 6y ago

You can simulate it with artificial queries that use the same data except for gender and name, and see whether women get less or more.
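
A rough sketch of that kind of paired test. Everything here is hypothetical: `toy_model` stands in for whatever black box is being audited (with a deliberate name-based leak so there is something to find), and the field names and applicant records are made up.

```python
def toy_model(applicant):
    """Hypothetical scorer standing in for the black box under audit."""
    score = applicant["income"] / 1000
    if applicant["name"] in {"Mary", "Linda"}:  # deliberate name-based leak
        score -= 5
    return score


def paired_gap(model, applicants, male=("James", "M"), female=("Mary", "F")):
    """Score each applicant twice, changing only name and gender.

    Returns the mean of (score as female) - (score as male).
    """
    gaps = []
    for applicant in applicants:
        as_male = {**applicant, "name": male[0], "gender": male[1]}
        as_female = {**applicant, "name": female[0], "gender": female[1]}
        gaps.append(model(as_female) - model(as_male))
    return sum(gaps) / len(gaps)


applicants = [
    {"name": "Alex", "gender": "M", "income": 52000},
    {"name": "Sam", "gender": "F", "income": 61000},
    {"name": "Chris", "gender": "M", "income": 48000},
]

gap = paired_gap(toy_model, applicants)
print(f"mean score gap (female - male): {gap}")
```

A nonzero gap means the model reacts to name or gender directly. A zero gap only rules out that direct dependence; other inputs correlated with gender are held fixed here, so their influence is not measured.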

jacquesm · 6y ago

True, you could have an internal table that maps name to gender with a very high probability of being right.
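
Such a table could look roughly like this. The names and probabilities are invented placeholders, not real statistics, and `guess_gender` is a hypothetical helper.

```python
# Hypothetical P(female | first name); the values are placeholders.
NAME_TO_P_FEMALE = {
    "mary": 0.99,
    "linda": 0.99,
    "james": 0.01,
    "robert": 0.01,
    "jordan": 0.35,
}


def guess_gender(name):
    """Return (label, confidence), or (None, 0.5) for names not in the table."""
    p_female = NAME_TO_P_FEMALE.get(name.strip().lower())
    if p_female is None:
        return None, 0.5
    if p_female >= 0.5:
        return "F", p_female
    return "M", 1 - p_female


for name in ["Mary", "James", "Jordan", "Zyx"]:
    print(name, guess_gender(name))
```

A learned model does not need an explicit table like this; it can pick up the same mapping implicitly from the name feature.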
