This is statistically illiterate. Suppose you have two random variables, X and Y, both normally distributed with standard deviation 2, where X has mean 0 and Y has mean 1. Knowing whether a measurement comes from X or Y still lets you make more accurate predictions, even though the within-group variance is larger than the between-group variance. For very large groups, this applies even more strongly. And with high-dimensional multivariate data, you can assign individuals to clusters very accurately even if every individual measurement overlaps substantially.
See for example http://www.ncbi.nlm.nih.gov/pubmed/12879450
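To make the X/Y example concrete, here is a rough simulation sketch. The function names, sample sizes, and seed are my own choices for illustration. The second function also assumes every dimension is independent with the same mean shift of 1 and standard deviation 2, which real data will not satisfy exactly.

```python
import random
import statistics


def prediction_errors(n=20000, sd=2.0, mean_x=0.0, mean_y=1.0, seed=0):
    """Return (MSE ignoring group, MSE using group) for simulated draws."""
    rng = random.Random(seed)
    xs = [rng.gauss(mean_x, sd) for _ in range(n)]
    ys = [rng.gauss(mean_y, sd) for _ in range(n)]

    # Predictor 1: ignore group membership, predict the pooled mean.
    pooled = statistics.fmean(xs + ys)
    mse_pooled = statistics.fmean((v - pooled) ** 2 for v in xs + ys)

    # Predictor 2: use group membership, predict each group's own mean.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    errors = [(v - mx) ** 2 for v in xs] + [(v - my) ** 2 for v in ys]
    mse_group = statistics.fmean(errors)
    return mse_pooled, mse_group


def assignment_accuracy(dims, n=2000, sd=2.0, shift=1.0, seed=0):
    """Fraction of individuals assigned to the right group from `dims`
    measurements. ASSUMPTION: dimensions are independent and each one has
    the same mean shift and standard deviation."""
    rng = random.Random(seed)
    threshold = dims * shift / 2.0  # midpoint between the two summed means
    correct = 0
    for _ in range(n):
        x_total = sum(rng.gauss(0.0, sd) for _ in range(dims))
        y_total = sum(rng.gauss(shift, sd) for _ in range(dims))
        # Classify by whether the summed measurements exceed the midpoint.
        correct += (x_total < threshold) + (y_total >= threshold)
    return correct / (2 * n)


if __name__ == "__main__":
    print(prediction_errors())
    for d in (1, 10, 100):
        print(d, assignment_accuracy(d))
```

Under these assumptions, theory puts the pooled error near 4.25 and the group-aware error near 4. A single measurement should assign an individual correctly only about 60% of the time, while 100 independent measurements should push that above 99%, even though each one overlaps heavily on its own.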
> Not to mention morally dubious, at best.
This is the real issue: whether to use certain stereotypes is a political and ethical question, and those who oppose using them should stop pretending that this has no cost in terms of suboptimal decision-making. There are defensible ethical reasons for opposing some types of discrimination based on valid stereotypes.