0 points · yorwba · 8y ago · 0 comments

You got me there. After I did the calculation I got too lazy to explain it in detail, so I hoped dumping it would be enough.

The variance ratio was calculated in the paper as (variance of the boys' scores)/(variance of the girls' scores), and the most natural way of computing the effect of a threshold is (men above threshold)/(women above threshold). But we are usually more interested in the relation of individuals to the overall population. So I needed a helper function to change the male/female ratio into a male/(male + female) ratio.

  gender_ratio_to_population_ratio = lambda gr: gr/(1+gr)
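As a quick sanity check of this helper (a sketch; the 1:1 and 4:1 inputs are chosen only for illustration and are not from the paper), parity should map to a share of one half, and four men per woman to the 80/20 split discussed in this thread:

```python
# Helper from the comment: male/female ratio -> male/(male + female) share.
gender_ratio_to_population_ratio = lambda gr: gr / (1 + gr)

# Illustrative inputs (assumptions for this check, not data from the paper).
parity_share = gender_ratio_to_population_ratio(1.0)       # equal numbers
four_to_one_share = gender_ratio_to_population_ratio(4.0)  # four men per woman

print(parity_share, four_to_one_share)
```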

However, when you directly apply this function to the variance ratio, you get the share of the total (male + female) variance that is contributed by the men. For the distribution, I actually wanted the standard deviation in multiples of the average deviation. Luckily, male/(average person) = male/((male+female)/2) = 2 * male/(male+female). Take the square root to get the standard deviation from the variance.

  from numpy import sqrt
  variance_ratio_to_standard_deviation = lambda vr: sqrt( 2*gender_ratio_to_population_ratio(vr) )
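One way to check this normalization, sketched under the same definitions: applying the function to 1.15 for men and to 1/1.15 for women should give two variances that average to exactly 1, with the male standard deviation slightly above 1 and the female one slightly below. The names `sd_male`, `sd_female` and `mean_variance` are introduced here only for the check.

```python
from numpy import sqrt

gender_ratio_to_population_ratio = lambda gr: gr / (1 + gr)
variance_ratio_to_standard_deviation = lambda vr: sqrt(2 * gender_ratio_to_population_ratio(vr))

vr = 1.15  # average male/female variance ratio quoted in the thread

sd_male = variance_ratio_to_standard_deviation(vr)        # roughly 1.034
sd_female = variance_ratio_to_standard_deviation(1 / vr)  # roughly 0.964

# The two variances are expressed relative to their average, so they average to 1.
mean_variance = (sd_male**2 + sd_female**2) / 2

print(sd_male, sd_female, mean_variance)
```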

As mentioned above, (men above threshold)/(women above threshold) gets you the male/female ratio of people above the threshold. The cumulative distribution function gives you the probability that someone is below the threshold, but because the normal distribution is symmetric around its mean (zero here), you can just flip the sign of the threshold to get the probability of being above it.

  from scipy.stats import norm
  ratio_above_threshold = lambda thresh, vr: norm.cdf(-thresh, scale=variance_ratio_to_standard_deviation(vr))
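The sign flip can be compared against SciPy's survival function `norm.sf`, which returns the upper-tail probability directly. This is only a sketch: the thresholds and the scale of 1.03 are illustrative assumptions, and the equivalence relies on the mean being zero.

```python
import numpy as np
from scipy.stats import norm

thresholds = np.array([0.0, 1.0, 2.0, 4.0])  # illustrative thresholds
scale = 1.03  # illustrative standard deviation, assumed for this check

# Upper tail via the symmetry trick used in the comment.
tail_via_flip = norm.cdf(-thresholds, scale=scale)
# Upper tail via the survival function, 1 - cdf.
tail_via_sf = norm.sf(thresholds, scale=scale)

print(np.allclose(tail_via_flip, tail_via_sf))
```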

Because I'm expressing the variances of both sub-populations in relation to the total population, the variance for men comes from the male/female variance ratio 1.15, while the variance for women comes from the female/male variance ratio 1/1.15. If you use sqrt(1.15) and sqrt(1.0) as standard deviations instead, your threshold is in units of female standard deviation, which is slightly lower than that of the overall population.

  male_female_ratio_above_threshold = lambda thresh: ratio_above_threshold(thresh, 1.15) / ratio_above_threshold(thresh, 1/1.15)
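The difference between the two parametrizations can be made concrete with a small comparison. This is a sketch with shortened, hypothetical helper names (`to_share`, `sd`, `tail`) and an illustrative threshold of 2: using sqrt(1.15) and 1.0 gives the same ratio as the symmetric version evaluated at a threshold rescaled by the female standard deviation, which is why it reports a somewhat smaller ratio for the same nominal threshold.

```python
from numpy import sqrt
from scipy.stats import norm

to_share = lambda gr: gr / (1 + gr)         # ratio -> share, as in the comment
sd = lambda vr: sqrt(2 * to_share(vr))      # variance ratio -> standard deviation
tail = lambda thresh, scale: norm.cdf(-thresh, scale=scale)  # upper-tail mass

thresh = 2.0  # illustrative threshold

# Symmetric version: threshold in units of the average standard deviation.
symmetric_ratio = tail(thresh, sd(1.15)) / tail(thresh, sd(1 / 1.15))

# Alternative: threshold in units of the female standard deviation.
female_units_ratio = tail(thresh, sqrt(1.15)) / tail(thresh, 1.0)

# The alternative equals the symmetric version at a rescaled (lower) threshold.
rescaled = thresh * sd(1 / 1.15)
symmetric_at_rescaled = tail(rescaled, sd(1.15)) / tail(rescaled, sd(1 / 1.15))

print(symmetric_ratio, female_units_ratio, symmetric_at_rescaled)
```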

Once you have the relative numbers (men per woman), you can turn them into population percentages.

  male_percentage_above_threshold = lambda thresh: gender_ratio_to_population_ratio(male_female_ratio_above_threshold(thresh))
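Putting the pieces together, the definitions above can be run as one script. The thresholds in the loop are illustrative choices, in units of the average standard deviation, and the result is only as good as the binary-threshold model itself.

```python
from numpy import sqrt
from scipy.stats import norm

gender_ratio_to_population_ratio = lambda gr: gr / (1 + gr)
variance_ratio_to_standard_deviation = lambda vr: sqrt(2 * gender_ratio_to_population_ratio(vr))
ratio_above_threshold = lambda thresh, vr: norm.cdf(-thresh, scale=variance_ratio_to_standard_deviation(vr))
male_female_ratio_above_threshold = lambda thresh: ratio_above_threshold(thresh, 1.15) / ratio_above_threshold(thresh, 1 / 1.15)
male_percentage_above_threshold = lambda thresh: gender_ratio_to_population_ratio(male_female_ratio_above_threshold(thresh))

thresholds = [0, 1, 2, 3, 4, 5]  # illustrative thresholds
shares = [male_percentage_above_threshold(t) for t in thresholds]

for t, share in zip(thresholds, shares):
    print(f"{t} SD above the mean: {share:.1%} male")
```

With these numbers the male share starts at exactly one half at the mean and rises with the threshold, crossing 80% somewhere between 4 and 5 standard deviations.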

I think that should have answered some of your questions, but not all of them.

> Relatively speaking, the threshold isn't as far above the average programmer as it is above the average person, but the 80/20 split depends on the absolute threshold, yes?

I actually hadn't thought to put it in relation to the population of programmers, but what you are saying makes sense (with the caveat that the standard deviation of programmers might differ from the general population). However, it seems that the 80/20 split isn't just at Google, but also close to the industry average and to enrollment in CS majors, which definitely don't just include the top programmers.

> And we'd better qualify that this is assuming 1.15 is the right number globally,

Actually, 1.15 isn't the right number globally: in different countries, the variance ratio was as low as 0.9 and as high as 1.5, so it isn't actually very stable. In the US, you have the choice between 1.19, 1.11 and 1.08, depending on the test and the year of testing. 1.15 is just the average over all countries and tests.
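To get a feeling for how much the choice of variance ratio matters, the same model can be evaluated for the values mentioned here. This is a sketch: `male_share_above_threshold` is a hypothetical rewrite of the lambdas above with the variance ratio as a parameter, and the threshold of 3 is an arbitrary illustrative choice.

```python
from numpy import sqrt
from scipy.stats import norm

def male_share_above_threshold(thresh, vr):
    """Male share above `thresh` for a male/female variance ratio `vr`."""
    sd_male = sqrt(2 * vr / (1 + vr))   # same as applying the helper to vr
    sd_female = sqrt(2 / (1 + vr))      # same as applying the helper to 1/vr
    men = norm.cdf(-thresh, scale=sd_male)
    women = norm.cdf(-thresh, scale=sd_female)
    return men / (men + women)

thresh = 3.0  # illustrative threshold, an assumption for this comparison
variance_ratios = [0.9, 1.08, 1.11, 1.15, 1.19, 1.5]  # values quoted in the thread
shares = {vr: male_share_above_threshold(thresh, vr) for vr in variance_ratios}

for vr, share in shares.items():
    print(f"variance ratio {vr}: {share:.1%} male above {thresh} SD")
```

A ratio below 1 puts men in the minority above the threshold, and the share grows steadily with the ratio, so the conclusion depends directly on which estimate you pick.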

> and that 1.15 is valid for subjects other than math,

It probably isn't, but if you have no numbers, you just take what seems closest and run with it. Another caveat is that this number was for elementary/middle school students, and it could be different for adults, either higher, because the students were not fully developed yet, or lower, because the students were in different stages of development.

> and that using only the higher variance means that percent of women in tech ratio is 100% dependent on IQ, which in turn means that hiring and job performance and ability all correlate 100% with IQ and nothing else,

If there were some other normally distributed property, e.g. "programming ability", you could use that to make a similar argument, but it would have to fulfill these requirements to be the only explanation. Modeling the hiring process as a binary threshold is also quite simplistic, but I don't think I could compute it for anything more realistic.

> and finally that IQ itself is free of social biases.

You only have to assume that if you want to make IQ (or something else) the only explanation and claim that social bias is not involved. It would absolve Google of immediate responsibility if they simply administered an IQ test to candidates, but there would still be ways bias can influence the outcome.

> Not to mention that if any other factors are involved (and we know there are) then the IQ threshold is even higher.

Don't you mean lower? There are a whole bunch of other factors that make women less likely to go into CS, stay as programmers, apply to Google (like social bias) that would lower the remaining difference the "higher variance" hypothesis would have to explain. Of course that would leave it with little overall influence, but it might be the case that all individual factors are only able to explain a small part, and it's their interaction that causes the huge difference.

dahart · 8y ago

Thank you, thank you, for taking the time to explain!! This makes sense to me, and this analysis is fantastic.

I see more clearly the issue with 1/1.15 that I was worried about, and what I missed was the gender_ratio_to_population_ratio inside the variance_ratio_to_standard_deviation. I was worried the variance ratio was being double counted, but I see now that it's not; you just made it symmetric.

> It would absolve Google of immediate responsibility if they simply administered an IQ test to candidates

Isn't there a very high probability that this would implicate Google rather than absolve them? I'd be extremely surprised if Google had managed to hire tens of thousands of people who all have an IQ of at least 165. If they prove the average IQ is 130, they then potentially have to take responsibility for the remainder of the discrepancy.
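The two IQ figures can be reproduced from the model in the parent comment. This is a sketch under loud assumptions: IQ is taken to have mean 100 and standard deviation 15, the variance ratio is 1.15, and the helper name `male_share_above_threshold` is hypothetical. The threshold for an 80% male share is found numerically with `scipy.optimize.brentq`.

```python
from numpy import sqrt
from scipy.optimize import brentq
from scipy.stats import norm

VR = 1.15                   # variance ratio used in the parent comment
IQ_MEAN, IQ_SD = 100, 15    # assumed IQ scale, not stated in the thread

def male_share_above_threshold(thresh, vr=VR):
    """Male share above `thresh` (in average standard deviations)."""
    men = norm.cdf(-thresh, scale=sqrt(2 * vr / (1 + vr)))
    women = norm.cdf(-thresh, scale=sqrt(2 / (1 + vr)))
    return men / (men + women)

# Threshold at which the model yields an 80/20 split.
threshold_sd = brentq(lambda t: male_share_above_threshold(t) - 0.8, 0.0, 10.0)
threshold_iq = IQ_MEAN + IQ_SD * threshold_sd

# Male share the model predicts for a threshold of IQ 130 (2 SD).
share_at_130 = male_share_above_threshold((130 - IQ_MEAN) / IQ_SD)

print(f"80/20 threshold: {threshold_sd:.2f} SD, IQ {threshold_iq:.0f}")
print(f"male share above IQ 130: {share_at_130:.1%}")
```

Under these assumptions the 80/20 threshold lands a little above 4 standard deviations, while a threshold of IQ 130 leaves a male share much closer to one half than to 80%.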

> Don't you mean lower? There are a whole bunch of other factors that make women less likely to go into CS

I did mean higher, but you're absolutely right to call BS on that. Any factors that alone would result in a lower female ratio than 20/80 would push the IQ threshold lower. Factors that alone would result in anything higher than 20/80 would raise the IQ threshold. My assumption was that, being at a very extreme end of the spectrum more than 4(!) standard deviations from the general population, almost all other factors would be closer to 50/50 than to 0/100, and would thus raise the IQ threshold. But that is my assumption and belief, not any established fact.

Since the actual ratio is 20/80, and I believe that IQ is at best a small factor, then for my hypothesis to be right, some other actual factor must be closer to 0/100 than 50/50. That means I'd better accept your suggestion that what I really meant to say is other factors would push the IQ threshold lower not higher, because there's evidence for that. You've done me a favor. ;)

Thanks again for engaging at this level.
