If the average difference between all men and all women on Skill X is 3%, and the average difference between any two given individuals is 30%, then even though there are two distinct loci, gender is still an extremely weak indicator. Sure, it shows up on statistical charts, but for it to count as a meaningful "larger picture," it should be able to inform our actions as a society. And the huge intra-group variation means that any action we do take based on this data is necessarily unfair to a large number of individuals.
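
To put rough numbers on that, here is a minimal sketch. It rests on assumptions of mine that the figures above don't pin down: Skill X scores are normally distributed in both groups with the same spread, the 3 and 30 are points on the same scale, and the 30-point gap is between two people from the same group. The function name `overlap_stats` is just made up for illustration.

```python
import math
from statistics import NormalDist


def overlap_stats(group_gap, mean_pair_gap):
    """Summarize how much two equal-variance normal groups overlap.

    group_gap:     difference between the two group means
    mean_pair_gap: average absolute difference between two random
                   individuals from the same group (assumed normal)
    """
    # For a normal distribution, E|X1 - X2| = 2 * sigma / sqrt(pi),
    # so the standard deviation can be recovered from the pairwise gap.
    sigma = mean_pair_gap * math.sqrt(math.pi) / 2

    # Standardized effect size (Cohen's d).
    d = group_gap / sigma

    std_normal = NormalDist()
    # Chance that a random member of the higher-scoring group
    # outscores a random member of the lower-scoring group.
    p_superiority = std_normal.cdf(d / math.sqrt(2))
    # Best achievable accuracy when guessing someone's group from
    # their score alone (equal group sizes, cutoff at the midpoint).
    best_guess_accuracy = std_normal.cdf(d / 2)
    return d, p_superiority, best_guess_accuracy


d, p_sup, acc = overlap_stats(group_gap=3.0, mean_pair_gap=30.0)
print(f"effect size d = {d:.3f}")
print(f"P(higher-group member wins) = {p_sup:.3f}")
print(f"best accuracy guessing group from score = {acc:.3f}")
```

Under those assumptions the effect size comes out at about 0.11. A random member of the higher-scoring group beats a random member of the other group roughly 53% of the time, and guessing someone's group from their score is right only about 52% of the time, which is barely better than a coin flip.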
> (p.s. the data says you're wrong.)
Please, I'd love to see it. I would have researched it myself, except I don't have time to dig around right now, and a quick Google search turns up nothing on the subject.