0 points · goatlover · 1y ago · 0 comments

I don’t understand the racial part, since there are plenty of trailer parks and poor rural areas. Zipcodes in the Appalachias for example.

8 comments · 4 top-level

sandworm101 · 1y ago · 4 in thread

Not accounting for race comes up in scary ways. I was part of a program that used a totally neutral database (race/gender were not in the database). People were selected by criteria and then emailed asking them to attend an introduction meeting online. Only when the webcams were turned on did we realize nearly every volunteer was a non-white woman. It was a very bad look. It seemed that we had selected them based on race/gender when in reality that data wasn't available until the first video call. By ignoring race/gender we had somehow made it the most obvious selector.

(The program involved having children who were in regular contact with the criminal justice system.)

_heimdall · 1y ago

Did it turn out that the selection process actually didn't represent the makeup of the target audience?

If the participants did represent a subset of the target audience, I don't really see what the problem is if that audience happens to be heavily weighted towards a particular race, sex, etc. It seems like you'd be doing a disservice to the program to purposely control for those factors and end up with a population that physically looks more diverse at the cost of missing people who actually most need the program.

pessimizer · 1y ago

A lot of people have gotten into a weird place where they think that acknowledging that the descendants of slaves in the US are in a dire situation is a form of racism. Acknowledging that being injured has caused an injury has become either extreme right-wing bigotry (if you're a liberal who demands that every subset of people be a racially representative mix), or "the soft bigotry of low expectations" (if you're a conservative who can't admit to yourself that you inherited hundreds of thousands of dollars from a parent who also paid for your private school, car, and rent, then found you your first job.)

godelski · 1y ago

It's not racist to show, point out, or claim that data skews racially in one direction. If it were, then you couldn't even claim that minorities are underprivileged. Right? Then how could you help them if you aren't able to recognize the areas with the biggest challenges? You're right when interpreting it this way.

But it is something you do care about when you want to attribute causality. In part this is an issue because people naturally associate correlation with causation (there is good reason, but that's a long discussion. See Judea Pearl's The Book of Why). At the end of the day, we really are always after causal relationships, because we want to do things with the data (somewhere along the chain). So it's not that you want to remove race from the data, but rather that you want to be wary and ensure that your variable is not confounding the real issue. Though this happens outside of race too.

And note that there are times where race does play a causal role. (I suspect that's not likely in the parent's case.) For example, different races may be more prone to certain illnesses or genetic disorders.

If it helps, maybe it is easier to frame it as it's easy to be lazy, but the pressure around race makes us more likely to revisit our analysis and look for confounding variables. The thing is, this will improve your stats even for the non-minority settings because the truth of what you're (hopefully) doing, is just making better models.
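A rough sketch of the confounding point, using made-up counts (every number and name below is hypothetical and not drawn from any real dataset). It assumes an outcome whose rate depends only on an income bin, while two groups are unevenly spread across those bins. The raw per-group rates then differ even though the rates match exactly once you stratify by the income bin.

```python
from collections import defaultdict

# Hypothetical synthetic counts: (group, income_bin, people, positives).
# By construction the outcome rate depends only on income_bin:
# 40% in the "low" bin and 10% in the "high" bin, for both groups.
RECORDS = [
    ("A", "low", 800, 320),
    ("A", "high", 200, 20),
    ("B", "low", 200, 80),
    ("B", "high", 800, 80),
]


def rate_by(keys):
    """Return the positive rate for each combination of the given keys."""
    totals = defaultdict(lambda: [0, 0])  # key -> [people, positives]
    for group, income_bin, people, positives in RECORDS:
        row = {"group": group, "income_bin": income_bin}
        key = tuple(row[k] for k in keys)
        totals[key][0] += people
        totals[key][1] += positives
    return {key: pos / n for key, (n, pos) in totals.items()}


# Raw comparison: the groups look very different.
raw = rate_by(["group"])

# Stratified comparison: within each income bin the groups are identical,
# so the raw gap comes entirely from how the groups are distributed.
stratified = rate_by(["income_bin", "group"])

if __name__ == "__main__":
    print("raw:", raw)
    print("stratified:", stratified)
```

Dropping the group column would not remove the gap here. Only looking at the variable that actually drives the outcome explains it, which is the case for checking for confounders rather than deleting columns.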

lapphi · 1y ago

I think it’s more reflective of the reality of living in the US than of your company’s selection process. I’m curious what you did after realizing this. Did you pivot away, or create a program designed to be useful for the volunteers? Assuming that the volunteer pool accurately represented the larger group.

timetopay · 1y ago

It just is a thing, tbh. It manifests in the data pretty clearly.

In aggregate, in large data sets, race comes through - especially with a few datapoints. For example, when I worked at a fintech company: with household income and zip code, we could accurately target race with >80% accuracy [0]. Add a few more datapoints, and this would very quickly get closer to 95% accuracy.

That was an _actual_ party-trick[1] demo we did, alongside also de-anonymizing coworkers based on car model, zip code, and bank name.

[0] I worked as a SecEng and we were trying to prove that we were(n't) inadvertently targeting race, for compliance reasons. In the end, the business realized the threat and made the required changes to prevent this.

[1] We were doing this to make a case for stricter controls and stronger isolation/security measures for storing non-PII data. The business also saw the light on this. Sometimes we'd narrow them down to 30 or 40 people in their zip code, and sometimes (such as a coworker with an old Bentley), it was an instant hit.
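A minimal sketch of why combining a few "non-PII" fields narrows people down so quickly. This is not the demo described above. The records, field names, and values are all hypothetical, and the approach is the standard idea of counting how many records share each combination of quasi-identifiers (the "k" in k-anonymity).

```python
from collections import Counter

# Hypothetical records with no direct identifiers in them.
RECORDS = [
    {"zip": "00001", "car": "sedan", "bank": "X"},
    {"zip": "00001", "car": "sedan", "bank": "X"},
    {"zip": "00001", "car": "sedan", "bank": "Y"},
    {"zip": "00001", "car": "old Bentley", "bank": "X"},
    {"zip": "00002", "car": "sedan", "bank": "X"},
    {"zip": "00002", "car": "pickup", "bank": "Y"},
    {"zip": "00002", "car": "pickup", "bank": "Y"},
]


def group_sizes(records, columns):
    """Count how many records share each combination of the chosen columns."""
    return Counter(tuple(rec[c] for c in columns) for rec in records)


def smallest_group(records, columns):
    """Size of the smallest group, i.e. the k the data achieves on these columns."""
    return min(group_sizes(records, columns).values())


def unique_records(records, columns):
    """Records that are the only member of their group (an 'instant hit')."""
    sizes = group_sizes(records, columns)
    return [rec for rec in records if sizes[tuple(rec[c] for c in columns)] == 1]


if __name__ == "__main__":
    for cols in (["zip"], ["zip", "car"], ["zip", "car", "bank"]):
        print(cols, "k =", smallest_group(RECORDS, cols),
              "unique =", len(unique_records(RECORDS, cols)))
```

Each added column can only split groups further, so k never grows as fields are combined. A rare value such as an unusual car model can drop a group to a single record on its own.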

godelski · 1y ago

> Zipcodes in the Appalachias for example.

You're overconstraining what I've said. You're perfectly right that zipcodes in the Appalachias account for many poor people who are also white. But actually, you're demonstrating that you can still infer race from this, because you're inferring that the majority of people in these zipcodes are white. Right? White people are also a race. You're correct that zip code can strongly indicate poor white people. In fact, it can even strongly indicate rich black people, though, as you might guess, not to the same degree, since the overall rate is lower. But people do congregate.

Think about it in a different framing: zipcode strongly correlates with people congregating together who are culturally and economically similar.

I think this version should make sense (especially as the locality affects the culture), and that from here you can extrapolate to recognize that people of varying demographics aren't homogeneously distributed among zipcodes of similar economic bins. Part of this is easily explained by a simple fact: when people move, they like to move to where they have friends, family, or other connections.

ruined · 1y ago

it's scale-invariant and self-similar. pick a big city or a sundown town, the demographics change but you're measuring a consequence of modern/historic systems larger and longer-lived than either place
