undefined | Better HN

0 points · tines · 4y ago · 0 comments

This raises some really interesting questions.

We certainly don't want to perpetuate harmful stereotypes. But is it a flaw that the model encodes the world as it really is, statistically, rather than as we would like it to be? By this I mean that there are more light-skinned people in the west than dark, and there are more women nurses than men, which is reflected in the model's training data. If the model only generates images of female nurses, is that a problem to fix, or a correct assessment of the data?

If some particular demographic shows up in 51% of the data but 100% of the model's output shows that one demographic, that does seem like a statistics problem that the model could correct by just picking less likely "next token" predictions.
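To make the 51% versus 100% point concrete, here is a minimal sketch, assuming a toy two-way categorical distribution rather than any real image model. The labels `group_a` and `group_b` and the helpers `greedy` and `sample` are made up for illustration. Always taking the most likely option collapses a 51/49 split into 100/0, while sampling in proportion to the probabilities roughly preserves it.

```python
import random
from collections import Counter

def greedy(probs):
    # Always return the single most likely label (argmax decoding).
    return max(probs, key=probs.get)

def sample(probs, rng):
    # Draw one label in proportion to its probability.
    labels = list(probs)
    weights = [probs[label] for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]

# Hypothetical distribution: 51% of the data shows one group.
probs = {"group_a": 0.51, "group_b": 0.49}
rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

greedy_counts = Counter(greedy(probs) for _ in range(n))
sampled_counts = Counter(sample(probs, rng) for _ in range(n))

print("greedy :", dict(greedy_counts))   # every draw is group_a
print("sampled:", dict(sampled_counts))  # close to a 51/49 split
```

Real generators are not a single categorical draw, so this only illustrates the statistics of the argument, not how any particular model decodes.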

Also, is it wrong to have localized models? For example, should a model for use in Japan conform to the demographics of Japan, or to that of the world?

10 comments · 10 top-level

karpierz · 4y ago

It depends on whether you'd like the model to learn causal or correlative relationships.

If you want the model to understand what a "nurse" actually is, then it shouldn't be associated with female.

If you want the model to understand how the word "nurse" is usually used, without regard for what a "nurse" actually is, then associating it with female is fine.

The issue with a correlative model is that it can easily be self-reinforcing.

SnowHill9902 · 4y ago

It’s the same as with an artist: “hey artist, draw me a nurse.” “Hmm okay, do you want it a guy or girl?” “Don’t ask me, just draw what I’m saying.” The artist can then say: “Okay, but accept my biases.” or “I can’t since your input is ambiguous.”

For a one-shot generative algorithm you must accept the artist’s biases.

jonny_eh · 4y ago

> But is it a flaw that the model encodes the world as it really is

Does a bias towards lighter skin represent reality? I was under the impression that Caucasians are a minority globally.

I read the disclaimer as "the model does NOT represent reality".

skybrian · 4y ago

Yes, there is a denominator problem. When selecting a sample "at random," what do you want the denominator to be? It could be "people in the US", "people in the West" (whatever countries you mean by that) or "people worldwide."

Also, getting a random sample of any demographic would be really hard, so no machine learning project is going to do that. Instead you've got a random sample of some arbitrary dataset that's not directly relevant to any particular purpose.

This is, in essence, a design or artistic problem: the Google researchers have some idea of what they want the statistical properties of their image generator to look like. What it currently produces doesn't match that. So, artistically, the result doesn't meet their standards, and they're going to fix it.

There is no objective, universal, scientifically correct answer about which fictional images to generate. That doesn't mean all art is equally good, or that you should just ship anything without looking at quality along various axes.

godelski · 4y ago

> But is it a flaw that the model encodes the world as it really is

I want to be clear here: bias can be introduced at many different points. There's dataset bias, model bias, and training bias. Every model is biased. Every dataset is biased.

Yes, the real world is also biased. But I want to make sure that there are ways to resolve this issue. It is terribly difficult, especially in a DL framework (even more so in a generative model), but it is possible to significantly reduce the real world bias.

Imnimo · 4y ago

> If some particular demographic shows up in 51% of the data but 100% of the model's output shows that one demographic, that does seem like a statistics problem that the model could correct by just picking less likely "next token" predictions.

Yeah, but you get that same effect on every axis, not just the one you're trying to correct. You might get male nurses, but they have green hair and six fingers, because you're sampling from the tail on all axes.
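A minimal sketch of that side effect, assuming two independent toy attributes and plain temperature scaling as the way of "picking less likely" outcomes. The probabilities and the names `gender`, `fingers`, and `apply_temperature` are invented for illustration and do not come from any real model. Raising the temperature flattens every distribution at once, so the rare value on the axis you care about and the rare value on the axis you don't both become more likely.

```python
def apply_temperature(probs, temperature):
    # Rescale a categorical distribution: p_i ** (1 / T), then renormalize.
    # T = 1 leaves it unchanged; T > 1 flattens it toward uniform.
    scaled = {k: p ** (1.0 / temperature) for k, p in probs.items()}
    total = sum(scaled.values())
    return {k: v / total for k, v in scaled.items()}

# Hypothetical per-attribute probabilities for the prompt "nurse".
gender = {"female": 0.9, "male": 0.1}      # the axis we want to rebalance
fingers = {"five": 0.99, "six": 0.01}      # an axis we want left alone

for t in (1.0, 3.0):
    g = apply_temperature(gender, t)
    f = apply_temperature(fingers, t)
    print(f"T={t}: P(male)={g['male']:.3f}  P(six fingers)={f['six']:.3f}")
```

The same knob that lifts P(male) also lifts P(six fingers) by a large factor, which is the "sampling from the tail on all axes" problem. Correcting one axis alone would need an intervention targeted at that attribute rather than a global change to sampling.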

daenz · 4y ago

I think the statistics/representation problem is a big problem on its own, but IMO the bigger problem here is democratizing access to human-like creativity. Currently, the ability to create compelling art is only held by those with some artistic talent. With a tool like this, that restriction is gone. Everyone, no matter how uncreative, untalented, or uncommitted, can create compelling visuals, provided they can use language to describe what they want to see.

So even if we managed to create a perfect model of representation and inclusion, people could still use it to generate extremely offensive images with little effort. I think people see that as profoundly dangerous. Restricting the ability to be creative seems to be a new frontier of censorship.

webmaven · 4y ago

> We certainly don't want to perpetuate harmful stereotypes. But is it a flaw that the model encodes the world as it really is, statistically, rather than as we would like it to be? By this I mean that there are more light-skinned people in the west than dark, and there are more women nurses than men, which is reflected in the model's training data. If the model only generates images of female nurses, is that a problem to fix, or a correct assessment of the data?

If the model only generates images of female nurses, then it is not representative of the real world, because male nurses exist and they deserve not to be erased. The training data is the proximate cause here, but one wonders what process ended up distorting "most nurses are female" into "nearly all nurse photos are of female nurses." Something amplified a real-world imbalance into a dataset that exhibits more bias than the real world, and then training the AI bakes that bias into an algorithm (one that may end up further reinforcing the bias in the real world, depending on the use cases).

ben_w · 4y ago

This sounds like descriptivism vs prescriptivism. In English (native language) I’m a descriptivist, in all other languages I have to tell myself to be a prescriptivist while I’m actively learning and then switch back to descriptivism to notice when the lessons were wrong or misleading.

pshc · 4y ago

I think it is problematic, yes, to produce a tool trained on data from the past that reinforces old stereotypes. We can’t just handwave it away as being a reflection of its training data. We would like it to do better by humanity. Fortunately the AI people are well aware of the insidious nature of these biases.
