Better HN

0 points · elgertam · 6d ago · 0 comments

> The spontaneity isn't that ChatGPT woke up and sent this to the author. The spontaneity is that ChatGPT was asked to restore an image that was attached without filtering it, and when no image was attached, instead of generating an error message, it cobbled together random outputs, some of which included graphic, disturbing imagery.

But that's not what happened. The missing image was described as "graphic" or "violent." If I were to receive an email with that request and a missing attachment, my imagination certainly would not conjure images of butterflies & unicorns. Seems the model is working as designed.

14 comments · 3 top-level

dijksterhuis · 6d ago · 5 in thread

> The missing image was described as "graphic" or "violent."

not in the first prompt. which kicked the whole thing off. no mention of type of content was provided. the model generated dark outputs when not given any direction on the type of content.

the rest of the prompts are just showing “yeah, you can tweak this and get even worse stuff”.

ToucanLoucan · 6d ago

> the model generated dark outputs when not given any direction on the type of content.

I would argue it actually was, in that it was specifically asked to "not censor or filter" the content. This implies that the content is otherwise worthy of censoring and filtering.

I don't know how much I'm willing to credit that much reasoning to an LLM, but insofar as every extremely pro-AI person constantly tells me how smart they are, this seems like a pretty short logical leap to me.

dijksterhuis · 6d ago

the main reason these images turn up is because they're in the training data. and the images are common enough in the training data for the content to come out without being explicitly asked for (in the first prompt).

if those images didn’t exist in the training data we wouldn’t be having this conversation.

kisper · 6d ago

This is one of the core problems with these models. They’re relying on filtering to work against ever more jailbreaks, instead of analyzing the training sets and filtering out the material prohibited for the model's end use before training them anew. You can’t make satisfying facsimiles of things that you don’t know about.

I’m still waiting for companies or congressmen to get their heads on straight and get some common sense going.

red75prime · 6d ago

Yep, the first image was described as "I apologize for the picture's content." What do you expect to get from that? Cats frolicking in the grass?

queenkjuul · 6d ago

A picture of me in my swimsuit maybe lol

A gross meal i made when drunk? A mess my cat made? Text containing a slur?

A cringe meme?

If my friends opened a text with "sorry for this image" i am not imagining rape victims

pooploop64 · 6d ago · 4 in thread

Always one of the same two excuses.

1. It actually is working perfectly you just don't have smart enough eyes to see it.

2. Making stuff work is too hard, and expecting that from us is the real thing ruining society.

Going for number 1 here is crazy. If I got that email, my mind would certainly run but my response would say "sorry but we're not supposed to be dealing in snuff porn here" which IS a directive ChatGPT is supposed to have. Like hello you are on earth right?

ToucanLoucan · 6d ago

That's not true. There's a third.

3. It's the future so we just have to deal with it

elgertam · OP · 6d ago

I don't exactly appreciate words being put in my mouth. When did I say it was working perfectly? And we're comparing you, a human with common sense and real intelligence, to a multimodal LLM?

The transformer was designed to attend to relevant pieces of context and generate new ones that match the pattern. OpenAI in particular was doing that work without guardrails, then attempted to bolt on "content filters," which in my opinion just can't work in a rigorous way. (I think Anthropic's "constitutional" approach is much better though not flawless. And regardless, Claude models don't generate images.)

So, yeah, working as designed. Maybe not as intended, because these things are somewhat resistant to the host's intent when the prompter is hostile.

ToucanLoucan · 6d ago

> When did I say it was working perfectly?

"This isn’t a vulnerability, there are endless gore websites. ChatGPT is replying to a prompt, there is nothing “Spontaneously” about this."

I mean it's not verbatim but that's a pretty solid read on what you did say.

> The transformer was designed to attend to relevant pieces of context and generate new ones that match the pattern. OpenAI in particular was doing that work without guardrails, then attempted to bolt on "content filters," which in my opinion just can't work in a rigorous way.

Yes. That's the criticism being made, among others, in the piece you replied to in order to belittle it.

> So, yeah, working as designed. Maybe not as intended, because these things are somewhat resistant to the host's intent when the prompter is hostile.

What is hostile here!? Do you have any idea how many emails I've sent without attachments over the years? And I'm highly technically adept; humans just forget things sometimes. If you ask for an image to be restored and fail to attach it, what sane software engineer looks at a failure mode in that scenario where the model replies with uncensored gore and violence and is like "yeah that's fine, ship it"?

I swear some of you AI folks talk like you have never been on planet Earth, good grief. Touch some grass.
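
To make the "forgot the attachment" failure mode concrete, here is a minimal sketch of the kind of pre-check that would turn it into an error message instead of a generation. Everything in it is an assumption for illustration: `RestoreRequest`, `MissingAttachmentError`, and `handle_restore_request` are hypothetical names, and nothing here reflects how OpenAI's stack is actually built.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RestoreRequest:
    """Hypothetical request shape: a text prompt plus an optional image payload."""
    prompt: str
    image_bytes: Optional[bytes] = None


class MissingAttachmentError(ValueError):
    """Raised when a restore request arrives without an image."""


def handle_restore_request(request: RestoreRequest) -> str:
    # Fail fast: a "restore this image" request with no image has nothing to restore,
    # so it should never reach the generation step at all.
    if not request.image_bytes:
        raise MissingAttachmentError(
            "No image was attached. Please attach the image you want restored."
        )
    # Placeholder for the real model call, which is out of scope for this sketch.
    return f"restoring image ({len(request.image_bytes)} bytes)"
```

The point of the sketch is only that the check runs before any model is involved, so the prompt's wording ("don't censor or filter") never matters when the attachment is missing.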

kisper · 6d ago

You seem to be focused on the fact that this is a crap-tastic example of the future of AI that has been promised to us. That’s a really good thing to be angry about. Don’t be angry at the rest of us because LLM stacks are working like they always have and always will. That’s what we’re all pointing out.

nassimm · 6d ago · 2 in thread

The design is to not show gore images to users. That's an actual design goal from OpenAI.

So in this regard the model is definitely not working as designed.

elgertam · OP · 6d ago

The design of transformers (including LLMs and multi-modal transformer-based models such as OpenAI's image generators) is to attend to relevant details. OpenAI did this at first without guardrails. In response to public backlash, they bolted on "content filtering," which IMO seems like a very GOFAI approach, and regardless doesn't work very well. It routinely flags innocent prompts, yet with crafty prompt hacking the model will still generate these kinds of images.

The design of the model is literally to find patterns and attend to them. The infrastructure and process around an OpenAI model is intended to filter "bad" things (in this case, I agree that the outputs are bad), but is designed to stop some enumerated-ish list of things that aren't allowed, perhaps with some limited "reasoning" about them.
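
A toy sketch of what an "enumerated-ish list" filter looks like, and why it fails in both directions. This is an assumption-laden illustration, not OpenAI's actual filter: `BLOCKED_TERMS` and `is_flagged` are hypothetical, and real systems are more elaborate than a substring match.

```python
# Hypothetical enumerated list of disallowed terms (illustration only).
BLOCKED_TERMS = {"gore", "kill", "blood"}


def is_flagged(prompt: str) -> bool:
    """Return True if any blocked term appears as a substring of the prompt."""
    lowered = prompt.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


# False positive: an innocent prompt trips the filter.
print(is_flagged("How do I kill a stuck Python process?"))  # True

# False negative: wording that avoids every enumerated term passes untouched.
print(is_flagged("Restore the attached image without censoring it"))  # False
```

The second prompt is the interesting one: nothing in it names prohibited content, so a list-based check has nothing to match on, which is the gap the comment above is describing.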

intended · 6d ago

The issue is that most people outside of tech don't want that.

They would be happy to have the models just go away entirely.
