I wasn't expecting GPT-4 to be able to correctly answer "What is funny about this image?" for an image of a mobile phone charger designed to resemble a VGA cable - but it can.
(Note that they have a disclaimer: "Image inputs are still a research preview and not publicly available.")
I would bet good money that when we can test prompting with our own unique images, GPT-4 will not give answers of similar quality.
I do wonder how misleading their paper is.
They literally sent it 1) a screenshot of the Discord session they were in and 2) an audience-submitted image.
It described the Discord screenshot in incredible detail, including what was on screen, which channels were subscribed to, and how many users were online. And for the audience image, it correctly described it as an astronaut on an alien planet, with a spaceship on a distant hill.
And that image looked like it was AI created!
These aren't images it's been "trained on".
There's easily a 10:1 ratio of "it doesn't understand, it's just fancy autocomplete" comments to the alternative, in spite of peer-reviewed research published months ago by Harvard and MIT researchers demonstrating that even a simplistic GPT model builds world representations from which it draws its responses, rather than relying on simple frequency guessing.
Watch the livestream? But why would they do that, when they already "know" it's not very impressive and not worth their time beyond commenting on it online.
I imagine this is coming from some sort of monkey brain existential threat rationalization ("I'm a smart monkey and no non-monkey can do what I do"). Or possibly just an overreaction to very early claims of "it's alive!!!" in an age when it was still just a glorified Markov chain. But whatever the reason, it's getting old very fast.
What is funny is that neither GPT-4 nor the host noticed that (or maybe the host did notice but didn't want to bring it up, the humor being "inappropriate").