In this case it is actually relevant. Drawing a pelican on a bicycle correctly depends a great deal not only on understanding what both look like in general, but also on the spatial relationships between the various objects and their parts. Models that can draw this kind of thing better also tend to be better at tasks that require understanding of how things fit together and interact in 3D space.
How do we know it's not just a mashup of existing pictures? All the generated pelicans on bikes look somewhat cartoonish, with historical or artsy bicycles. This is training material from 2015:
There are other such images. Not an image model? How do we know they don't convert all images to SVG and train an LLM on them? How do we know they don't cheat on this benchmark by routing the query to an image model first?
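To make the "not an image model" point concrete: the benchmark works because the LLM emits SVG source as plain text, not pixels. A minimal sketch of what that output looks like (shapes and coordinates are illustrative, not from any real model):

```python
def minimal_pelican_on_bicycle() -> str:
    """Return a crude SVG: two wheels, a frame line, and a bird-ish blob.

    An LLM answering the benchmark prompt produces markup of exactly this
    kind, token by token, which is why spatial coherence (wheels on the
    ground, bird above the frame) is the hard part.
    """
    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" width="200" height="120">',
        '<circle cx="50" cy="90" r="25" fill="none" stroke="black"/>',   # rear wheel
        '<circle cx="150" cy="90" r="25" fill="none" stroke="black"/>',  # front wheel
        '<line x1="50" y1="90" x2="150" y2="90" stroke="black"/>',       # frame
        '<ellipse cx="100" cy="55" rx="20" ry="12" fill="white" stroke="black"/>',  # body
        '<polygon points="120,50 140,48 120,58" fill="orange"/>',        # beak
        '</svg>',
    ]
    return "\n".join(parts)

print(minimal_pelican_on_bicycle())
```

Because the output is just text, the "route to an image model" theory would require an extra image-to-SVG vectorization step, which is part of why people treat the raw SVG output as evidence it came from the language model itself.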