astrange · 4y ago · 0 points

> I mean, from my perspective, the skill in these (and DALL-E's) image reproductions is truly astonishing.

A basic part of it is that neural networks combine learning and memorizing fluidly inside them, and these networks are really really big, so they can memorize stuff good.

So when you see it reproduce a Shiba Inu well, don’t think of it as “the model understands Shiba Inus”. Think of it as making a collage out of some Shiba Inu clip art it found on the internet. You’d do the same if someone asked you to make this image.

It’s certainly impressive that the lighting and blending are as good as they are though.

2 comments · 2 top-level

PheonixPharts · 4y ago

> these networks are really really big, so they can memorize stuff good.

People tend to really underestimate just how big these models are. Of course these models aren't simply "really really big" MLPs, but the cleverness of the techniques used to build them is only useful at insanely large scale.

I do find these models impressive as examples of "here's what the limit of insane amounts of data, insane amounts of compute can achieve with some matrix multiplication". But at the same time, that's all they are.

What saddens me about the rise of deep neural networks is that it really is the end of the era of true hackers. You can't reproduce this at home. You can't afford to reproduce this one in the cloud with any reasonable amount of funding. If you want to build this stuff, your best bet is to go to a top-tier school, make the right connections, and get hired by a mega-corp.

But the real tragedy here is that the output of this is honestly only interesting if it's the work of some hacker fiddling around in their spare time. A couple of friends hacking in their garage making images of raccoon painting is pretty cool. One of the most powerful, well-funded owners of likely the most compute resources on the planet doing this as their crowning achievement in AI... is depressing.

hn_throwaway_99 · 4y ago

To be clear, I understand the general techniques: (a) how diffusion models can be used to upsample images and generate more photorealistic (or even "cartoon realistic") results, and (b) how they can do basic matching of "someone typed in Shiba Inu, look for images of Shiba Inus".

What I don't understand is how they do the composition. E.g. for "A giant cobra snake on a farm. The snake is made out of corn." I think I could understand how it could reproduce the "A giant cobra snake on a farm" part. What I don't understand is how it accurately pictured the "The snake is made out of corn." part, when I'm guessing it has never seen images of snakes made out of corn. The way it combined "snake" with "made out of corn", in a way that is pretty much how I imagined it would look, is the part I'm baffled by.
