Infinite Stable Diffusion Videos

(orbdog.com)

157 points · lwneal · 3y ago · 41 comments

lwneal (OP) · 3y ago

Sorry about the bugs, I've just released an update. The site's music should no longer shatter your eardrums until after you touch the unmute button.

The videos are generated from random https://lexica.art prompts. Each video holds one prompt fixed and linearly interpolates between two random seeds, then is looped with ffmpeg filter_complex reverse/concat. Music is from various Creative Commons / free sources.

Source code at https://github.com/lwneal/duckrabbit/

Hosted on a single $7 node at https://www.digitalocean.com
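
A minimal sketch of the looping step described above, assuming the clip is played forward and then in reverse so that the last frame meets the first. The file names and the helper name `pingpong_command` are hypothetical, and the filter graph is one common way to combine ffmpeg's `split`, `reverse`, and `concat` filters, not necessarily the exact command the site uses.

```python
import shlex


def pingpong_command(src, dst):
    """Build an ffmpeg command that plays `src` forward, then reversed."""
    graph = (
        "[0:v]split[fwd][tmp];"                 # duplicate the video stream
        "[tmp]reverse[rev];"                    # reverse one copy
        "[fwd][rev]concat=n=2:v=1:a=0[out]"     # forward + reversed, video only
    )
    return ["ffmpeg", "-y", "-i", src,
            "-filter_complex", graph,
            "-map", "[out]", dst]


if __name__ == "__main__":
    # Hypothetical file names; pass the list to subprocess.run to execute it.
    print(shlex.join(pingpong_command("clip.mp4", "loop.mp4")))
```

Note that `reverse` buffers the whole clip in memory, so this approach suits short clips.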

swyx · 3y ago

i find it interesting that you only took 9 commits to do it as well (+244 -62 LOC) https://github.com/CompVis/stable-diffusion/compare/main...l...

breaking it out for others curious:

- modified img2img to accept 2 prompts and 2 seeds (and a slerpradius and output video path)

- uses imutil.Video to create a video

- stochastic encode with noise, then decode, then add a frame to the video, for all n_iter

one thing i don't understand is why OP keeps calling `.half()` on their models. googled it and it seems to be a newish PyTorch feature for "half precision" but couldn't get a clear answer on why you would want that. anyone care to share?

sporkl · 3y ago

I believe using .half() lets you run Stable Diffusion on graphics cards with less VRAM.

CuriouslyC · 3y ago

Half precision models reduce memory requirements without really impacting the final quality much.
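
To put rough numbers on the replies above: in PyTorch, `.half()` casts a module's floating-point parameters to 16-bit floats, so each weight takes 2 bytes instead of 4. The sketch below uses only Python's `struct` module to show the size difference and the rounding it costs. The parameter count is a hypothetical round number, not a measurement of Stable Diffusion.

```python
import struct

HALF = "<e"    # IEEE 754 half precision, 2 bytes
SINGLE = "<f"  # IEEE 754 single precision, 4 bytes


def round_trip(value, fmt):
    """Store `value` in the given format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def weights_size_mib(n_params, fmt):
    """Memory needed to hold `n_params` weights in the given format, in MiB."""
    return n_params * struct.calcsize(fmt) / 2**20


if __name__ == "__main__":
    n_params = 1_000_000_000  # hypothetical model size
    print("single:", weights_size_mib(n_params, SINGLE), "MiB")
    print("half:  ", weights_size_mib(n_params, HALF), "MiB")
    # The price is precision: 0.1 is stored less accurately in half precision.
    print(round_trip(0.1, SINGLE), round_trip(0.1, HALF))
```

This only counts the weights; activations and other buffers also take memory during generation.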

black_puppydog · 3y ago

Wow this looks nice. Lots of focus on details here in the thread but don't let that distract you.

From the code [1] I take it that you're using polar interpolation on two random points around the original noise, right?

This is very cool stuff, thanks for sharing!

[1]: https://github.com/lwneal/duckrabbit/blob/cb375ec7c6067bf805...
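
For readers unfamiliar with the polar (spherical) interpolation mentioned above, here is a minimal pure-Python sketch of slerp between two noise vectors. It follows the usual textbook definition and is not the code from the linked repository; the function name and the tiny 2-D vectors are assumptions made for the example.

```python
import math


def slerp(t, a, b):
    """Spherically interpolate between vectors a and b for t in [0, 1]."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding
    omega = math.acos(cos_omega)                # angle between a and b
    sin_omega = math.sin(omega)
    if abs(sin_omega) < 1e-6:
        # Vectors are (anti)parallel: fall back to plain linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    weight_a = math.sin((1 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


if __name__ == "__main__":
    start, end = [1.0, 0.0], [0.0, 1.0]
    for step in range(5):
        print(slerp(step / 4, start, end))
```

A plain linear midpoint of the same two vectors would be [0.5, 0.5], with a norm of about 0.71 rather than 1, which is one commonly cited reason to prefer slerp when blending Gaussian noise.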

permanent · 3y ago

Great work!

Could you explain more exactly which Droplet/configuration you are using on DigitalOcean?

jonathanstrange · 3y ago

I've been following r/StableDiffusion on reddit for a while and was wondering whether this can also be used for anything that doesn't look like a cheap fantasy or science fiction novel cover.

This is an honest question, I haven't seen any example of anything else so I got to wonder whether the models they are using are specialized for sci-fi and fantasy "air brush/digital" style? Why?

can16358p · 3y ago

Of course it can. It's extremely new, and people are already creating fantastic results even though it was released literally just over two weeks ago.

The models, processes, and collective knowledge will just develop more over time to create MUCH better visuals, videos, and temporally coherent animations.

This is like a new form of art canvas, and we're all getting used to the basics of using the "paints" and "brushes" for it. In a few months/years, some of us will master the skill and produce fantastic artpieces.

isaacfrond · 3y ago

Sure, it will only get better not worse.

But for now, the stunning stuff tends to be fantasy, sci-fi, impressionistic. Photos of people that are not portraits are pretty often anatomically impossible. Getting hands, arms and stuff right seems quite difficult for these networks.

Funnily enough, I've seen underwater pictures that looked quite believable to me but are ridiculous to the expert. Lots of impossible stuff going on. Human brains are ready to fill in a lot of detail.

jonathanstrange · 3y ago

Interesting. The results so far look pretty good, though only for fantasy and science fiction "fan art" style. That's why I was wondering whether the models are only trained from such inputs. If I understand you correctly, this is not the case and other styles of art can also be produced. Right?

Another question: Do the people who run the software claim copyright on the results even though these are (mostly) produced by the software? It sounds like that when you write "some of us will [...] produce fantastic artpieces." I guess it's also legally the case, but I wonder whether that's also how people experimenting with it understand it.

zhynn · 3y ago

Fantasy novel covers are exceedingly easy to do because of the mountain of examples in the training data. Basically: any kind of art that we have lots of examples of is very easy to make with these tools.

My fun has been with two games: 1) making unusual art from prose using the art styles of famous painters, 2) playing "AI Pictionary" with friends (can you produce image X; example: a person eating ramen with chopsticks that are light sabers).

moonchrome · 3y ago

> Fantasy novel covers are exceedingly easy to do because of the mountain of examples in the training data

It's also because they are basically nonsense (fantasy) so the results in the style look more plausible.

I tried using a few AI generators to get some basic placeholder images for products and it's utterly shit; it's clear that the networks don't understand what they are generating. Like, I tried to generate bicycles and I would get components sticking to the ground, floating components, stupid proportions, visual artifacts.

rhacker · 3y ago

I know this is silly but I can't wait for games to have automatically generated "levels" that look like this. I guess 3D training and output is probably minimally researched at this point, and there is NeRF research... at some point all of this research will truly show off its potential beyond pretty pictures.

AntonioCao · 3y ago

This post https://twitter.com/madebyollin/status/1566838643771457536 is a step in that direction. I quite like the author's vision of neural nets eating through the graphics pipeline, given that work like GANcraft (https://nvlabs.github.io/GANcraft/) and the GTA photorealism GAN (https://isl-org.github.io/PhotorealismEnhancement/) has produced stunning results.

samplatt · 3y ago

My previous boss (in an engineering/construction company) has been working hard to get ahead of the procedural generation game.

If all engineering is, is altering CAD models to conform to local government regulations, then conceptually surely we can procedurally generate a whole fucking city, adding in sewers and power and other infrastructure as it goes.

The reality is much closer to how stable diffusion and other AI-generated art is now - technically and creatively impressive but often nonsensical and broken.

krebby · 3y ago

Check out wave function collapse (this is a good article: https://www.procjam.com/tutorials/wfc/). It tends to be used for this type of procedural level generation.

And, as another commenter noted, also for city design. There are actually a few versions of this: I worked on one called Delve https://www.sidewalklabs.com/products/delve

Typically they use constraint optimization or wave function collapse under the hood, on top of some sort of procedural generation pipeline. A few others are Spacemaker, Hypar, and Testfit.
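
As a rough illustration of the wave function collapse idea mentioned above, here is a deliberately simplified sketch on a one-dimensional strip of tiles: every cell starts with all options, the cell with the fewest options is collapsed to a random choice, and the consequences are propagated to its neighbours. The tile names and adjacency rules are hypothetical, and real level generators work on 2-D or 3-D grids and must also handle contradictions, which this strip version does not need to.

```python
import random

# Hypothetical tile set: which tiles may sit next to each other.
ALLOWED = {
    "sea": {"sea", "coast"},
    "coast": {"sea", "coast", "land"},
    "land": {"coast", "land"},
}


def propagate(cells, start):
    """Drop options that no longer have a compatible neighbour."""
    stack = [start]
    while stack:
        i = stack.pop()
        for j in (i - 1, i + 1):
            if 0 <= j < len(cells):
                supported = {t for t in cells[j]
                             if any(t in ALLOWED[s] for s in cells[i])}
                if supported != cells[j]:
                    cells[j] = supported
                    stack.append(j)  # its neighbours may be affected too


def collapse_strip(length, seed=None):
    """Generate a strip of tiles in which every adjacent pair is allowed."""
    rng = random.Random(seed)
    cells = [set(ALLOWED) for _ in range(length)]
    while any(len(c) > 1 for c in cells):
        undecided = [i for i, c in enumerate(cells) if len(c) > 1]
        fewest = min(len(cells[i]) for i in undecided)
        i = rng.choice([k for k in undecided if len(cells[k]) == fewest])
        cells[i] = {rng.choice(sorted(cells[i]))}  # collapse to one tile
        propagate(cells, i)
    return [next(iter(c)) for c in cells]


if __name__ == "__main__":
    print(collapse_strip(12, seed=1))
```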

KerrAvon · 3y ago

Holy fuck, be sure to turn down your sound before you visit. That should be muted by default.

chii · 3y ago

i thought the browser blocks sounds from autoplaying without permission - how come it didn't do it this time!?

coolmitch · 3y ago

almost fell out of my chair

mdale · 3y ago

How long before we have interactive, full-fidelity generated games / films?

ModernMech · 3y ago

How do you know you aren't already experiencing one?

maxbond · 3y ago

I find the term "not even wrong" to be unnecessarily patronizing, and I don't mean to invoke that connotation, but this is the sort of hypothesis it was meant to criticize. There is no amount of evidence which could disprove this hypothesis, but there's no good reason to believe it either (and I do find the statistical argument that's sometimes presented to be unconvincing as it's built upon a tower of arbitrary assumptions which are designed to reach the conclusion that we're living in a simulation). It's a technical veneer over solipsism. I don't find it a helpful thought experiment; I find it to be much more invigorating and useful to assume the world is real and worth engaging with wholeheartedly. And taking a sort of variation of Pascal's wager, I don't see what I lose by living that way if it does turn out that we're brains in jars.

If other people find this to be helpful, I'm interested to hear about it.

epr · 3y ago

Bug report:

Sound permanently on after hitting next button once on Chromium 104.

Next button does move to next video, but also enables sound. Additionally, this breaks internal state, and the button still shows the muted symbol (despite the sound now being on). Hitting the muted sound button switches the sound button symbol to unmuted, sound still on as before. Hitting it again to mute the sound doesn't work, and doesn't change the symbol back to muted.

eminence32 · 3y ago

This is neat. Some text that describes how this was made would be useful. Also the mute button doesn't work.

bl0b · 3y ago

As an aside, it would be cool if music could also be an input to these kinds of generative models, such that the generated image somehow matches the feeling or mood of the music.

simonebrunozzi · 3y ago

This looks almost silly now. But I'd bet that in a few years, we will see a full movie, created mostly with the equivalent of Stable Diffusion, win an Oscar.

My bet is that this will happen in 8-9 years from now, but it's just a guess.

I think it's hard to challenge the fact that it WILL happen, at some point in our lifetimes.

yieldcrv · 3y ago

well at least a side project AI site that didn’t crash immediately

questiondev · 3y ago

what if there was a way to “increase frame rate” by adding some type of logic checker between two generated images? kinda like a comparison between two generated frames that would lead to more generated images that mimic movement. so, a filler between frames that would predict how something got from one shape to another using a set of properties that the generated object has. those properties could be weight, speed, gravity, etc.; it just depends on what object it is conceptualizing or constructing

meep0l · 3y ago

How does this work?

behnamoh · 3y ago

Not so “stable” then /jk

But please change the music!

Atma-n · 3y ago

Cool idea! How do you get the prompts from lexica? I cannot find that in the repository.

werdnapk · 3y ago

Calling these "videos" is a bit of a stretch I think.
