The videos are generated from random https://lexica.art prompts, with linear interpolation between two random seeds for each video, held at the same prompt, then looped with ffmpeg filter_complex reverse/concat. Music is from various Creative Commons / free sources.
Source code at https://github.com/lwneal/duckrabbit/
Hosted on a single $7 node at https://www.digitalocean.com
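For anyone wondering what the reverse/concat loop looks like, here is a minimal sketch. It is not taken from the repo: the function name, file names, and exact filter graph are my own assumptions. It only builds the ffmpeg argument list, so it runs without ffmpeg installed.

```python
import shlex

def boomerang_command(src, dst):
    """Build an ffmpeg command that plays `src` forward, then backward.

    Hypothetical helper for illustration; not the author's actual pipeline.
    """
    graph = (
        "[0:v]split[fwd][tmp];"                # duplicate the input video stream
        "[tmp]reverse[rev];"                   # reverse one copy (buffers all frames in memory)
        "[fwd][rev]concat=n=2:v=1:a=0[out]"    # forward followed by reversed, video only
    )
    return ["ffmpeg", "-y", "-i", src, "-filter_complex", graph, "-map", "[out]", dst]

# Print a shell-ready version of the command.
print(shlex.join(boomerang_command("clip.mp4", "loop.mp4")))
```

Because the clip ends on the frame it started from, playing it on repeat has no visible jump. The `reverse` filter holds the whole clip in memory, so this only suits short videos.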
breaking it out for others curious:
- modified img2img to accept 2 prompts and 2 seeds (and a slerpradius and output video path)
- uses imutil.Video to create a video
- stochastic encode with noise, then decode, then add a frame to the video, for all n_iter
one thing i don't understand is why OP keeps calling `.half()` on their models. googled it and it seems to be a newish PyTorch feature for "half precision" but couldn't get a clear answer on why you would want that. anyone care to share?
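Not OP, so whether this is their reason is a guess, but the usual motivation is memory: `.half()` converts a model's floating-point weights to 16-bit floats, which take half the space of the default 32-bit ones, and GPUs with fast fp16 support can also run them faster. The trade-off is precision. A small standard-library sketch of that trade-off (no PyTorch needed; `value` is just an arbitrary example number):

```python
import struct

value = 3.14159265

half = struct.pack("<e", value)    # IEEE 754 binary16, the format .half() uses
single = struct.pack("<f", value)  # IEEE 754 binary32, PyTorch's default float

print(len(half), len(single))        # bytes per number: 2 4
print(struct.unpack("<e", half)[0])  # 3.140625, only about 3 decimal digits survive
```

For a model with hundreds of millions of weights, that factor of two can decide whether it fits on a consumer GPU at all.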
From the code [1] I take it that you're using polar interpolation on two random points around the original noise, right?
This is very cool stuff, thanks for sharing!
[1]: https://github.com/lwneal/duckrabbit/blob/cb375ec7c6067bf805...
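For others following along, this is roughly what spherical ("polar") interpolation between two noise vectors looks like. It is a generic sketch, not the repo's code: `slerp`, `noise_from_seed`, the seeds, and the vector size are all illustrative assumptions, and it uses plain lists instead of tensors so it runs with the standard library alone.

```python
import math
import random

def slerp(t, a, b):
    """Spherically interpolate from vector a (t=0) to vector b (t=1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Angle between the two vectors, clamped against rounding error.
    omega = math.acos(max(-1.0, min(1.0, dot / (norm_a * norm_b))))
    s = math.sin(omega)
    if abs(s) < 1e-6:
        # Vectors are (anti)parallel: fall back to plain linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    wa = math.sin((1 - t) * omega) / s
    wb = math.sin(t * omega) / s
    return [wa * x + wb * y for x, y in zip(a, b)]

def noise_from_seed(seed, size):
    """Hypothetical stand-in for sampling the initial latent noise from a seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]

start = noise_from_seed(1, 16)
end = noise_from_seed(2, 16)
n_frames = 5
# One interpolated noise vector per video frame.
frames = [slerp(i / (n_frames - 1), start, end) for i in range(n_frames)]
```

The reason to prefer this over a straight linear blend is that the midpoint of two Gaussian noise vectors has a smaller norm than either endpoint, while slerp keeps the intermediate vectors at a magnitude closer to what the model saw during training.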
Could you explain more exactly which Droplet/configuration you are using on DigitalOcean?
This is an honest question: I haven't seen examples of anything else, so I have to wonder whether the models they are using are specialized for a sci-fi and fantasy "air brush/digital" style. Why?
The models, processes, and collective knowledge will just develop more over time to create MUCH better visuals, videos, and temporally coherent animations.
This is like a new form of art canvas, and we're all getting used to the basics of using the "paints" and "brushes" for it. In a few months/years, some of us will master the skill and produce fantastic artpieces.
But for now, the stunning stuff tends to be fantasy, sci-fi, impressionistic. Photos of people that are not portraits are pretty often anatomically impossible. Getting hands, arms and stuff right seems quite difficult for these networks.
Funny enough, I've seen underwater pictures that looked quite believable to me, but to the expert are ridiculous. Lots of impossible stuff going on. Human brains are ready to fill in a lot of detail.
Another question: Do the people who run the software claim copyright on the results even though these are (mostly) produced by the software? It sounds that way when you write "some of us will [...] produce fantastic artpieces." I guess it's also legally the case, but I wonder whether that's also how people experimenting with it understand it.
My fun has been with two games: 1) making unusual art from prose using the art styles of famous painters, 2) playing "AI Pictionary" with friends (can you produce image X; example: a person eating ramen with chopsticks that are light sabers).
It's also because they are basically nonsense (fantasy) so the results in the style look more plausible.
I tried using a few AI generators to get some basic placeholder images for products and it's utterly shit; it's clear that the networks don't understand what they are generating. Like I tried to generate bicycles and I would get components stuck to the ground, floating components, stupid proportions, visual artifacts.
If all engineering is, is altering CAD models to conform to local government regulations, then conceptually surely we can procedurally generate a whole fucking city, adding in sewers and power and other infrastructure as it goes.
The reality is much closer to how stable diffusion and other AI-generated art is now - technically and creatively impressive but often nonsensical and broken.
And, as another commenter noted, also for city design. There are actually a few versions of this: I worked on one called Delve https://www.sidewalklabs.com/products/delve
Typically they use constraint optimization or wave function collapse under the hood, on top of some sort of procedural generation pipeline. A few others are Spacemaker, Hypar, and Testfit.
If other people find this to be helpful, I'm interested to hear about it.
Sound permanently on after hitting next button once on Chromium 104.
Next button does move to next video, but also enables sound. Additionally, this breaks internal state, and the button still shows the muted symbol (despite the sound now being on). Hitting the muted sound button switches the sound button symbol to unmuted, sound still on as before. Hitting it again to mute the sound doesn't work, and doesn't change the symbol back to muted.
My bet is that this will happen in 8-9 years from now, but it's just a guess.
I think it's hard to challenge the fact that it WILL happen, at some point in our lifetimes.
But please change the music!