Don't get me wrong, they are still impressive in the quality of the visuals they produce, but just like the Markov chain demos of old, they're neat but miss the mark by a wide margin.
None of these capture the "feel" of Kermit the Frog. Most of them look like weird designs for the Ninja Turtles movie in the 90s.
There are several distinctive features of Kermit that are missing from nearly all of these.
- For any of the "live action" ones, Kermit should still always be a puppet.
- Kermit notoriously has lanky arms.
- Kermit never has eyelids.
- His eyes sit way on top of his head.
- He often has his weird neck decoration.
- His eyes have a very distinctive pupil shape.
None of these get Kermit correct; they all just look like frogs. (Maybe Dalle2 isn't trained on copyrighted/trademarked material?)
There are fan-made versions of some of these which show just how different Dalle2 is from human imagination:
Kermit actually has been on Family Guy: https://static.wikia.nocookie.net/muppet/images/7/71/Famguys...
There are several "Kermit in Star Wars" examples; here are two: https://i.kym-cdn.com/entries/icons/original/000/021/668/ker..., https://i.ytimg.com/vi/6MebZx-4950/maxresdefault.jpg
Again, if this was done on someone's laptop it would be really impressive. However, the fact that so much talent and resources were poured into pushing AI to its limits and this is what we get tells me we've hit another brick wall as far as research goes.
For example, your Star Wars example...
https://i.ytimg.com/vi/6MebZx-4950/maxresdefault.jpg
It's clearly just an existing photo of Kermit pasted over an image from the film. There are even two sets of arms. I could Photoshop that in a few minutes.
Then, the Dalle2 image...
https://pbs.twimg.com/media/FUEDDm2UEAAO8yb?format=jpg&name=...
I think it's impressive. It looks like Kermit is a character in the Star Wars universe. There are a few issues with the eyes and feet, and it's also hard to tell if it's a creature or a person in a frog suit. However, it gets 90% of the way there, and the pose is great for a frog/human hybrid.
The most exciting thing is how this could be used as a starting point for design. I could take the Dalle2 Kermit image above, fix the eyes/feet, add a few distinctive Kermit features, and have a great piece of concept art in an hour, rather than taking a day or two to create something from scratch. Obviously it can't be applied to all workflows, but for those it's suited for, it'll save vast amounts of time and costs. For that reason, it's already something of real value in its current state. The same can't be said about the Star Wars examples you provided.
There are many ways to define what "Kermit the Frog in $MOVIE" means, and the choice the AI made is absolutely valid. There are of course various other valid choices, but this doesn't invalidate the ones presented.
Furthermore, judging by some other examples in this HN thread, it seems that the fact most of the pictures are not puppets is more of a choice of the human choosing the photos, as in other cases DALL-E was indeed adding puppet-like characters in movie-like decors.
"Sure, this AI can produce high-resolution realistic images leaps and bounds above anything that's been shown before... but there's an aspect which could use improvement. Obviously, this proves that the current AI technology will never amount to anything and we should just give up on it now."
You might be missing the point of what OpenAI is doing. The point is to show off the capability of their models in a way that's likely to go viral and lead to more business for OpenAI. Some people laughed at GPT-3's silly demos, but when they launched GitHub Copilot...
If people say Dalle can improve the workflow of digital artists, sure, but Copilot hasn't revolutionized programming either, you still have to be a good programmer to finish whatever you are doing:
> A paper accepted for publication in the IEEE Symposium on Security and Privacy in 2022 assessed the security of code generated by Copilot [...] The study found that across these axes in multiple languages, 39.33% of top suggestions and 40.73% of total suggestions lead to code vulnerabilities. Additionally, they found that small, non-semantic (i.e., comments) changes made to code could impact code safety.[14]
What happened next? Is anyone using Copilot for serious work? Has it changed programming in a fundamental way?
I personally have zero use for Copilot, since for the type of code I write the actual code writing is not a bottleneck, so automating that process is of no value to me. On top of that, getting the details exactly right is essential, so the ratio of boilerplate to real code is very, very low for me.
However, I don't think you're correct in your assessment of the import of this sort of thing: it's an imagination machine. This isn't a brick wall, it's a foundation on which to build.
It gets a lot of things wrong; for instance, I'm not sure why Kermit has a plastic texture in many of the pictures. If you showed me ten pictures of Kermit and ten frames of Total Recall, and for some reason 8 of your pictures had a plastic Kermit, and asked me to combine them in my head, I'd probably imagine something on par with or worse than what Dalle has managed to do. But I wouldn't be able to show anyone what I'd made!
Contrast with real creativity (what people can do but machines currently cannot) where you conjecture something completely new.
For example, Copernicus conjecturing the idea that the Earth revolves around the Sun. No machine learning model would have gotten there because it would have been trained on a bunch of data that said the Earth was the center of the universe.
The image is new, it did not previously exist. It is a creation, a very vague idea of a few words that was created in full realization.
So it seems like the only difference between the "Not creativity" that Dall-E is doing and "Real Creativity" that humans do is that humans are the ones doing it?
I agree there's this concept of expanding the frontiers of human aesthetic capability that has slow-marched from cave paintings till post-modernism. That there are a very few artists that invent completely new styles that the rest of us copy and remix. It's questionable that Dall-E can do that, but I'm also not sure that it can't do that.
Copernicus got his idea after gathering a lot of data, explicitly and implicitly, training his internal model of the world.
I fail to see what the difference is between 99% of existing "creativity," which is essentially arranging existing ideas into novel combinations, and what DALLE2 does.
Creativity is a very vague word, and I'm sure we can come up with definitions of it that let humans keep sole domain over it. But breakthroughs often come from combining domains and concepts; very, very rarely do we ever jump out of one local maximum into another, and I'm not even convinced that Copernicus counts as that. There's a reason why there are so many examples of the same breakthrough happening in multiple places in the world independently: innovation is a slow, gradual, collaborative process and not plateaus waiting for men of genius to have a spark of inspiration.
Also I'm not convinced that a computer couldn't have discovered the earth revolves around the sun - it's hard to make machine learning jump out of local maxima, but it does happen, and I can see some hidden layers becoming far more efficient at predicting outcomes by stumbling across a model that centered the sun. That being said - there likely are examples of things that computers couldn't have theoretically figured out the model for, but I'm hard pressed to think of one.
Call it moving goalposts, no true Scotsman, the AI effect, whatever. The behavior is as follows: an argument over whether an ill-defined attribute is possessed by a computer is defended or attacked with useless semantics since nobody can agree on what any of the words mean anyway.
Creativity, intelligence, consciousness.
It doesn't matter what you say, you cannot define these concepts with the same clarity you use to defend that the concept is missing.
Saying there is no creativity because it's just a neural net extrapolating from data is like saying there's no god because it's all just atoms: what is god, and why would the existence of atoms have anything to do with it?
Learn from Wittgenstein: Worüber man nicht sprechen kann, darüber muss man schweigen. ("Whereof one cannot speak, thereof one must be silent.")
I am 100% sure Copernicus was not the first to suggest a heliocentric system, but he was the one who put enough energy into proving and defending that theory.
Such a cute point of view, completely wrong but cute. Please go find the original images of Kermit in Blade Runner and WallE that were just copied here.
>For example, Copernicus conjecturing the idea that the Earth revolves around the Sun. No machine learning model would have gotten there because it would have been trained on a bunch of data that said the Earth was the center of the universe.
If the model were trained on actual observations of planetary trajectories, it would trivially recreate Kepler's laws, Newton's laws, etc.
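To make that a bit more concrete, here is a minimal sketch of the simplest version of the idea: an ordinary least-squares fit in log-log space, not a neural net. The orbital figures are rounded textbook values that I'm supplying myself, and `fit_power_law` is just a name made up for this illustration. It only shows that the period/distance relationship is sitting in the observations waiting to be fitted; it says nothing about whether a model would arrive at heliocentrism from raw sky positions.

```python
import math

# Rounded textbook values (an assumption for this sketch):
# (semi-major axis in AU, orbital period in years).
OBSERVATIONS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}


def fit_power_law(pairs):
    """Least-squares fit of log(T) = k * log(a) + c; returns (k, c)."""
    xs = [math.log(a) for a, _ in pairs]
    ys = [math.log(t) for _, t in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    k = sxy / sxx
    c = mean_y - k * mean_x
    return k, c


exponent, intercept = fit_power_law(list(OBSERVATIONS.values()))
print(f"fitted exponent: {exponent:.3f}")
```

The fitted exponent should come out close to 1.5, which is Kepler's third law (T^2 proportional to a^3) recovered from six data points.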
That might be the known bug with low-resolution textures: the DALL-E 2 paper notes that the details in very complex scenes can be bad, and thinks it's because you start with a 64px image which is necessarily bad for details (64px is really small!) and upscale with dumber models from there https://cdn.openai.com/papers/dall-e-2.pdf#page=17 I think this explains the issues with images where the 'skin' or 'fur' looks really creepy (eg. all the semi-nude bears).
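For a sense of scale, here is a rough back-of-the-envelope sketch, assuming the 64px -> 256px -> 1024px cascade the paper describes. The names are made up for this illustration and nothing here comes from OpenAI's code.

```python
# Side lengths of the base image and the two upsampled stages (assumed
# from the paper's description of the decoder plus two upsamplers).
STAGES = [64, 256, 1024]


def pixel_count(side):
    """Number of pixels in a square image with the given side length."""
    return side * side


base = pixel_count(STAGES[0])
final = pixel_count(STAGES[-1])
ratio = final // base

# Share of the final image's pixels that the base model never produced.
upsampled_share = 1 - base / final

print(f"base pixels:  {base}")
print(f"final pixels: {final}")
print(f"final/base:   {ratio}x")
print(f"share filled in by upsamplers: {upsampled_share:.2%}")
```

So under that assumption, the overwhelming majority of the pixels you actually look at are filled in by the upsamplers rather than by the model that read the prompt, which fits with fine detail like skin and fur being where things go wrong.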
E.g. The Shining picture of Jack Nicholson with the door isn't representative of the "look" of the film, but very much an iconic still frame and basically what you see in a Google image search for "The Shining".
Especially if the movie(s) that are eventually generated this way are ripping whole scenes or sequences out of other films, a la Copilot.
It’s like extremely expensive piracy that is bad for artists and bad for the environment.
I wonder if the reason OpenAI, Google, etc don’t release these things isn’t so much that they’re worried about racist/offensive output, but instead they’re worried about people using it to create images of, say, Mickey Mouse and drawing the attention of his lawyers? It’d be better for AI companies to keep all of this stuff in a legal gray area for as long as possible.
Typically, creators are very protective of this sort of thing, unless it stays in the area of fan art. If anyone tries to seriously monetize this kind of output, I'm sure we'll see a lot of cases.
Imagine what Disney would do if you used DALL-E to create an animated feature film in the style of Mickey Mouse, but with cats instead of mice, and they found out you used actual footage from, say, Fantasia to train an AI model. No idea if they would win, but I'm certain they'd sue.
We are at the precipice of someone releasing a $100M blockbuster movie just based on the language in the script with zero cost beyond compute.
What will this mean for the future of entertainment…
Just imagine how much lonelier the world is going to feel when people don't even have entertainment in common anymore.
Kermit in Debbie Does Dallas. Kermit in The Graduate. Kermit with 2 broken flippers. Oh the depravity. I'm not sure getting high quality visualizations of any random passing thought is a good idea ;-)
On the other hand, I feel like this will ultimately be kinda like traditional procgen algorithms: once you've seen enough of what it produces it all starts feeling very bland and same-y. Sure, the AI may be able to produce a feature-length movie based on the input "What if Nicolas Cage had played The Terminator and Aaron Sorkin wrote the script?", but somehow none of it would be surprising or interesting to you, it would lack the novelty and playfulness of a good human creative work, and it likely would be very shallow in its themes.
On the gripping hand though, perhaps in achieving that level of sophistication we inadvertently create something more alive and aware than we intended and instead of merely trying to produce satisfactory results it actually attempts to express itself in ways that resonate with us.
Simply tune the parameters associated with novelty and playfulness and you’ll get the desired result.
There’s nothing inherent in human creativity that can’t be replicated by an AI. Most creative work is derivative and remixes prior art.
This is a good short video on the phenomenon of remixing https://m.youtube.com/watch?v=MZ2GuvUWaP8
You can't handle the future! We live in a world that has time machines, and those time machines have to be manned by robots! Who's gonna do it? You?
"Like Facebook but like make it not suck"
AI: "Here you go!"
For this one in particular, here are a few more results for Battlestar and The Office:
https://twitter.com/Miles_Brundage/status/153247388947686195...
Do you feel that the human mind is more than an "appropriately" trained "biological" neural network?
What do you consider the limits of a DALL-E like system compared to a "true" mind?
My personal opinion is that the Chinese Room argument is fancy handwaving that crucially relies on never being explicit about what it means by "understanding", combined with an appeal to intuition.
I strongly believe that there is nothing "magical" about the human mind or brain (that could not be replicated artificially), and thus that a comparably trained, appropriately designed system ("DALL-E successor") OR a copy OR a simulation of a human brain would be all just as capable and "understanding"/"conscious" as another human...
I don't have access to DALL-E 2, but I wonder if a prompt like "A cameo from Kermit the Frog in ..." would give more literal Kermits.
"To me those are clearly Kermit the Frogs."
"To me those are clearly not Kermit the Frogs."
Then there's nothing really to argue about. Instead we can discuss what we see and how that affects our subjective perceptions.
For example, Kermit the Frog doesn't have eyelids, but most of these images show a frog with eyelids.
This shows the limits of the Turing test. To pass it a program must not only be smart enough, it must be dumb enough too.
Pulling what DALL-E does is a tell-tale sign it’s most likely not human, and would make it fail the test.
For example, looking at the WALL-E one [0], you can clearly see that the hands and feet aren't actually separated properly. There is also plenty of missing "logic" around the armpits. These are the kinds of mistakes a human can't make - especially one that is so adept at drawing the other parts so perfectly.
[0] https://twitter.com/HvnsLstAngel/status/1531512163738669057/...
2022: This machine fails the Turing test, it's way too smart! No human could be this good at creating art.
If we accept that a model trained on copyrighted material does not infringe on the material's rights, then circumventing all copyright can be as simple as creating a sufficiently close derivative and giving it away.
Not to say that copyright is good to begin with.
Stylistic inspiration is not an infringement of copyright, in either that case or the "do it on a computer" case here.
The Kermit the Frog aspect though is interesting - it applies equally for both the human and machine made works - if an argument could be made that the subject of the work sufficiently resembles the character, maybe there's a trademark issue at hand?
But in any scenario, nothing legally novel about the work being created by machine.
…except for the fact that it was created by a machine.
Just like copyright law had to be revised to deal with software and the internet, it will need to be revised to deal with AI.
That's correct. People do this all the time, sans the giving it away part.
Also, there is no way that you can argue these images are not transformative.
Exactly. Kermit has been very much transformed, and "in the style of" is not copyright infringement AFAIK.
Here's what I got for "A still of Kermit The Frog in Blade Runner 2049 (2017)":
- Were the prompts shown the ones fed to DALL-E 2 or were there more complex details described in the prompt?
- Were these the first images generated for the prompt, or did the author generate many images and cherry-pick the best example, and if so from how many?
Although if an individual created all of these then that's about the same amount of impressive
[1]: https://twitter.com/HvnsLstAngel/status/1531774195234791424?...
[2]: https://duckduckgo.com/?t=ffab&q=eraserhead+baby&iax=images&...
DALL-E would just shoot back a still with Kyle MacLachlan in it. He's already so Kermit-like!
There are open-source efforts to implement it and make trained models available, but I don't imagine they are yet at the same scale of ingested data / model size as OpenAI's system: https://github.com/lucidrains/DALLE2-pytorch
https://user-images.githubusercontent.com/1332366/171921054-...
I understand computers and I understand back-propagation but this... it feels like magic to me.
Can someone indulge me with a short explanation of how this works and why it is this good?
- https://www.assemblyai.com/blog/how-dall-e-2-actually-works/
But people who do viral news and posts don't...read. So, their impact will continue to go unnoticed in comparison to DeEpFaKeS and Dall-E.
Very good one: https://np.reddit.com/r/dalle2/comments/u5kkty/a_fluffy_baby...
“a masterful impressionist portrait painting of a little doggey who is worried he may not be a good boy”: https://nitter.net/MarkRich388/status/1532482006809866240
1. People thinking it's amazing (me)
2. People thinking it's not creative enough e.g. "It’s using what already exists, not conjecturing something new."
3. People thinking it's too creative e.g. "This looks nothing like Kermit"
Kermit the Frog in Salò, or the 120 Days of Sodom
Kermit the Frog in Pink Flamingos
----
I actually might have Dalle2 access soonish. Honestly, this is the best demonstration I've seen that we are about 2 years away from, maybe not "general ai", but some pretty wild shit that is going to make most of what we do and value as humans very different.