To clarify, this test is purely PASS/FAIL - unsuccessful means that the model NEVER managed to generate an image adhering to the prompt. As an example, Midjourney 7 did not manage to generate the correct vertical stack of translucent cubes ordered by color in 64 generation attempts.
It's a little beyond the scope of my site but I do like the idea of maintaining a more granular metric for the models that were successful to see how often they were successful.
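A granular metric like that is cheap to compute from an attempt log. A minimal sketch (the log format and model/prompt names here are made up for illustration):

```python
from collections import defaultdict

# Hypothetical attempt log: (model, prompt, passed) tuples from repeated runs.
attempts = [
    ("midjourney-7", "stacked translucent cubes", False),
    ("midjourney-7", "stacked translucent cubes", False),
    ("imagen-4", "stacked translucent cubes", True),
    ("imagen-4", "stacked translucent cubes", False),
]

def success_rates(log):
    """Per-(model, prompt) pass rate: successes / total attempts."""
    totals, wins = defaultdict(int), defaultdict(int)
    for model, prompt, passed in log:
        key = (model, prompt)
        totals[key] += 1
        wins[key] += int(passed)
    return {k: wins[k] / totals[k] for k in totals}

rates = success_rates(attempts)
# e.g. a model that passed 1 of 2 attempts on a prompt gets 0.5 for that pair
```

The binary PASS/FAIL then just becomes "rate > 0", while the rate itself captures how reliable each model is.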
It's a very interesting resource to map some of the limits of existing models.
I don’t know which is more important, but I would say that people mostly won’t pay for fun but disposable images, and I think people will pay for art, but with an increased emphasis on the human artist. However, users might pay for reliable tools that can generate images for a purpose - things like educational illustrations - and those need to be able to follow the spec very well.
I want to interrupt all of this hype over Imagen 4 to talk about the totally slept on Tencent Hunyuan Image 2.0 that stealthily launched last Friday. It's absolutely remarkable and features:
- millisecond generation times
- real time image-to-image drawing capabilities
- visual instructivity (e.g. you can circle regions, draw arrows, and write prompts addressing them)
- incredible prompt adherence and quality
Nothing else on the market has these properties in quite this combination, so it's rather unique.
Release Tweet: https://x.com/TencentHunyuan/status/1923263203825549457
Tencent Hunyuan had a bunch of model releases all wrapped up in a product that they call "Hunyuan Game", but the Hunyuan Image 2.0 real time drawing canvas is the real star of it all. It's basically a faster, higher quality Krea: https://x.com/TencentHunyuan/status/1924713242150273424
More real time canvas samples: https://youtu.be/tVgT42iI31c?si=WEuvie-fIDaGk2J6&t=141 (I haven't found any other videos on the internet apart from these two.)
You can see how this is an incredible illustration tool. If they were to open source this, this would immediately become the top image generation model over Flux, Imagen 4, etc. At this point, really only gpt-image-1 stands apart as having godlike instructivity, but it's on the other end of the [real time <--> instructive] spectrum.
A total creative image tool kit might just be gpt-image-1 and Hunyuan Image 2.0. The other models are degenerate cases.
More image samples: https://x.com/Gdgtify/status/1923374102653317545
If anyone from Tencent or the Hunyuan team is reading this: PLEASE, PLEASE, PLEASE OPEN SOURCE THIS. (PLEASE!!)
- wine glass that is full to the edge with wine (ie. not half full)
- wrist watch not showing V (hands at 10 and 2 o'clock)
- 9 step IKEA shelf assembly instruction diagram
- any kind of gymnastics / sport acro
I mean, it's a fun edge case, but in practice - does it matter?
Not sure if this affects your results or not, but I can't resist chiming in!
I wonder how much the commonality or frequency of names for things affects image generation? My hunch is that it roughly correlates, and you'd get better results for terms with more hits in the training data. I'd probably use Google image search as a rough proxy for this.
Also "create static + video ads that are 0-99% complete" suggests the performance is hit or miss.
Hmm.
This is probably one of the better known benchmarks, but when I see Midjourney 7 and Imagen 3 within spitting distance of each other, it makes me question what kind of metrics they are using.
Created by Ari Kuschnir
I think the change here will be something we've seen with the other modalities. Text was interesting: syntactically correct but nonsense sentences. Then coherent paragraphs, but the end of the article would go off the rails. Then the whole article. Now it's the creativity of the children's story that's in question.
Pictures were awful fever dreams filled with eyes, but you could kind of see a dog. Then you could see what it was, then decent.
Videos were fun in that they kind of worked, then surprising that it took a few seconds for the panda to turn into spaghetti, then they kept the general style for a decent time.
I see this moving towards the creativity being the major thing, or it having a few general styles (softly lit background for example).
This has mostly all shifted in a very short space of time and as someone who put RBMs on GPUs possibly for the first time (I'm gonna claim it) this is absolutely wild.
Had I seen some of this, say, 6 months ago, I'd not have guessed at all that it wasn't real.
It wasn't until I was able to get my jaw off the ground that I told her it was AI. No, not AI like special effects, completely AI.
[0] https://www.reddit.com/r/ChatGPT/comments/1kru6jb/this_video...
But they do not allow any people in the image, even cartoon depictions of humans. This kneecaps a lot of potential usage.
This reminds me of Pixar's video of an animated lamp 40 years ago. I remember that within 5 years Toy Story came out and changed everything on how animated films were made. Looks to me like we are on our way to doing the same thing with realistic movies.
Someone will use AI to make the "AI Killed the Video Star" video. Probably the same guy that made this[1] and other masterpieces.
These larger companies are clearly going after the agency/hollywood use cases. It'll be fascinating to see when they become the default rather than a niche option - that time seems to be drawing closer faster than anticipated. The results here are great, but they're still one or two generations off.
The Tencent Hunyuan team is cooking.
Hunyuan Image 2.0 [1] was announced on Friday and it's pretty amazing. It's extremely high quality text-to-image and image-to-image with millisecond latency [2]. It's so fast that they've built a real time 2D drawing canvas application with it that pretty much duplicates Krea's entire product offering.
Unfortunately it looks like the team is keeping it closed source unlike their previous releases.
Hunyuan 3D 2.0 was good, but they haven't released the stunning and remarkable Hunyuan 3D 2.5 [3].
Hunyuan Video hasn't seen any improvements over Wan, but Wan also recently got VACE [4], which is a multimodal control and editing layer. The Comfy folks are having a field day with VACE and Wan.
[1] https://wtai.cc/item/hunyuan-image-2-0
[2] https://www.youtube.com/watch?v=1jIfZKMOKME&t=1351s
[3] https://www.reddit.com/r/StableDiffusion/comments/1k8kj66/hu...
Plus, with local generation you're not limited by platform moderation, which can be too strict and arbitrary and fail with false positives.
Yes, ComfyUI can be intimidating at first vs. an easy-to-use ChatGPT-like UI, but the lack of control makes me feel these tools still won't be used in professional productions in the short term - more in small YouTube channels and smaller productions.
It's for advertising.
But what it means is that, with time, open source will be as good as what commercial offerings have now. Hardware will get cheaper, and research is open or delayed-open.
Generating a long video one shot at a time kind of makes sense, as long as there's good consistency between shots
The format of the shows is mostly clip-based - man on the street, news hour, etc. - and obviously the jokes are all written by someone with a good sense of humour.
Not to discount that this is, as you say, an example of someone using AI to successfully create characters and stories that resonate with people. It's just still very much because of a creative human's talent and good taste that it's working.
Ok, I went from being pleasantly surprised to breakout laughter at that point.
But I also think this points out a big problem: high-quality stuff is flying under the radar simply because of how much stuff is out there. I've noticed that when faced with a lot of choice, rather than exploring it, people fall back into popular stuff that they're familiar with in a really sad way. Like a lot of door dash orders will be for McDonalds, or people will go back to watching popular series like Friends, or how Disney keeps remaking movies that people still go to see.
Artists aren't going to be replaced by AI tools being used by me on my iPhone; those artists were already replaced by bulk art from IKEA et al. Artists who reject new tools for being new will be replaced by artists who don't. Just like many painters were replaced by photographers.
> You're not the monolith of me!
These other universe memes are too good.
You can also use it as a communication tool such as making a "live" storyboard to prep location, blocking, maybe even as notes for actors.
Photography amplified the abstract and more creative aspects of painting and created new styles, because photography removed much of the need to capture realism. Though I am still entranced by the realist painting style myself, it serves a different purpose than capturing a moment.
AI tools used for any content will / are being used to add to the pile of shit.
I'd much rather start seeing individuals creating AI movies where you aren't bogged down by the need to hire actors and whatnot.
I like how Veo supports camera moves, though I wonder if it clearly recognizes the difference between 'in-camera motion' and 'camera motion' and also things like 'global motion' (e.g. the motion of rain, snow etc).
Obligatory link to Every Frame a Painting, where he talks about motion in Kurosawa: https://www.youtube.com/watch?v=doaQC-S8de8
The abiding issue is that artists (animators, filmmakers etc) have not done an effective job at formalising these attributes or even naming them consistently. Every Frame a Painting does a good job but even he has a tendency to hand wave these attributes.
Its something that is only obvious when it is obvious. And the more obvious examples you see, the more non-obvious examples slip by.
If you look at the shadows in the background, you can see how they appear and disappear, how things float in the air - all the usual AI artifacts. The video is also slowed down (lower FPS) to get around the length limit of the AI video generator.
But the point is not how we can spot these, because that's going to be impossible, but what the future of news consumption is going to look like.
[1] https://www.tiktok.com/@calm.with.word/video/750583708327412...
I don't believe it's entirely fake, just enhanced.
in the owl/badger video, the owl should fly silently.
This is an interesting non-trivial problem of generalization and world-knowledge etc., but also?
There's something somewhat sad about that slipping through; it makes me think no one involved in the production of this video, its selection, its passing review, etc., seemed to realize that one of the characteristic things about owls is that you don't hear their wings.
We have owls on our hill right now, see them almost every day, and regularly see them fly. It's magic, especially in an urban environment.
https://www.youtube.com/watch?v=-WigEGNnuTE
Longer version:
1. People like to be entertained.
2. NeuralViz demonstrates AI videos (with a lot of human massaging) can be entertaining
To me the fundamental question is- "will AI make videos that are entertaining without human massaging?"
This is similar to the idea of "will AI make apps that are useful without human massaging"
Or "will AI create ideas that are influential without human massaging"
By "no human massaging", I mean completely autonomous. The only prompt being "Create".
I am unaware of any idea, app or video to date that has been influential, useful or entertaining without human massaging.
That doesn't mean it can't happen. It's fundamentally a technical question.
Right now AI is trained on human-collected data. So, technically, it's hard for me to imagine it can diverge significantly from what's already been done.
I'm willing to be proven wrong.
The Christian in me tells me that humans are able to diverge significantly from what's already been done because each of us is imbued with a divine spirit that AI does not have.
But maybe AI could have some other property that allows it to diverge from its training data.
It makes me sad, though. I wish we were pushing AI more to automate non-creative work and not burying the creatives among us in a pile of AI generated content.
Personally I can't wait to see the new creative doors ai will open for us!
I've tried AI image generation myself and was not impressed. It doesn't let me create freely, it limits me and constantly gravitates towards typical patterns seen in training data. As it completely takes over the actual creation process there is no direct control over the small decisions, which wastes time.
Edit: another comment about a different meaning of accessibility: the flood of AI content makes real content less accessible.
The gates are wide open for those that want to put in effort to learn. What AI is doing to creative professionals is putting them out of a job by people who are cheap and lazy.
Art is not inaccessible. It's never been cheaper and easier to make art than today even without AI.
> Personally I can't wait to see the new creative doors ai will open for us!
It's opening zero doors but closing many
---
What really irks me about this is that I have _seen_ AI used to take away work from people. Last weekend I saw a show where the promotional material was AI generated. It's not like tickets were cheaper or the performers were paid more or anything was improved. The producers pocketed a couple hundred bucks by using AI instead of paying a graphic designer. Extrapolate that across the market for arts and wonder what it's going to do to creativity.
It's honestly disgusting to me that engineers who don't understand art are building tools at the whims of the financiers behind art who just want to make a bit more money. This is not a rising tide that lifts all ships.
"Creating" with an AI is like an executive "inventing" the work actually done by their team of researchers. A team owner "winning" a game played by their team.
That being said, AI output is very useful for brainstorming and exploring a creative space. The problem is when the brainstorming material is used for production.
So the bad news is people are just insecure, jealous, pedantic, easy to offend, highly autistic - and these are the smart ones.
The good news is that, with dead internet theory, they will all be replaced with bots that will at least be more compelling and make some sort of sense.
What a weird way to spell "give $200 a month to google"
And who owns the AI?
It’s delusional. Stop falling for the mental jiu Jitsu from the large AI labs. You are not becoming an artist by using a machine to make art for you. The machine is the artist. And you don’t own it.
Isn't the creativity in what you put in the prompt? Isn't spending hundreds of hours manually creating and rigging models based on existing sketch the non-creative work that is being automated here?
It's just not what gets the exciting headlines and showcases
Similarly with music, prior to recording tech, live performance was where it was at.
You could look at the digital era as a weird blip in art history.
We _could_ use this to empower humans, but many of us instinctively know that it will instead be used to crush the human spirit. The end result of this isn’t going to be an expansion of creative ability, it’s going to be the destruction of creative jobs and the capture of these creative mediums by a few large companies.
I agree, but that's the negative. The positive will be that the cost of almost any service you can imagine (medical diagnosis, tax preparation, higher education) will come down to zero, and with a lag of perhaps a decade or two it will reach us in the physical world with robo-technicians, surgeons, and plumbers. The cost of building a new house or railway will plummet to the cost of the material and the land, and will be finished in 1/10 of the time it takes today. The main problem to me is the lag between the negatives and the positives. We're starting out with the negatives, and the benefits may take a decade or two to reach us all equally.
The same was said about the camera or photoshop.
Just wanted to add representation to that feeling
Creativity is a conversation with yourself and God. Stripping away the struggle that comes with creativity defeats the entire purpose. Making it easier to make content is good for capital, but no one will ever get fulfillment out of prompting an AI and settling with the result.
Have a look at the workflow and agent design patterns in this video by youtuber Nate Herk when he talks about planning the architecture:
https://m.youtube.com/watch?v=Nj9yzBp14EM
There’s less talk about automating non-creative work because it’s not flashy. But I can promise it’s a ton of fun, and you can co-design these automations with an LLM.
Making a movie is not accessible to most people, and it's EVERYONE'S dream. This is not even there yet, but I have a few movies I need to make, and I will never get a cast together and go do it before I die. If some creatives need to take a backseat so a million more creatives can get a chance, then so be it.
There is a more sensical distinction between work that is informational in nature, and work that is physical and requires heavy tools in hard-to-reach places. That's hard to do for big tech, because making tests with heavy machinery is hard and time consuming
good lord. talk about pedantic.
The pace is so crazy that was an over estimation! I'll probably get done in 2. Wild times.
0: https://www.linkedin.com/feed/update/urn:li:activity:7317975...
Feels like there's going to be a dichotomy where the individual visuals look pretty good taken by themselves but the story told by those shots will still be mushy AI slop for a while. I've seen this kind of mushy consistency hold up over the generations so far; it seems very difficult to remove because it relies on more context than just previous images and text descriptions to manage.
[0] https://www.reddit.com/r/ChatGPT/comments/1kru6jb/this_video...
The demo videos for Sora look amazing but using it is substantially more frustrating and hit and miss.
My last recollection is that a recent case said AI-generated work didn't have copyright?
Ironically, this would be a good application of AI, where the AI listens in on their calls, and will flag conversation that warrants the keyword being said.
Now it just takes giant compute clusters and inference time.
Origami for me was more audio than video. Felt like it's exactly how it would sound.
My main issue when trying out Veo 2 was that it felt very static. A couple elements or details were animated, but it felt unnatural that most elements remained static. The Veo 3 demos lack any examples where various elements are animated into doing different things in the same shot, which suggests that it's not possible. Some of the example videos that I've seen are neat, but a tech demo isn't a product.
It would be really cool if Google contracted a bunch of artists / directors to spend like a week trying to make a couple videos or short movies to really showcase the product's functionality. I imagine that they don't do that because it would make the seams and limitations of their models a bit too apparent.
Finally, I have to complain that Flow claims to not be available in Puerto Rico ("Flow is not available in your country yet"), despite it being a US territory and us being US citizens.
Also Google is going to have to tread carefully, people in the entertainment industry are already AI hostile, and they dictate a surprising amount of public opinion.
I’ve noticed ads with AI voices already, but having it lip synced with someone talking in a video really sells it more
Interesting logic the new era brings: something else creates, and you only "bring your vision to life", but what it means is left for readers questioning, your "vision" here is your text prompt?
We're at a crossroads where the tools are powerful enough to make the process optional.
That raises uncomfortable questions: if you don't have to create anymore, will people still value the journey? Will vision alone be enough? What's the creative purpose in life: to create, or to bring a creative vision to life? Isn't the act of creation being subtly redefined?
Right. IMO you have to be imagination-handicapped to think that a creative vision can be distilled to a prompt, let alone that a prompt is the natural medium in which a creative vision lives. The exact relation between vision, artifact, process, and art itself can be philosophically debated endlessly, but to think artifacts are the only meaningful substrate in which art exists sounds like a dull and hollowed-out existence, like a Plato's-cave-level confusion between the true meaning and its representation. Or, in a (horrible) analogy for my fellow programmers: confusing pointers to data with the data itself.
Exactly. Probably the most important quote of modern times is, I think it was a CEO of an ISP that said it: "we don't want to be the dumb pipes" (during a comparison with a water utility company).
Everyone wants to seek rents for recurring revenue someone else actually generates.
If you take any high-quality AI content and ask its creator what their workflow is, you'll quickly discover that actually creating something high-quality, something that "fulfills your vision", is incredibly complex and nuanced.
Whether you measure quality through social media metrics, reach, or artistic metrics, like novelty or nuance, high quality content and art requires a good amount of skill and effort, regardless of the tool.
Standard reading for context: https://archive.org/details/Bazin_Andre_The_Ontology_of_Phot...
Software Engineers bring their vision to life through the source code they input to produce software, systems, video games, ...
I'm always hesitant with rollouts like this. If I go to one of these, there's no indication which Imagen version I'm getting results from. If I get an output that's underwhelming, how do I know whether it's the new model or if the rollout hasn't reached me yet?
https://aistudio.google.com/generate-image
But this still says it's Imagen 3.0-002, not Imagen 4.
It is so confusing. OK, I got Gemini Pro through Workspace or something, but not everything is there? Sure, I can try AI Studio, Flow, Veo, Gemini, etc. to figure out what I can do where, but such bad UX. Just tried using Gemini to create an image - definitely not the newest image gen, as the text was just marbled up. But I can't see which version I'm on. Genius.
Edit: after clicking through lots of Google products I'm still not able to find a single place where I can actually try the new image gen, despite the article claiming it's available today in X, Y, Z.
However, looking at the UI/UX in Google Docs, it's less transparent.
I'm pretty sure AI-generated child porn already exists somewhere. But I'm quite lucky: despite knowing rotten.com and plenty of other sites, I've never seen the real thing, so I doubt I will see the fake kind.
What's the elephant in the room now? Nothing changed. Whoever consumes the real thing will consume the fake too. The FBI/CIA will still try to destroy CP rings.
We could even think it might make the situation somehow better, because they might consume purely virtual CP?
We should all be hoping AI-generated CSAM floods the CSAM market, instead of trying to restrict AI so that we artificially prop the market up and cause harm to many more humans.
Why is it that all these AI concept videos are completely crazy?
However, I also think this is to show that it can create anything, not just copies of stuff it has seen. If you ask for a painting of a woman and it shows you mona lisa, that's not very impressive.
Like if you asked a model to help you create a coffee shop website for a demo and it started looking more like a sex shop, you just vibe with it and say that's what you wanted in the first place. I've noticed that the success rate of using AI is proportional to how much you can gaslight yourself.
This naming seems very confusing, as I originally thought there must be some connection. But I don't think there is.
But then again, the "don't be evil" motto is long gone, so I guess anything goes now?
The obvious aim of these foundational image/movie generation AI developments is for them to become the primary source of value, at a cost and quality unmatched by preexisting human experts, while allowing but not necessitating further modification downstream by now heavily commoditized and devalued ex-professional editors, to allow for their slow deprecation.
But the opposite seems to be happening: the better data are still human-generated, generators are increasingly human-curated, and they're used increasingly close to the tail end of the pipeline instead of the head. Which isn't so threatening nor interesting to me, but I do wonder if that's a safe, let alone expected, outcome for those pushing these developments.
Aren't you welding a nozzle onto an open can of worms?
https://www.youtube.com/watch?v=SPF4MGL7K5I
Obviously we don't know how hand picked that is so it would be interesting to see a comparison from someone with access.
Since Google seems super cagey about what their exact limits actually are, even for paying customers, it's hard to know if that's an error or not. If it's not an error, if it's intentional, I don't understand how that's at all worth $20 a month. I'm literally trying to use your product Google, why won't you let me?
https://www.figure.ai/ does not exist yet, at least not for the masses. Why are Meta and Google just building the next coder and not the next robot?
It's because those problems are at the bottom of the economic ladder. But they have the money for it, and it would create so much abundance: it would crash the cost of living and free up human labor to imagine and do things more creatively than whatever Veo 4 can ever do.
In the forecast of the AI-2027 guys, robotics come after they've already created superintelligent AI, largely just because it's easier to create the relevant data for thinking than for moving in physical space.
Ideogram and gpt-4o pass only a few of them, but not all.
Soon, you should be able to put in a screenplay and a cast, and get a movie out. Then, "Google Sequels" - generates a sequel for any movie.
This "fixes" Hollywood's biggest "issues". No more highly paid actors demanding 50 million to appear in your movie, no more pretentious movie stars causing dramas and controversies, no more workers' unions or strikes, but all gains being funneled directly to shareholders. The VFX industry being turned into a gig meatgrinder was already the canary in the coal mine for this shift.
Most of the major Hollywood productions from the last 10 years have been nothing but creatively bankrupt sequels, prequels, spinoffs and remakes, all rehashed from previous IP anyway, so how much worse than this can AI do, since it's clear they're not interested in creativity anyway? Hell, it might even be an improvement than what they're making today, and at much lower cost to boot. So why wouldn't they adopt it? From the bean counter MBA perspective it makes perfect sense.
All this is in line with my prediction for the first entirely AI generated film (with Sora or other AI video tools) to win an Oscar being less than 5 years away.
And we're only 5 months in.
The guy in the third video looks like a dressed up Ewan McGregor, anyone else see that?
I guess we can welcome even more quality 5 second clips for Shorts and Instagram
Think of all of your favorite novels that are deemed "impossible" to adapt to the screen.
Or think of all the brilliant ideas for films that are destined to die in the minds of people who will never, ever have the luck or connections required to make it to Hollywood.
When this stuff truly matures and gets commoditized I think we are going to see an explosion of some of the most mind blowing art.
On a more societal level, I'm not sure continuously diminishing costs for producing AI slop is a net benefit to humanity.
I think this whole thing parallels some of the social media pros and cons. We gained the chance to reconnect with long lost friends—from whom we probably drifted apart for real reasons, consciously or not—at the cost of letting the general level of discourse to tank to its current state thanks to engagement-maximizing algorithms.
Not in 10 years but now.
People who just see this as terrible are wrong. AI's improvement curve is exponential.
People's adaptability is at best linear.
This makes me really sad. For creativity. For people.
Of course this is not because of AI. It's because of the ridiculous system of social organization where increased automation and efficiency makes people worse off.
Can’t wait to see what people start making with these
Sora, the image model (gpt-image-1), is phenomenal and is the best-in-class.
I can't wait to see where the new Imagen and Veo stack up.
Technology is inevitable and it's a tool, advancing technology will always leave people who specialize and are unable to adapt in a bad position, but this won't stop technology from advancing.
I think one could argue this is one of the reasons many people would like their community/government to provide social safety nets for them. It would make specializing less risky in a time when technology advances at a fast pace.
Thank you, researchers, for making our world worse. Thank you for helping to kill democracy.
They all got smoked by Google with what they just announced.
Google what is this?
How would anyone use this for a commercial application?
There is an ever growing percentage of new AI-generated videos among every set of daily uploads.
How long until more than half of uploads in a day are AI-generated?
The remaining 10% is the solution to generating good hands, of course. And do you think YouTube has been helping anyone achieve that?
If we look at the Veo 3 examples, this is not the typical youtube video, but instead they seem to recreate cgi movies, or actual movies.
With a media & entertainment hungry world that is about to get worse with the unemployed/underemployed TikTok generation needing "content", something like this has to have a play.
Drive the storytelling, consult with AI on improving things and exploring variations.
Generate visuals, then adjust / edit / postprocess them to your liking. Feed the machine your drawings and specific graphic ideas, not just vague words.
Use generated voices where they work well, record real humans where you need specific performance. Blend these approaches by altering the voice in a recording.
All these tools just allow you to produce things faster, or produce things at all such that would be too costly to shoot in real life.
Now it's "good enough" for a lot of cases (and the pace of improvement is astounding).
AI is still not great at image gen and video gen, but the pace of improvement is impressive.
I'm skeptical image, video, and sound gen are "too difficult" for AI to get "good enough" at for many use cases within the next 5 years.
In 2 years we have moved from AI video being mostly a pipe dream to some incredible clips! It’s not what this is like now, but what will it be like in 10 years!
A bit depressing.
I mean obviously the answer is "no" and this is going to get a bunch of replies saying that inventors are not to blame but the negative results of a technology like this are fairly obvious.
We had a movie two years ago about a blubbering scientist who blatantly ignored that to the detriment of his own mental health.
I can't be the only one wondering where the Swedish beach volleyball channel is, though.
I imagine video is a far tougher thing to model, but it's kind of weird how all these models are incapable of not looking like AI-generated content. They are all smooth and shiny and robotic; year after year it's the same. If anything, the earlier generations, like that horrifying "Will Smith eating spaghetti" clip from back like three years ago, look LESS robotic than any of the recent floaty clips that are generated now.
I'm sure it will get better, whatever, but unlike the goal of LLMs for code/writing where the primary concern is how correct the output is, video won't be accepted as easily without it NOT looking like AI.
I am starting to wonder if that's even possible, since these are effectively making composite guesses based on training data, and the outputs ultimately look similar to those "Here is what the average American's face looks like, based on 1000 people's faces super-imposed onto each other" composites that used to show up on Reddit all the time. Uncanny, soft, and not particularly interesting.