To clarify, this test is purely PASS/FAIL - unsuccessful means that the model NEVER managed to generate an image adhering to the prompt. As an example, Midjourney 7 did not manage to generate the correct vertical stack of translucent cubes ordered by color in 64 generation attempts.
It's a little beyond the scope of my site but I do like the idea of maintaining a more granular metric for the models that were successful to see how often they were successful.
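A granular metric like that is cheap to compute from an attempt log. A minimal sketch (the log format and model/prompt names here are made up for illustration):

```python
from collections import defaultdict

# Hypothetical attempt log: (model, prompt, passed) tuples from repeated runs.
attempts = [
    ("midjourney-7", "stacked translucent cubes", False),
    ("midjourney-7", "stacked translucent cubes", False),
    ("imagen-4", "stacked translucent cubes", True),
    ("imagen-4", "stacked translucent cubes", False),
]

def success_rates(log):
    """Per-(model, prompt) pass rate: successes / total attempts."""
    totals, wins = defaultdict(int), defaultdict(int)
    for model, prompt, passed in log:
        key = (model, prompt)
        totals[key] += 1
        wins[key] += int(passed)
    return {k: wins[k] / totals[k] for k in totals}

rates = success_rates(attempts)
# e.g. a model that passed 1 of 2 attempts on a prompt gets 0.5 for that pair
```

The binary PASS/FAIL then just becomes "rate > 0", while the rate itself captures how reliable each model is.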
It's a very interesting resource to map some of the limits of existing models.
I don’t know which is more important, but I would say that people mostly won’t pay for fun but disposable images, and I think people will pay for art, but with an increased emphasis on the human artist. However, users might pay for reliable tools that can generate images for a purpose - things like educational illustrations - and those need to be able to follow the spec very well.
I want to interrupt all of this hype over Imagen 4 to talk about the totally slept on Tencent Hunyuan Image 2.0 that stealthily launched last Friday. It's absolutely remarkable and features:
- millisecond generation times
- real time image-to-image drawing capabilities
- visual instructivity (e.g. you can circle regions, draw arrows, and write prompts addressing them)
- incredible prompt adherence and quality
Nothing else on the market has these properties in quite this combination, so it's rather unique.
Release Tweet: https://x.com/TencentHunyuan/status/1923263203825549457
Tencent Hunyuan had a bunch of model releases all wrapped up in a product that they call "Hunyuan Game", but the Hunyuan Image 2.0 real time drawing canvas is the real star of it all. It's basically a faster, higher quality Krea: https://x.com/TencentHunyuan/status/1924713242150273424
More real time canvas samples: https://youtu.be/tVgT42iI31c?si=WEuvie-fIDaGk2J6&t=141 (I haven't found any other videos on the internet apart from these two.)
You can see how this is an incredible illustration tool. If they were to open source this, this would immediately become the top image generation model over Flux, Imagen 4, etc. At this point, really only gpt-image-1 stands apart as having godlike instructivity, but it's on the other end of the [real time <--> instructive] spectrum.
A total creative image tool kit might just be gpt-image-1 and Hunyuan Image 2.0. The other models are degenerate cases.
More image samples: https://x.com/Gdgtify/status/1923374102653317545
If anyone from Tencent or the Hunyuan team is reading this: PLEASE, PLEASE, PLEASE OPEN SOURCE THIS. (PLEASE!!)
- wine glass that is full to the edge with wine (ie. not half full)
- wrist watch not showing V (hands at 10 and 2 o'clock)
- 9 step IKEA shelf assembly instruction diagram
- any kind of gymnastics / sport acro
I mean, it's a fun edge case, but in practice - does it matter?
Not sure if this affects your results or not, but I can't resist chiming in!
I wonder how much the commonality or frequency of names for things affects image generation? My hunch is that it roughly correlates, and you'd get better results for terms with more hits in the training data. I'd probably use Google image search as a rough proxy for this.
Also "create static + video ads that are 0-99% complete" suggests the performance is hit or miss.
Hmm.
This is probably one of the better known benchmarks, but when I see Midjourney 7 and Imagen 3 within spitting distance of each other, it makes me question what kind of metrics they are using.
Created by Ari Kuschnir
I think the change here will be something we've seen with the other modalities. Text was interesting: syntactically correct but nonsense sentences. Then coherent paragraphs, but the end of the article would go off the rails. Then the whole article. Now it's the creativity of the children's story that's in question.
Pictures were awful fever dreams filled with eyes, but you could kind of see a dog. Then you could see what it was, then decent.
Videos were fun in that they kind of worked, then surprising that it took a few seconds for the panda to turn into spaghetti, then they kept the general style for a decent time.
I see this moving towards the creativity being the major thing, or it having a few general styles (softly lit background for example).
This has mostly all shifted in a very short space of time and as someone who put RBMs on GPUs possibly for the first time (I'm gonna claim it) this is absolutely wild.
Had I seen some of this, say, 6 months ago, I'd not have guessed at all that it wasn't real.
It wasn't until I was able to get my jaw off the ground that I told her it was AI. No, not AI like special effects, completely AI.
[0] https://www.reddit.com/r/ChatGPT/comments/1kru6jb/this_video...
But they do not allow any people in the image, even cartoon depictions of humans. This kneecaps a lot of potential usage.
This reminds me of Pixar's video of an animated lamp 40 years ago. I remember that within 5 years Toy Story came out and changed everything on how animated films were made. Looks to me like we are on our way to doing the same thing with realistic movies.
Someone will use AI to make the "AI Killed the Video Star" video. Probably the same guy that made this[1] and other masterpieces.
These larger companies are clearly going after the agency/hollywood use cases. It'll be fascinating to see when they become the default rather than a niche option - that time seems to be drawing closer faster than anticipated. The results here are great, but they're still one or two generations off.
The Tencent Hunyuan team is cooking.
Hunyuan Image 2.0 [1] was announced on Friday and it's pretty amazing. It's extremely high quality text-to-image and image-to-image with millisecond latency [2]. It's so fast that they've built a real time 2D drawing canvas application with it that pretty much duplicates Krea's entire product offering.
Unfortunately it looks like the team is keeping it closed source unlike their previous releases.
Hunyuan 3D 2.0 was good, but they haven't released the stunning and remarkable Hunyuan 3D 2.5 [3].
Hunyuan Video hasn't seen any improvements over Wan, but Wan also recently got VACE [4], which is a multimodal control and editing layer. The Comfy folks are having a field day with VACE and Wan.
[1] https://wtai.cc/item/hunyuan-image-2-0
[2] https://www.youtube.com/watch?v=1jIfZKMOKME&t=1351s
[3] https://www.reddit.com/r/StableDiffusion/comments/1k8kj66/hu...
Plus, with local generation you're not limited by platform moderation, which can be too strict and arbitrary and fail with false positives.
Yes, ComfyUI can be intimidating at first vs. an easy-to-use ChatGPT-like UI, but the lack of control makes me feel these tools still won't be used in professional productions in the short term - more in small YouTube channels and smaller productions.
It's for advertising.
But what it means is that, with time, open source will be as good as what commercial offerings have now. Hardware will get cheaper, and research is open or delayed-open.
Generating a long video one shot at a time kind of makes sense, as long as there's good consistency between shots
The format of the shows is mostly clip-based - man on the street, news hour, etc. - and obviously the jokes are all written by someone with a good sense of humour.
Not to discount that this is, as you say, an example of someone using AI to successfully create characters and stories that resonate with people. It's just still very much because of a creative human's talent and good taste that it's working.
Ok, I went from being pleasantly surprised to breakout laughter at that point.
But I also think this points out a big problem: high-quality stuff is flying under the radar simply because of how much stuff is out there. I've noticed that when faced with a lot of choice, rather than exploring it, people fall back into popular stuff that they're familiar with in a really sad way. Like a lot of door dash orders will be for McDonalds, or people will go back to watching popular series like Friends, or how Disney keeps remaking movies that people still go to see.
Artists aren't going to be replaced by AI tools being used by me on my iPhone; those artists were already replaced by bulk art from IKEA et al. Artists who reject new tools for being new will be replaced by artists who don't. Just like many painters were replaced by photographers.
> You're not the monolith of me!
These other universe memes are too good.
You can also use it as a communication tool such as making a "live" storyboard to prep location, blocking, maybe even as notes for actors.
Photography amplified the abstract and more creative aspects of painting and created new styles, because photography removed much of the need to capture realism. Though I am still entranced by the realist painting style myself, it serves a different purpose than capturing a moment.
AI tools used for any content will / are being used to add to the pile of shit.
I'd much rather start seeing individuals creating AI movies where you aren't bogged down by the need to hire actors and whatnot.
I like how Veo supports camera moves, though I wonder if it clearly recognizes the difference between 'in-camera motion' and 'camera motion' and also things like 'global motion' (e.g. the motion of rain, snow etc).
Obligatory link to Every Frame a Painting, where he talks about motion in Kurosawa: https://www.youtube.com/watch?v=doaQC-S8de8
The abiding issue is that artists (animators, filmmakers etc) have not done an effective job at formalising these attributes or even naming them consistently. Every Frame a Painting does a good job but even he has a tendency to hand wave these attributes.
Its something that is only obvious when it is obvious. And the more obvious examples you see, the more non-obvious examples slip by.
If you look at the shadows in the background, you can see how they appear and disappear, how things float in the air - all the usual AI artifacts. The video is also slowed down (lower FPS) to get around the length limit of the AI video generator.
But the point is not how we can spot these, because that's going to be impossible, but what the future of news consumption is going to look like.
[1] https://www.tiktok.com/@calm.with.word/video/750583708327412...
I don't believe it's entirely fake, just enhanced.
in the owl/badger video, the owl should fly silently.
This is an interesting non-trivial problem of generalization and world-knowledge etc., but also?
There's something somewhat sad about that slipping through; it makes me think no one involved in the production of this video, its selection, its passing review, etc., seemed to realize that one of the characteristic things about owls is that you don't hear their wings.
We have owls on our hill right now, see them almost every day, and regularly see them fly. It's magic, especially in an urban environment.
https://www.youtube.com/watch?v=-WigEGNnuTE
Longer version:
1. People like to be entertained.
2. NeuralViz demonstrates AI videos (with a lot of human massaging) can be entertaining
To me the fundamental question is- "will AI make videos that are entertaining without human massaging?"
This is similar to the idea of "will AI make apps that are useful without human massaging"
Or "will AI create ideas that are influential without human massaging"
By "no human massaging", I mean completely autonomous. The only prompt being "Create".
I am unaware of any idea, app or video to date that has been influential, useful or entertaining without human massaging.
That doesn't mean it can't happen. It's fundamentally a technical question.
Right now AI is trained on human-collected data. So, technically, it's hard for me to imagine it can diverge significantly from what's already been done.
I'm willing to be proven wrong.
The Christian in me tells me that humans are able to diverge significantly from what's already been done because each of us is imbued with a divine spirit that AI does not have.
But maybe AI could have some other property that allows it to diverge from its training data.
It makes me sad, though. I wish we were pushing AI more to automate non-creative work and not burying the creatives among us in a pile of AI generated content.
Personally I can't wait to see the new creative doors ai will open for us!
I've tried AI image generation myself and was not impressed. It doesn't let me create freely, it limits me and constantly gravitates towards typical patterns seen in training data. As it completely takes over the actual creation process there is no direct control over the small decisions, which wastes time.
Edit: another comment about a different meaning of accessibility: the flood of AI content makes real content less accessible.
The gates are wide open for those that want to put in effort to learn. What AI is doing to creative professionals is putting them out of a job by people who are cheap and lazy.
Art is not inaccessible. It's never been cheaper and easier to make art than today even without AI.
> Personally I can't wait to see the new creative doors ai will open for us!
It's opening zero doors but closing many
---
What really irks me about this is that I have _seen_ AI used to take away work from people. Last weekend I saw a show where the promotional material was AI generated. It's not like tickets were cheaper or the performers were paid more or anything was improved. The producers pocketed a couple hundred bucks by using AI instead of paying a graphic designer. Extrapolate that across the market for arts and wonder what it's going to do to creativity.
It's honestly disgusting to me that engineers who don't understand art are building tools at the whims of the financiers behind art who just want to make a bit more money. This is not a rising tide that lifts all ships.
"Creating" with an AI is like an executive "inventing" the work actually done by their team of researchers. A team owner "winning" a game played by their team.
That being said, AI output is very useful for brainstorming and exploring a creative space. The problem is when the brainstorming material is used for production.
So the bad news is people are just insecure, jealous, pedantic, easy to offend, highly autistic - and these are the smart ones.
The good news is that, with dead internet theory, they will all be replaced with bots that will at least be more compelling and make some sort of sense.
What a weird way to spell "give $200 a month to google"
And who owns the AI?
It’s delusional. Stop falling for the mental jiu Jitsu from the large AI labs. You are not becoming an artist by using a machine to make art for you. The machine is the artist. And you don’t own it.
Isn't the creativity in what you put in the prompt? Isn't spending hundreds of hours manually creating and rigging models based on existing sketch the non-creative work that is being automated here?
It's just not what gets the exciting headlines and showcases
Similarly with music, prior to recording tech, live performance was where it was at.
You could look at the digital era as a weird blip in art history.
We _could_ use this to empower humans, but many of us instinctively know that it will instead be used to crush the human spirit. The end result of this isn’t going to be an expansion of creative ability, it’s going to be the destruction of creative jobs and the capture of these creative mediums by a few large companies.
I agree, but that's the negative. The positive will be that the cost of almost any service you can imagine (medical diagnosis, tax preparation, higher education) will come down to zero, and with a lag of perhaps a decade or two it will reach us in the physical world with robo-technicians, surgeons, and plumbers. The cost of building a new house or railway will plummet to the cost of the material and the land, and will be finished in 1/10 of the time it takes today. The main problem to me is the lag between the negatives and the positives. We're starting out with the negatives, and the benefits may take a decade or two to reach us all equally.
The same was said about the camera or photoshop.
Just wanted to add representation to that feeling
Creativity is a conversation with yourself and God. Stripping away the struggle that comes with creativity defeats the entire purpose. Making it easier to make content is good for capital, but no one will ever get fulfillment out of prompting an AI and settling with the result.
Have a look at the workflow and agent design patterns in this video by youtuber Nate Herk when he talks about planning the architecture:
https://m.youtube.com/watch?v=Nj9yzBp14EM
There’s less talk about automating non-creative work because it’s not flashy. But I can promise it’s a ton of fun, and you can co-design these automations with an LLM.
Making a movie is not accessible to most people, and it's EVERYONE'S dream. This is not even there yet, but I have a few movies I need to make, and I will never get a cast together and go do it before I die. If some creatives need to take a backseat so a million more creatives can get a chance, then so be it.
There is a more sensical distinction between work that is informational in nature, and work that is physical and requires heavy tools in hard-to-reach places. That's hard to do for big tech, because making tests with heavy machinery is hard and time consuming
good lord. talk about pedantic.
The pace is so crazy that was an over estimation! I'll probably get done in 2. Wild times.
0: https://www.linkedin.com/feed/update/urn:li:activity:7317975...
Feels like there's going to be a dichotomy where the individual visuals look pretty good taken by themselves but the story told by those shots will still be mushy AI slop for a while. I've seen this kind of mushy consistency hold up over the generations so far; it seems very difficult to remove because it relies on more context than just previous images and text descriptions to manage.
[0] https://www.reddit.com/r/ChatGPT/comments/1kru6jb/this_video...
The demo videos for Sora look amazing but using it is substantially more frustrating and hit and miss.
My last recollection is that a recent case said AI-generated work didn't have copyright?
Ironically, this would be a good application of AI, where the AI listens in on their calls, and will flag conversation that warrants the keyword being said.
Now it just takes giant compute clusters and inference time.
Origami for me was more audio than video. Felt like it's exactly how it would sound.
My main issue when trying out Veo 2 was that it felt very static. A couple elements or details were animated, but it felt unnatural that most elements remained static. The Veo 3 demos lack any examples where various elements are animated into doing different things in the same shot, which suggests that it's not possible. Some of the example videos that I've seen are neat, but a tech demo isn't a product.
It would be really cool if Google contracted a bunch of artists / directors to spend like a week trying to make a couple videos or short movies to really showcase the product's functionality. I imagine that they don't do that because it would make the seams and limitations of their models a bit too apparent.
Finally, I have to complain that Flow claims to not be available in Puerto Rico ("Flow is not available in your country yet"), despite it being a US territory and us being US citizens.
Also Google is going to have to tread carefully, people in the entertainment industry are already AI hostile, and they dictate a surprising amount of public opinion.
I’ve noticed ads with AI voices already, but having it lip synced with someone talking in a video really sells it more
Interesting logic the new era brings: something else creates, and you only "bring your vision to life", but what it means is left for readers questioning, your "vision" here is your text prompt?
We're at a crossroads where the tools are powerful enough to make the process optional.
That raises uncomfortable questions: if you don't have to create anymore, will people still value the journey? Will vision alone be enough? What's the creative purpose in life: to create, or to bring a creative vision to life? Isn't the act of creation being subtly redefined?
Right. IMO you have to be imagination-handicapped to think that a creative vision can be distilled to a prompt, let alone that a prompt is the natural medium in which a creative vision lives. The exact relation between vision, artifact, process, and art itself can be philosophically debated endlessly, but to think artifacts are the only meaningful substrate in which art exists sounds like a dull and hollowed-out existence, like a Plato's-cave-level confusion between the true meaning and its representation. Or, in a (horrible) analogy for my fellow programmers: confusing pointers to data with the data itself.
Exactly. Probably the most important quote of modern times is, I think it was a CEO of an ISP that said it: "we don't want to be the dumb pipes" (during a comparison with a water utility company).
Everyone wants to seek rents for recurring revenue someone else actually generates.
If you take any high-quality AI content and ask its creator what their workflow is, you'll quickly discover that actually creating something high-quality, something that "fulfills your vision", is incredibly complex and nuanced.
Whether you measure quality through social media metrics, reach, or artistic metrics, like novelty or nuance, high quality content and art requires a good amount of skill and effort, regardless of the tool.
Standard reading for context: https://archive.org/details/Bazin_Andre_The_Ontology_of_Phot...
Software Engineers bring their vision to life through the source code they input to produce software, systems, video games, ...
I'm always hesitant with rollouts like this. If I go to one of these, there's no indication which Imagen version I'm getting results from. If I get an output that's underwhelming, how do I know whether it's the new model or if the rollout hasn't reached me yet?
https://aistudio.google.com/generate-image
But this still says it's Imagen 3.0-002, not Imagen 4.
It is so confusing. OK, I got Gemini Pro through Workspace or something, but not everything is there? Sure, I can try AI Studio, Flow, Veo, Gemini, etc. to figure out what I can do where, but such bad UX. Just tried using Gemini to create an image - definitely not the newest image gen, as the text was just marbled up. But I can't see which version I'm on. Genius.
Edit: after clicking through lots of Google products I'm still not able to find a single place where I can actually try the new image gen, despite the article claiming it's available today in X, Y, Z.
However, looking at the UI/UX in Google Docs, it's less transparent.
I'm pretty sure AI-generated child porn already exists somewhere. But I'm quite lucky: despite knowing rotten.com and plenty of other sites, I've never seen the real thing, so I doubt I will see the fake kind.
What's the elephant in the room now? Nothing changed. Whoever consumes the real thing will consume the fake too. The FBI/CIA will still try to destroy CP rings.
We could even think it might make the situation somehow better, because they might consume purely virtual CP?
We should all be hoping AI-generated CSAM floods the CSAM market, instead of trying to restrict AI so that we artificially prop the market up and cause harm to many more humans.
Why is it that all these AI concept videos are completely crazy?
However, I also think this is to show that it can create anything, not just copies of stuff it has seen. If you ask for a painting of a woman and it shows you mona lisa, that's not very impressive.
Like if you asked a model to help you create a coffee shop website for a demo and it started looking more like a sex shop, you just vibe with it and say that's what you wanted in the first place. I've noticed that the success rate of using AI is proportional to how much you can gaslight yourself.
This naming seems very confusing, as I originally thought there must be some connection. But I don't think there is.
But then again, the "don't be evil" motto is long gone, so I guess anything goes now?
The obvious aim of these foundational image/movie generation AI developments is for them to become the primary source of value, at a cost and quality unmatched by preexisting human experts, while allowing but not necessitating further modification downstream by now heavily commoditized and devalued ex-professional editors, to allow for their slow deprecation.
But the opposite seems to be happening: the better data are still human-generated, generators are increasingly human-curated, and they're used increasingly close to the tail end of the pipeline instead of the head. Which isn't so threatening nor interesting to me, but I do wonder if that's a safe, let alone expected, outcome for those pushing these developments.
Aren't you welding a nozzle onto an open can of worms?
https://www.youtube.com/watch?v=SPF4MGL7K5I
Obviously we don't know how hand picked that is so it would be interesting to see a comparison from someone with access.
Since Google seems super cagey about what their exact limits actually are, even for paying customers, it's hard to know if that's an error or not. If it's not an error, if it's intentional, I don't understand how that's at all worth $20 a month. I'm literally trying to use your product Google, why won't you let me?
https://www.figure.ai/ does not exist yet, at least not for the masses. Why are Meta and Google just building the next coder and not the next robot?
It's because those problems are at the bottom of the economic ladder. But they have the money for it, and it would create so much abundance: it would crash the cost of living and free up human labor to imagine and do things more creatively than whatever Veo 4 can ever do.
In the forecast of the AI-2027 guys, robotics come after they've already created superintelligent AI, largely just because it's easier to create the relevant data for thinking than for moving in physical space.
Ideogram and gpt-4o pass only a few of them, but not all.
Soon, you should be able to put in a screenplay and a cast, and get a movie out. Then, "Google Sequels" - generates a sequel for any movie.
This "fixes" Hollywood's biggest "issues". No more highly paid actors demanding 50 million to appear in your movie, no more pretentious movie stars causing dramas and controversies, no more workers' unions or strikes, but all gains being funneled directly to shareholders. The VFX industry being turned into a gig meatgrinder was already the canary in the coal mine for this shift.
Most of the major Hollywood productions from the last 10 years have been nothing but creatively bankrupt sequels, prequels, spinoffs and remakes, all rehashed from previous IP anyway, so how much worse than this can AI do, since it's clear they're not interested in creativity anyway? Hell, it might even be an improvement than what they're making today, and at much lower cost to boot. So why wouldn't they adopt it? From the bean counter MBA perspective it makes perfect sense.
All this is in line with my prediction for the first entirely AI generated film (with Sora or other AI video tools) to win an Oscar being less than 5 years away.
And we're only 5 months in.
The guy in the third video looks like a dressed up Ewan McGregor, anyone else see that?
I guess we can welcome even more quality 5 second clips for Shorts and Instagram
Think of all of your favorite novels that are deemed "impossible" to adapt to the screen.
Or think of all the brilliant ideas for films that are destined to die in the minds of people who will never, ever have the luck or connections required to make it to Hollywood.
When this stuff truly matures and gets commoditized I think we are going to see an explosion of some of the most mind blowing art.
On a more societal level, I'm not sure continuously diminishing costs for producing AI slop is a net benefit to humanity.
I think this whole thing parallels some of the social media pros and cons. We gained the chance to reconnect with long lost friends—from whom we probably drifted apart for real reasons, consciously or not—at the cost of letting the general level of discourse to tank to its current state thanks to engagement-maximizing algorithms.
Not in 10 years but now.
People who just see this as terrible are wrong. AI's improvement curve is exponential.
People's adaptability is at best linear.
This makes me really sad. For creativity. For people.
Of course this is not because of AI. It's because of the ridiculous system of social organization where increased automation and efficiency makes people worse off.
Can’t wait to see what people start making with these
Sora, the image model (gpt-image-1), is phenomenal and is the best-in-class.
I can't wait to see where the new Imagen and Veo stack up.
Technology is inevitable and it's a tool, advancing technology will always leave people who specialize and are unable to adapt in a bad position, but this won't stop technology from advancing.
I think one could argue this is one of the reasons many people would like their community/government to provide social safety nets for them. It would make specializing less risky in a time when technology advances at a fast pace.
Thank you, researchers, for making our world worse. Thank you for helping to kill democracy.
They all got smoked by Google with what they just announced.
Google what is this?
How would anyone use this for a commercial application?
There is an ever growing percentage of new AI-generated videos among every set of daily uploads.
How long until more than half of uploads in a day are AI-generated?
The remaining 10% is the solution to generating good hands, of course. And do you think YouTube has been helping anyone achieve that?
If we look at the Veo 3 examples, this is not the typical youtube video, but instead they seem to recreate cgi movies, or actual movies.
With a media & entertainment hungry world that is about to get worse with the unemployed/underemployed TikTok generation needing "content", something like this has to have a play.
Drive the storytelling, consult with AI on improving things and exploring variations.
Generate visuals, then adjust / edit / postprocess them to your liking. Feed the machine your drawings and specific graphic ideas, not just vague words.
Use generated voices where they work well, record real humans where you need specific performance. Blend these approaches by altering the voice in a recording.
All these tools just allow you to produce things faster, or produce things at all such that would be too costly to shoot in real life.
Now it's "good enough" for a lot of cases (and the pace of improvement is astounding).
AI is still not great at image gen and video gen, but the pace of improvement is impressive.
I'm skeptical image, video, and sound gen are "too difficult" for AI to get "good enough" at for many use cases within the next 5 years.
In 2 years we have moved from AI video being mostly a pipe dream to some incredible clips! It’s not what this is like now, but what will it be like in 10 years!
A bit depressing.
I mean obviously the answer is "no" and this is going to get a bunch of replies saying that inventors are not to blame but the negative results of a technology like this are fairly obvious.
We had a movie two years ago about a blubbering scientist who blatantly ignored that to the detriment of his own mental health.
I can't be the only one wondering where the Swedish beach volleyball channel is, though.
I imagine video is a far tougher thing to model, but it's kind of weird how all these models are incapable of not looking like AI-generated content. They are all smooth and shiny and robotic; year after year it's the same. If anything, the earlier generations, like that horrifying "Will Smith eating spaghetti" clip from back like three years ago, look LESS robotic than any of the recent floaty clips that are generated now.
I'm sure it will get better, whatever, but unlike the goal of LLMs for code/writing where the primary concern is how correct the output is, video won't be accepted as easily without it NOT looking like AI.
I am starting to wonder if that's even possible, since these are effectively making composite guesses based on training data, and the outputs ultimately look similar to those "Here is what the average American's face looks like, based on 1000 people's faces super-imposed onto each other" composites that used to show up on Reddit all the time. Uncanny, soft, and not particularly interesting.