What I've Learned in the Past Year Spent Building an AI Video Editor

(makeartwithpython.com)

168 points · burningion · 1y ago · 66 comments

35 comments · 9 top-level

35mm · 1y ago · 17 in thread

As someone who has worked as a video editor, the most helpful AI tool would be prompt-based editing.

For example “find all the interview sections where people are talking about x and make a sequence”.

OpusClip claims to have this but it’s behind a waitlist.

klabb3 · 1y ago

As an outsider: sounds like the main value lies in the AI extracting detailed and accurate (but heuristic) metadata from video: audio transcriptions, text, people, environment and objects.

Once that’s there, you can use it for organizing, searching, filtering, or whatever you want. It does not need to be coupled with an LLM-based interface.

ML models for things like face and object recognition have been deployed in both local and cloud-based photo organization for at least a decade. I very much welcome transformers doing a much better job, but I also very much reject the everything-is-a-prompt hammer as a solution to all problems, especially in deep, professional workflows where details matter.

burningion (OP) · 1y ago

Author here.

Yes, this is a big feature I've been working on, should be ready for a beta by the end of the month.

I allude to it in the post, but good search (for editing) is a challenge, and necessitates a mix of embeddings/vector search and text models.
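A minimal sketch of what mixing the two signals could look like, not the author's implementation. It blends a vector-similarity score with a plain keyword-overlap score over transcript segments. The `Segment` type, the `alpha` weight, the file names, and the three-dimensional toy embeddings are all assumptions made for illustration; real embeddings would come from whatever embedding model you use.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    clip: str               # hypothetical source file name
    start: float            # segment start, in seconds
    end: float              # segment end, in seconds
    text: str               # transcript text for this segment
    embedding: list[float]  # toy vector; normally produced by an embedding model


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query words that appear verbatim in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def hybrid_search(query, query_embedding, segments, alpha=0.5, top_k=3):
    """Rank segments by a weighted blend of vector and keyword scores."""
    scored = [
        (
            alpha * cosine(query_embedding, seg.embedding)
            + (1 - alpha) * keyword_score(query, seg.text),
            seg,
        )
        for seg in segments
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy data: three transcript segments with hand-written embeddings.
SEGMENTS = [
    Segment("interview_01.mp4", 12.0, 20.5,
            "we talked about the budget for the film", [1.0, 0.0, 0.0]),
    Segment("b_roll_02.mp4", 3.0, 9.0,
            "the bicycle crash happened on day two", [0.0, 1.0, 0.0]),
    Segment("interview_03.mp4", 40.0, 44.0,
            "lunch was great", [0.0, 0.0, 1.0]),
]

if __name__ == "__main__":
    for score, seg in hybrid_search("bicycle crash", [0.1, 0.9, 0.0], SEGMENTS):
        print(f"{score:.3f}  {seg.clip}  {seg.start}-{seg.end}s  {seg.text}")
```

The keyword term rewards exact matches on names and jargon that embeddings tend to blur, while the vector term catches paraphrases; `alpha` is just a knob for trading one off against the other.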

liotier · 1y ago

Derushing in general is the most time-consuming part, so you want not only language pattern recognition but also image recognition: "From the rushes, extract all the sequences with bicycle crashes to give me a pile of clips to use in my edit"!

nashashmi · 1y ago

> I allude to it

And that’s why I read the comments to see if anyone mentioned it.

To be able to literally take the source files used to put the video together and edit each piece individually would be great.

I wanted to create a car driving down a road covered in arches of greenery. I got lots of great options but I wanted a particular combination of options. If I could do something like that with video, that would be terrific.

yunohn · 1y ago

Not a personal jab, but I am astounded how every day, HN is full of discussion around how articles, newsletters, podcasts, and videos need to be aggregated and summarized for actual consumption. Repeat ad infinitum in both directions.

In my experience, I’ve always listened to live discussions or read long form blog posts, specifically for the story and obscure points being made. Summaries never capture that and always miss nuances.

kombookcha · 1y ago

It's approaching a very strange situation where people make overly wordy and bloated AI generated content and other people try to use AI to compress it back into useful pellets vaguely corresponding to the actual prompts used to generate the initial content. Which were the only bits anybody cared about in the first place.

One guy pays the AI to dig a hole, the other guy pays the AI to fill in the hole. Back and forth they go, raising the GDP but otherwise not accomplishing anything.

pjc50 · 1y ago

Not sure about articles, but people keep recommending multi-hour-long podcasts and videos far beyond the ability of any employed person to keep up with what they might want, so a summary is a useful tool to extract the salient points and possibly consider if something meets the threshold of being better than all the other hour-long things I might want to spend my free hour on.

It sometimes feels like media has bifurcated into hyper-dense (let me explain a whole field of law in a 30 second tiktok) versus hyper-fluffy (documentary with 30 minutes of material spread out into six episodes, with a recap before and after each commercial break), depending on whether the target audience has a job or not.

cultureswitch · 1y ago

I generally agree with you when it comes to learning-focused content but there are definite cases where using an AI summary makes a lot of sense.

Imagine searching for a guide on how to disassemble your laptop. Unfortunately, you can only find a 30 minute video which is full of rambling, ads or other things irrelevant to you. You can at least in theory use AI to produce a textual summary which contains only the disassembly instructions and relevant snapshots of the video.

All professionals I've ever talked to seem to agree that videos are a terrible form of reference information (i.e. you need information to accomplish a task right now).

The same applies to recipe websites: an AI that can throw all the fluff away is useful considering the annoying habit of the authors to seemingly write about everything but ingredients and the steps necessary to cook the dish.

I think this relates to the https://nick.groenen.me/posts/the-4-types-of-technical-docum... as in any documentation that serves immediate work rather than learning should be straight to the point with as little clutter as possible.

authorfly · 1y ago

I totally agree. What is life living with just summaries?

Podcasts and blog posts fall into "unique value/view/information I am learning" or entertainment "something that feels like a (parasocial) friend - content I can predictably expect and get some dopamine/sense of socialness from".

Summaries for the former remove the eureka moments and brain connections between ideas, replacing them with takeaways, and summaries for the latter are like summarizing a TV episode in text: no entertainment tends to really come from it.

I think it comes from having many messages at work, and I get that. When you have 50-100 messages/documents a day, quick summaries are a lifesaver, they help you filter, avoid, or get to the facts. But for things I select listening to.. for those hours of rest or (scientific) curiosity in my life.. summaries are not a virtue.

(for Parasocial - the feeling is: This person won't update me on their relationship problems, they'll explain a cool thing about castles to me and share their opinion, etc.)

mjburgess · 1y ago

It has a lot to do with the kinds of articles that appear on HN and across the internet. And also, that spending time on something requires being interested in it, and so, there's a larger audience for summaries.

I think, in general, most people have areas of interest to them where it would not occur to them to summarise what they're having fun engaging with.

reportgunner · 1y ago

People use these summaries to generate spam which they sell to advertising networks, that's why they keep talking about it.

giancarlostoro · 1y ago

That's fair, and there will always be people who want summaries.

exe34 · 1y ago

I don't read much online drivel, but the way I would describe my interest in AI summary/model building, is that I do read a few articles/books deeply, but these refer to many other things that it would be useful to have a general picture of in my mind, but I'm never going to put the manual effort into building that surrounding structure.

E.g. I'm interested in classical art, and come across a lot of "he painted this while he was in $X before he moved to $Y". I'd like information about $X and $Y to be also available, how far apart are they, were they ruled by the same people, etc. But I won't be doing that sort of digging myself, I'd like it to show up next to what I'm reading, because I (will) have an AI reading along and doing this work for me.

torginus · 1y ago

You don't understand! I need to procrastinate more efficiently!

wheatgreaser · 1y ago

that seems really hard

wk_end · 1y ago

You should check out scenery.video (disclaimer: I have a relationship with the company)

tylerekahn · 1y ago

Check out https://kino.ai (YC S23)

1oooqooq · 1y ago · 4 in thread

Did I miss something, or is this "video editing was too hard so I just made a Wikipedia-reading bot that generates drivel for Instagram and TikTok at the same time"?

burningion (OP) · 1y ago

Author here.

This is a genuine concern of mine! I don't want to build something that generates slop.

Rather, I think whenever we change the costs / process of things, new possibilities open up.

As an example, last night I re-watched Starship Troopers for the six-hundredth time. I'm a huge fan of Paul Verhoeven.

What if I could watch a custom edit of Starship Troopers on demand, and this edit surprised me with something new? I don't know exactly how this would look, but maybe it's interesting?

It's tough to predict the future and how things will change.

But I'd rather be participating in its creation, trying to make it better.

scudsworth · 1y ago

>What if I could watch a custom edit of Starship Troopers on demand, and this edit surprised me with something new? I don't know exactly how this would look, but maybe it's interesting?

this is not interesting whatsoever actually

AlienRobot · 1y ago

>What if I could watch a custom edit of Starship Troopers on demand, and this edit surprised me with something new? I don't know exactly how this would look, but maybe it's interesting?

Is this what you want to do? https://www.youtube.com/watch?v=6sUR6ylVH7E

j45 · 1y ago

There is a high time cost to actual video editing and managing the details well, quite different than generating one template/perception of it, which is more where social media slop is at.

mips_avatar · 1y ago · 2 in thread

I agree that building AI on top of the video editor is probably a mistake. Maybe the format of the representation of the video can be something better than a series of matrices of pixel values.

arjunaaqa · 1y ago

Absolutely true! Re-imagined, AI-first products will kill AI-patched legacy products.

Always.

brianjking · 1y ago

I think this is sometimes true, and certainly after a ton of failure first.

lukaqq · 1y ago · 1 in thread

Impressive blog! I am building a professional web video editor - https://chillin.online and trying to embed various AI workflows into it. Your article has given me a lot of inspiration. Thank you!

b-lee · 1y ago

Looks so interesting..

Narciss · 1y ago · 1 in thread

Good work on pushing through. It’s like you say, building anything is an achievement.

ericmcer · 1y ago

Seriously, every person needs the opportunity to really throw themselves into creating something for a year. I think so many people walk around thinking "if only I had time/money/space/whatever I could do something amazing".

It is really humbling to actually try it and realize how difficult making anything original is. You also realize that... you just might not be talented haha.

sfmike · 1y ago · 1 in thread

What do you think of this versus the AI products that hire actors who are then reused as models in videos generated from a script?

burningion (OP) · 1y ago

Author here. I imagine that being one of the components you can "plug in" to what I'm building.

Imagine taking in a prompt, which describes the video you'd like generated. At render time you pass along variables which get injected to describe the specifics for your audience.

We can then adjust the video edit according to that audience, including mixing generated and non-generated content.
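As a rough illustration of the "variables injected at render time" idea, and only a sketch under assumptions rather than how the author's tool works: a prompt template is written once and filled in per audience when a render is requested. The template text and the variable names (`audience`, `tone`, `length_seconds`) are hypothetical.

```python
from string import Template

# Hypothetical prompt template; $-placeholders are filled in at render time.
EDIT_PROMPT = Template(
    "Make a ${length_seconds}-second video aimed at $audience. Tone: $tone."
)


def render_prompt(template: Template, **variables) -> str:
    """Fill in the template; raises KeyError if a placeholder is left unset."""
    return template.substitute(**variables)


if __name__ == "__main__":
    # The same template rendered for two different audiences.
    print(render_prompt(EDIT_PROMPT, audience="bike commuters",
                        tone="upbeat", length_seconds=30))
    print(render_prompt(EDIT_PROMPT, audience="film students",
                        tone="dry", length_seconds=90))
```

Using `substitute` rather than `safe_substitute` is deliberate here: a missing variable fails loudly before any expensive render starts.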

mcdow · 1y ago

It seems like we are currently in the "skeuomorphic" product design era for AI products. Which is to say we are building the same products but with AI tacked on. I appreciate that you are approaching this problem from first principles and attempting to break from the model of the previous generation. Kudos.

SCUSKU · 1y ago

Love the author sharing their winding journey as well as the tools and things they learned along the way. You can tell the author did grow a lot through this process, and through the year. Great stuff, thanks for sharing these great tips :D

Arelius · 1y ago

Because I don't see it mentioned elsewhere, I wanted to plug OpenTimelineIO, as a lot of the industry is building support around it as a format right now, and it would be great for any new video editor to support.

https://opentimelineio.readthedocs.io/en/stable/
