For example “find all the interview sections where people are talking about x and make a sequence”.
OpusClip claims to have this but it’s behind a waitlist.
Once that’s there, you can use it for organizing, searching, filtering, or whatever you want. It does not need to be coupled with an LLM-based interface.
ML models for e.g. face and object recognition have been deployed in both local and cloud-based photo organization for at least a decade. I very much welcome transformers to do a much better job, but I also very much reject the everything-is-a-prompt hammer as a solution to all problems, especially in deep and professional workflows where details matter.
Yes, this is a big feature I've been working on, should be ready for a beta by the end of the month.
I allude to it in the post, but good search (for editing) is a challenge, and necessitates a mix of embeddings/vector search and text models.
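As a rough illustration of what mixing the two signals could look like, here is a minimal sketch that blends a vector-similarity score with a plain keyword-overlap score. Everything in it is an assumption for illustration: the `Clip` type, the hand-written three-dimensional "embeddings", the `hybrid_search` name, and the `alpha` weight are all hypothetical, and a real system would get its vectors from an embedding model rather than typing them in.

```python
import math
from dataclasses import dataclass


@dataclass
class Clip:
    name: str
    transcript: str
    embedding: list[float]  # toy vector (assumption); normally produced by an embedding model


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query words that appear verbatim in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def hybrid_search(query, query_embedding, clips, alpha=0.5):
    """Rank clips by a weighted mix of vector and keyword scores (alpha is a hypothetical knob)."""
    scored = [
        (alpha * cosine(query_embedding, c.embedding)
         + (1 - alpha) * keyword_score(query, c.transcript), c.name)
        for c in clips
    ]
    return [name for score, name in sorted(scored, reverse=True)]


clips = [
    Clip("interview_a", "we talked about pricing and churn", [0.9, 0.1, 0.0]),
    Clip("b_roll_city", "street traffic at night", [0.0, 0.2, 0.9]),
    Clip("interview_b", "the pricing page confused customers", [0.7, 0.3, 0.1]),
]
results = hybrid_search("pricing", [1.0, 0.0, 0.0], clips)
print(results)
```

The keyword term catches exact names and jargon that embeddings can blur, while the vector term catches paraphrases; how to weight them is a judgment call this sketch does not try to settle.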
And that’s why I read the comments to see if anyone mentioned it.
To be able to literally take the source files used to put the video together and edit each piece individually would be great.
I wanted to create a car driving down a road covered in arches of greenery. I got lots of great options but I wanted a particular combination of options. If I could do something like that with video, that would be terrific.
In my experience, I’ve always listened to live discussions or read long form blog posts, specifically for the story and obscure points being made. Summaries never capture that and always miss nuances.
One guy pays the AI to dig a hole, the other guy pays the AI to fill in the hole. Back and forth they go, raising the GDP but otherwise not accomplishing anything.
It sometimes feels like media has bifurcated into hyper-dense (let me explain a whole field of law in a 30 second tiktok) versus hyper-fluffy (documentary with 30 minutes of material spread out into six episodes, with a recap before and after each commercial break), depending on whether the target audience has a job or not.
Imagine searching for a guide on how to disassemble your laptop. Unfortunately, you can only find a 30 minute video which is full of rambling, ads or other things irrelevant to you. You can at least in theory use AI to produce a textual summary which contains only the disassembly instructions and relevant snapshots of the video.
All professionals I've ever talked to seem to agree that videos are a terrible form of reference information (i.e. information you need to accomplish a task right now).
The same applies to recipe websites: an AI that can throw all the fluff away is useful considering the annoying habit of the authors to seemingly write about everything but ingredients and the steps necessary to cook the dish.
I think this relates to https://nick.groenen.me/posts/the-4-types-of-technical-docum... in that any documentation that serves immediate work rather than learning should be straight to the point, with as little clutter as possible.
Podcasts and blog posts fall into "unique value/view/information I am learning" or entertainment "something that feels like a (parasocial) friend - content I can predictably expect and get some dopamine/sense of socialness from".
Summaries for the former remove the eureka moments and brain connections between ideas, replacing them with takeaways, and summaries for the latter are like summarizing a TV episode in text: no entertainment tends to really come from it.
I think it comes from having many messages at work, and I get that. When you have 50-100 messages/documents a day, quick summaries are a lifesaver: they help you filter, avoid, or get to the facts. But for things I choose to listen to, for those hours of rest or (scientific) curiosity in my life, summaries are not a virtue.
(for Parasocial - the feeling is: This person won't update me on their relationship problems, they'll explain a cool thing about castles to me and share their opinion, etc.)
I think, in general, most people have areas of interest to them where it would not occur to them to summarise what they're having fun engaging with.
E.g. I'm interested in classical art, and come across a lot of "he painted this while he was in $X before he moved to $Y". I'd like information about $X and $Y to be also available, how far apart are they, were they ruled by the same people, etc. But I won't be doing that sort of digging myself, I'd like it to show up next to what I'm reading, because I (will) have an AI reading along and doing this work for me.
This is a genuine concern of mine! I don't want to build something that generates slop.
Rather, I think whenever we change the costs / process of things, new possibilities open up.
As an example, last night I re-watched Starship Troopers for the six-hundredth time. I'm a huge fan of Paul Verhoeven.
What if I could watch a custom edit of Starship Troopers on demand, and this edit surprised me with something new? I don't know exactly how this would look, but maybe it's interesting?
It's tough to predict the future and how things will change.
But I'd rather be participating in its creation, trying to make it better.
this is not interesting whatsoever actually
Is this what you want to do? https://www.youtube.com/watch?v=6sUR6ylVH7E
Always.
It is really humbling to actually try it and realize how difficult making anything original is. You also realize that... you just might not be talented haha.
Imagine taking in a prompt, which describes the video you'd like generated. At render time you pass along variables which get injected to describe the specifics for your audience.
We can then adjust the video edit according to that audience, including mixing generated and non-generated content.
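To make the idea a bit more concrete, here is a minimal sketch of a prompt with placeholders that get filled in at render time. It only covers the text-templating step; the template wording, the variable names (`product`, `style`, `city`) and the `render_prompt` helper are assumptions for illustration, and no actual video generation or editing API is shown.

```python
from string import Template

# Hypothetical prompt template; $-placeholders are filled in at render time.
PROMPT = Template(
    "A 15 second product teaser for $product, shot in a $style style, "
    "aimed at viewers in $city."
)


def render_prompt(template, **variables):
    # substitute() raises KeyError if a placeholder is left unfilled,
    # so an incomplete audience description fails early instead of
    # producing a half-rendered prompt.
    return template.substitute(**variables)


# Hypothetical per-audience variables passed along at render time.
audiences = [
    {"product": "trail shoes", "style": "documentary", "city": "Denver"},
    {"product": "trail shoes", "style": "music video", "city": "Lisbon"},
]
prompts = [render_prompt(PROMPT, **a) for a in audiences]
for p in prompts:
    print(p)
```

Keeping the template fixed and varying only the variables means the same edit description can be re-rendered per audience without rewriting the prompt itself.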