I indexed 669 GB of my GoPro videos using my M1 Max computer and local ML models

438 pointsiliashad12d ago115 comments

TLDR: I had 2,207 GoPro videos, and I needed to rewatch them to find interesting moments from my cycling journey. I built a project to index them locally on my M1 Max using open-source ML models, search for those moments, and send the best clips straight to my DaVinci Resolve timeline. I indexed 628 videos (668.68 GB, 15h 13m 18s of footage). More details are in the metrics table in the last section of the article.

Full article: https://iliashaddad.com/blog/i-indexed-669-gb-of-my-gopro-videos-using-my-m1-max-computer

115 comments

94 comments · 30 top-level

Beijinger12d ago· 14 in thread

Does it work for porn collections too?

pduggishetti12d ago

You'll need a LoRA for this, porn content rejection is heavy. Or you'll need an abliterated model, not sure if vision also works.

You might want to add something like yolo finetune to detect scenes + face recognition too.

dotancohen12d ago

For GP's purpose, can face recognition techniques be repurposed for, um, other body parts recognition? Sometimes the actresses are facing away from camera. There are exposed lips, if that helps.

1 more reply

vorticalbox12d ago

Vision still works perfectly fine in abliterated models.

2 more replies

lifestyleguru12d ago

Last time I tried whisper, it hallucinated an elaborate conversation from sounds of slapping and moaning and it took minutes to spit every single line of it.

3eb7988a166312d ago

Parakeet has been trained to detect non-voice sounds and exclude that from identification, so you might have better luck with that family.

dotancohen12d ago

If I remember correctly, the whisper documentation actually recommends trimming non-speech portions, as the models hallucinate heavily during those portions.

sarjann12d ago

Asking the important questions

fhdkweig12d ago

The internet is for porn. https://www.youtube.com/watch?v=LTJvdGcb7Fs

nntwozz11d ago

I was meandering through the comments about to leave the topic when my interest suddenly piqued upon reading the word porn.

iliashadOP12d ago

Why it’s always the same question? Hahah. I posted my project over Reddit and I got the same one hahah

fennecfoxy11d ago

Ha ha ha, it's because most humans overlap on a few things - like eating, shitting, sleeping and fucking, ha ha ha.

supertroop12d ago

Not sure if you’re being sarcastic but I think this is an interesting question. Would deep seek be useful here since it is local?

fibers12d ago

just because it is local does not mean it wouldn't reject explicit content. you can definitely try to find abliterated models and can attempt to use unsloth or something similar to tune it properly.

1 more reply

okr12d ago

Depends how deep you wanna go.

justinram1112d ago· 9 in thread

Something I've enjoyed more than I expected is Google and Apple photos sending me photo memories and compilations of various things in my life and my kids lives over the last decade.

I'm really bullish on taking more video of my kids, with the thought that it will become easier and easier for AI to put them into little compilations I can enjoy later.

iliashadOP12d ago

That’s good to hear, open source ML models are getting better and better. I did a small experiment to generate a Spotify year in review like video here is a preview video https://github.com/IliasHad/edit-mind/tree/expirement/year-i...

mwelpa11d ago

I wish I could connect Apple photos to my Spotify account and have photo memories connected with songs I listened to at the time :)

alias_neo11d ago

Music memories are the best.

I booted up my old PS3 from my uni days (20 years ago?) and found all of the music I had on it because I used it for everything at the time. Some seriously nostalgic music I'd completely forgotten about.

theshrike7910d ago

I think the Apple stuff is done 100% on device.

Google loves scanning stuff on in the cloud though.

goodmythical12d ago

You don't mind Google using your kids to train their models and advertising algorithms?

Years from now they'll be getting "hey look at BIKE BRANDS' NEWEST CHEAP BIKE REMEMBER WHEN YOU USED TO RIDE BIKE BRAND BIKES"

satvikpendem11d ago

I think most people really don't care, and/or will just adblock those sorts of things when they do arrive.

2 more replies

JMiao12d ago

do you use android and ios, or is there another benefit to having personal media with both?

dave808811d ago

I run both on my phone as a lazy (but flawed) backup strategy.

iliashadOP12d ago

Can you please elaborate more?

1 more reply

WarOnPrivacy12d ago· 6 in thread

I was surprised to learn that the

    M1 Max CPU is an ARM/SoC, comparable to an 11th gen Intel i9

Do I have it right? Would Windows ARM performance be similar for those cpu?

ref: https://www.cpubenchmark.net/compare/4585vs4245/Apple-M1-Max...

pachouli-please12d ago

It's also a bit apples (heh) to oranges for a handful of reasons, but most impactful

- "unified" ram makes all the system ram available as VRAM
- dedicated ai coaccelerator thingy

Both of these reasons allow the apple silicon chips to crush conventional cpus in these kind of AI model workload stuffs

No idea about what the windows arm stuff is capable of. I know they use Qualcomm snapdragon chips though.

owldown12d ago

“Comparable” is maybe true if we are talking about single core performance, but for memory bandwidth, the M1 Max is about 8 times faster. Wider bus, lower latency, not even close.

voidmain000112d ago

No comparison. M1 Max has 400GB/s RAM bandwidth while Snapdragon X2 Elite, the latest and greatest , has 228GB/s RAM bandwidth.

Rohansi11d ago

I don't disagree with your conclusion but the comparison of max bandwidth between the two SoCs is not enough. Neither of them will use all of that bandwidth doing AI work because the GPU will be compute limited. That's why dedicated GPUs perform so significantly better without having significantly higher bandwidth.

voidmain000111d ago

The question I answered was "Would Windows ARM performance be similar for those cpu?" and the answer is, no, because the RAM bandwidth for ARM SoC computers for Windows, primarily Snapdragon X1E and X2E is half to quarter that of the M1 Max.

1 more reply

iliashadOP12d ago

To your question, I can't confirm or deny that, because I haven't tried this project on a Windows machine or a machine with this config yet.

tontonius12d ago· 5 in thread

if anyone is interested in searching large video collections local and offline I suggest taking a look at Jumper https://docs.getjumper.io

comes with some nifty features like NLE- integrations, people search, MCP, API etc

Disclaimer: one of the co-founders

dotancohen12d ago

The link just timed out for me. I'm in Israel, connecting via residential WiFi. All other sites that I regularly use connect just fine.

tontonius11d ago

hmm weird works for me.. what about https://getjumper.io/?

dotancohen11d ago

They're both working now.

____tom____11d ago

Your docs say you integrate with Davinci resolve.

Other comments mention davinci resolve has this built in. How would you compare the two?

MaxGL7d ago

Jumper is significantly more accurate

GreenSalem11d ago· 4 in thread

A lawyer I know who specialises in rape, and is excellent at getting the obviously guilty exonerated, lost a case last year because of GoPro videos.

Her client was recording while committing the abhorrent crime. The criminal would otherwise have got off.

From my perspective, the GoPro camera produced a good outcome. Still, one has to wonder why anyone would record their criminal actions.

Yiin11d ago

word "her" in this context gave me heavy feelings, what makes one to pick such a career move...

GreenSalem11d ago

Beggars cant be choosers.

She would rather have done corporate law but did not have the academic credentials or the networks needed for a job at the likes of Latham Watkins or White and Case.

Still it is good for society that criminals get the worst lawyers to defend them.

fennecfoxy11d ago

Why? You're being sexist and I hope you can understand why.

djmips11d ago

esjeon12d ago· 3 in thread

> Then, run the frame analysis pipeline, which will divide the video into separate video scenes (1s each, or 1fps)

> (…)

> Frames analyzed 57,537

Aha, it makes total sense. This number sounds much more reasonable than “669 GB”, since the actual total size of processed frames would be like 10-30 GB.

(Not downplaying anything. Doing-at-home always requires some math on practicality)

> Total compute time 67h 40m 42s

I’m just curious tho — is there any paying options that can accelerate this kind of process? Just spin up GPU instances?

iliashadOP12d ago

> Aha, it makes total sense. This number sounds much more reasonable than “669 GB”, since the actual total size of processed frames would be like 10-30 GB.

“669 GB” is the total raw footage size. When I'm doing the video processing, I downscale each frame to 720p to make the processing much faster, and I don't need full original quality to get accurate results (as far as I know and have experimented with).

> I’m just curious tho — is there any paying options that can accelerate this kind of process? Just spin up GPU instances?

For now, I found that NVIDIA GPU for example RTX 3060 with 12GB Vram was much faster than my M1 Max. (still working on optimizing for speed and accuracy).
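For anyone wondering what the "1 fps, downscaled to 720p" sampling step could look like, here is a minimal sketch that only builds an ffmpeg command line. It is not taken from the project: the function name, the output naming pattern, the example file name and the JPEG quality setting are all assumptions. The `fps` and `scale` filters themselves are standard ffmpeg video filters.

```python
from pathlib import Path


def build_frame_extract_cmd(video: Path, out_dir: Path, fps: int = 1, height: int = 720) -> list[str]:
    """Build an ffmpeg command that samples `fps` frames per second at `height` pixels tall.

    Hypothetical helper for illustration; not the project's actual code.
    """
    # fps=N keeps N frames per second of video.
    # scale=-2:H sets the height and picks an even width that preserves the aspect ratio.
    video_filter = f"fps={fps},scale=-2:{height}"
    # Assumed output naming: <video name>_000001.jpg, <video name>_000002.jpg, ...
    pattern = out_dir / f"{video.stem}_%06d.jpg"
    return ["ffmpeg", "-i", str(video), "-vf", video_filter, "-q:v", "2", str(pattern)]


# Example with a made-up file name; this only prints the command, it does not run ffmpeg.
cmd = build_frame_extract_cmd(Path("GX010042.MP4"), Path("frames"))
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, assuming ffmpeg is installed and the output directory exists.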

ngai_aku11d ago

What PAYG providers do people here recommend? Most powerful machine at home is an M1 MBA (16GB), so I too am interested in short term options where I can still benefit from the privacy of local models

1 more reply

egorfine11d ago

Yep. Go to vast.ai, spin up a cheap GPU instance, add a bit of code to the project and let it run. It'll finish in just a few hours for like ten bucks.

But it's not as fun as running local model right here on your computer on your own desk. It feels like magic.

asenna12d ago· 2 in thread

Funny this is almost EXACTLY what I did a few days ago on the same machine using very similar techniques and was on the front-page of HN as well:

https://news.ycombinator.com/item?id=48222733 https://blog.simbastack.com/indexed-a-year-of-video-locally/

I wasn't familiar with your project though, interesting stuff.

I'm trying to add more photography related features to Framedex but yeah there's so much we can do locally, exciting times.

iliashadOP12d ago

That's great, I checked your article when it was in front page because someone mentioned my project in the comments.

Good job for the article and the project. That's great, yes local models are getting better and better

pfannl9d ago

Apparently M1 Max video indexing is the new sourdough starter

robrain12d ago· 2 in thread

DaVinci 21 has indexing built-in (AI IntelliSearch). Not to diminish the work you did, but this is now available to many users (probably only Studio users since it has AI in the name)

iliashadOP12d ago

Yes, I didn’t look at it. But does it upload your videos to the cloud or process them locally? And does it allow to provide custom faces data to help labeling faces in your videos ?

I think Adobe premiere pro have it as well but cloud processed

teovall12d ago

The AI features in DaVinci Resolve are all processed locally. It does not currently have face tagging.

2 more replies

insumanth11d ago· 2 in thread

I will be doing these things with local LLMs

Take a fast, small and powerful LLM running locally to index my personal data like images, videos, documents and enrich them and tag with the enriched metadata.

Want to group by people - search tagged metadata and group it

Want to search an image by description - tagged metadata

Want to organize by anything - tagged metadata

This should (hopefully) put an end to my file clutter

nitin_flanker11d ago

I am in no way a tech savy person, don't know coding, don't know networking or AI much either. But I definitely want to have a system like this. An AI powered gallery / video repository that can help me find moments, people, colors, objects from 100s of 1000s of files.

Local LLMs sound so cool but I know they won't be easy to setup or use for common joe like me.

Mashimo11d ago

Immich can do part of this. For photos it does lm object detection and ocr for text. I think for video is currently only the first frame. It also has face / people detection.

And once set up it's easy to use even for non technical people.

asdfasgasdgasdg12d ago· 2 in thread

Cool build but the example videos you provide at the end are . . . not what I would hope for when thinking about the highlights of 2000+ videos of biking? For example the dog barking video only has one scene repeated two or three times and it's five seconds long?

iliashadOP12d ago

Fair enough, what would like to see as an example video and I would make it.

For the dog barking videos, those are only the video scenes that I have a dog barking sound in the video.

I'll keep adding more prompts and example videos, keep an eye for that

asdfasgasdgasdg12d ago

I don't have any preconceptions about specific content I want to see. I'd just think that so many hours of such cool adventures would have greater variety. It made me wonder if your AI really did such a good job of indexing it. It made me think maybe the tech isn't quite ready yet?

Did you ever visit crazyguyonabike.com? A long time ago I had the pleasure of following the journey of a friend of a friend of a friend on that site:

https://www.crazyguyonabike.com/doc/?doc_id=2405

Stuff like that I guess?

rho13812d ago· 2 in thread

This would fit best as a “Show HN:” post :)

culi12d ago

The title should link to the "full article". I wonder if OP's domain name is banned or something and they're doing this to get around it

iliashadOP12d ago

I tried to edit it and add Show HN, but it doesn't show the edited version. Thank you!

Mawr12d ago· 2 in thread

> Many of the videos I captured amazing moments, and sometimes it's kind of hard to watch the full videos to get those moments.

Yep. I had the same problem.

> Then, run the frame analysis pipeline [...] I have a face recognition plugin using my custom faces data, object detection, on-screen text, shot type, and scene description [...] we will have three vector DB collections that have all the information about our videos, like video location metadata, camera name, faces recognized, objects detected, on-screen text, transcription, description of each scene, and many more [...] we can get better indexed data if you use the advanced mode indexing to use the Qwen2.5-VL-7B-Instruct model to understand and describe your video much better, but at a slower indexing speed

Yeah, uhm... ok :)

If anyone else has a similar problem, the real solution is as follows:

1. When recording, if you witness an interesting moment worth saving later, press the power button — this will mark the current moment in the video as a chapter.

2. Find the chapters later when editing and cut them into clips.

3. You're done :)

This has two main benefits over the insanity above:

1. It's trivially simple instead of insanely complex and inefficient.

2. It will reliably catch all the stuff you find interesting, since you're the one doing the marking.

The downsides:

1. Doesn't work retroactively.

2. It may miss interesting stuff if you miss it at the time as well.

3. Only works for this use case.

4. Nerds won't salivate over your usage of cutting edge tech.

Noumenon7211d ago

What tool has this "press power to mark chapter" feature?

tredre311d ago

The GoPro, it's called HiLight Tag.

m3kw912d ago· 2 in thread

Grab frames, lower res, classify, combine meta data. Write to sql

iliashadOP12d ago

Not really. Grab frames, lower res, classify, combine metadata, transcribe the audio, convert those data (text, visual and audio) to embedding, save them over a vector DB and SQL DB. Which helped me to do semantic search, RAG, search using a screenshot of the video to find the exact the moment in the video plus search using an audio file as well. And other features unlocked with vector DB
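To make the difference from "write to SQL" concrete, here is a toy sketch of the lookup that embeddings enable: rank indexed scenes by cosine similarity to a query vector. Everything in it is illustrative and assumed. The scene ids and 3-dimensional vectors are made up, and a real setup would get vectors from an embedding model and query a vector DB rather than a Python dict.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_vec: list[float], index: dict[str, list[float]], top_k: int = 3) -> list[tuple[float, str]]:
    """Return the top_k (score, scene_id) pairs, best match first."""
    scored = [(cosine(query_vec, vec), scene_id) for scene_id, vec in index.items()]
    scored.sort(reverse=True)
    return scored[:top_k]


# Made-up scene embeddings keyed by "video@timestamp" (hypothetical ids).
index = {
    "ride_01@00:12": [0.9, 0.1, 0.0],
    "ride_01@03:40": [0.1, 0.8, 0.3],
    "ride_07@01:05": [0.0, 0.2, 0.9],
}

# The query vector stands in for an embedded text prompt, screenshot, or audio clip.
results = search([1.0, 0.0, 0.0], index, top_k=2)
for score, scene_id in results:
    print(f"{scene_id}: {score:.3f}")
```

The same ranking works whether the query started as text, an image, or audio, as long as it ends up in the same embedding space as the indexed scenes.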

ingvay712d ago

Really cool work and workflow. strongly prefer this kind of local, open pipeline that i control over a dependency on Adobe tools and lock ins.

1 more reply

lgats12d ago· 1 in thread

the link https://iliashaddad.com/blog/i-indexed-669-gb-of-my-gopro-vi...

iliashadOP12d ago

Thank you

fl0id12d ago· 1 in thread

it is possible to use apple gpu with containers. either with podman + runkit + recent mesa or with recent vllm-metal from docker https://www.docker.com/blog/docker-model-runner-vllm-metal-m...

iliashadOP12d ago

I was looking for a solution for this issue of running docker containers over MPS and utilizing their GPU power. I think this project will be the solution for it, I’ll try it very soon and add support for it. Thank you, much appreciated

cake-rusk12d ago· 1 in thread

I have an RTX 5090 card but it only has 32 GB RAM, can something like this work on my machine?

iliashadOP12d ago

Yes, and it’ll result in much faster results than the ones that I did with my computer

duncangh11d ago· 1 in thread

Would it be possible to upload them all to YouTube as unlisted and then point Gemini to them? Not sure the limitations wrt unlisted videos. Maybe also could’ve been done with the @google photos operator if uploaded to Google Photos if not a corporate google workspace Gemini instance

iliashadOP11d ago

I'm not sure if it's possible or not. The goal of my project was to utilize local models and your local machine instead of uploading your videos to the cloud. There are a couple of cloud services that do video indexing very well; I could name a few, like Twelve Labs (I'm not affiliated with them, but I did a presentation at one of their webinars).

havercosine11d ago· 1 in thread

Well done! I couldn't understand how you are building reels out of it via the agent. Is it some sort of AI tool calling that takes image links and builds a reel via some video editing tool ? Or +/- time delta around the timestamp returned from the indexed from a given query + join them together?

iliashadOP11d ago

Thank you! I'm using RAG, I have every video scene indexed individually in the vector database. When I'm asking the agent, it'll use an Ollama model to understand the request, use the available search tool (searching using transcription text, faces, visual, audio or combined) something like when you use Claude or Chat GPT it'll use the web search tool to find you info online. Then, I can filter out video scenes using the Ollama to better present accurate and unique video scene, then send those video results to Davinci Resolve using their API to create a video timeline using those video clips

crakhamster0111d ago· 1 in thread

Thanks for sharing! I make videos and often have the same problem as well.

Being able to semantic search over your library is useful, but does it solve the review problem? I feel like you would still need to watch the footage back before you know what you're working with.

iliashadOP11d ago

You're welcome. As of now, the project narrows down your options: instead of watching 3-4 videos (5 min each, as an example), you can watch 3 clips from them (1s to 2.5s each). It gives you only the video moments that you need to review instead of the full videos.

WhitneyLand12d ago· 1 in thread

I’d like to see embedding of actual video clips become practical in this type of workflow.

Frame-level embedding covers a lot, but can miss out on a lot of action-related searches.

iliashadOP12d ago

Sure, I'm using (https://huggingface.co/collections/Qwen/qwen25-vl) which can help me understand action like falling down, because I can provide for example 5 frames (down scaled to 720p) to understand what is happening in this part of the video

PreownedPlaid12d ago· 1 in thread

this is really cool. was looking to do something similar on mbp 64gb

iliashadOP12d ago

That's really great, thank you!

nyxtom12d ago· 1 in thread

Now this ^^ is an awesome use case!

iliashadOP12d ago

Thank you, I would like to know your use case for this kind of project and which prompt you want to generate?

LeonardoTolstoy11d ago

What models did you use for the stages? I see Qwen2.5-VL-7B-Instruct mentioned as an advanced option, so I assume maybe Qwen2.5-VL-3B-Instruct by default (which is what I also use for a lot of stuff, it is incredibly good at "clean" OCR, but as you maybe indicate not the best at "describing a scene").

EDITED: I didn't realize Whisper was a local model. I never tried transcription before, so I had always figured it was a pay model by OpenAI. I'll have to check it out (although the runtime listed here is a bit daunting).

For that project I'll say I don't see much degradation in embedding quality at much much worse quality than 720p (all the way down to 240p), which speeds things up considerably. Although I don't really do face or object detection, just scene embeddings. To me any process whereby it would take longer to process the video than watch it is probably a no go in general. Obviously a challenge for local-first analysis.

zzsshh11d ago

Related article on indexing videos but with a local text description and using Gemma4: https://blog.simbastack.com/indexed-a-year-of-video-locally/

tj-teej11d ago

I've been screenshotting twitter since 2016 when Brexit happened and with the goal of one day putting together some kind of art piece.

The world and our discourse around it has changed so much over the past ten years and now with this kind of technology I'm so excited to be able to classify these images from my iCloud and start on the project.

wferrell12d ago

https://iliashaddad.com/blog/i-indexed-669-gb-of-my-gopro-vi...

____tom____11d ago

I wonder how long it would take on faster hardware. I have ten times that much footage, but 67 * 10 hours is a lot of processing.

I might be better off getting something with a beefy GPU on AWS or Google cloud.
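A back-of-the-envelope version of that estimate, using only the figures quoted in this thread (67h 40m 42s of compute, 15h 13m 18s of footage, 57,537 frames analyzed). The 10x projection assumes "ten times that much footage" means ten times the duration, that processing time scales linearly, and that the hardware and settings are the same; none of that is confirmed by the post.

```python
def to_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


compute_s = to_seconds(67, 40, 42)   # total compute time quoted from the article
footage_s = to_seconds(15, 13, 18)   # footage duration from the TLDR
frames = 57_537                      # frames analyzed, quoted from the article

slowdown = compute_s / footage_s     # compute seconds per second of footage
per_frame = compute_s / frames       # compute seconds per analyzed frame

# Naive linear projection for 10x the footage on the same setup (assumption).
scaled_hours = 10 * compute_s / 3600

print(f"{slowdown:.2f}x slower than real time")
print(f"{per_frame:.2f} s of compute per frame")
print(f"{scaled_hours:.0f} h (~{scaled_hours / 24:.0f} days) for 10x the footage")
```

So the quoted run works out to roughly 4.4 seconds of compute per second of footage, and a naive 10x projection lands in the range of four weeks of continuous processing on the same machine.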

iliashadOP10d ago

Thank you so much for your support on Hacker News. I want to share a special discount code with the people of HN: "HACKERNEWS". Use it here https://shop.edit-mind.com/checkout/buy/9f18a6f0-b437-47ec-b... and get 10% OFF. It's only for the first 5 people, and then it expires.

iliashadOP12d ago

I would love your feedback and suggestions for new improvements or features you wanna have, either in the source available version, the desktop app or blog post itself?

synergy2012d ago

can vlm be used instead or it's too heavy and slow

j / k navigate · click thread line to collapse

115 comments

94 comments · 30 top-level

Beijinger12d ago· 14 in thread

Does it work for porn collections too?

pduggishetti12d ago

You'll need a lora for this, porn content rejection is heavy. Or you'll need a abliterated model, not sure if vision also works.

You might want to add something like yolo finetune to detect scenes + face recognition too.

dotancohen12d ago

For GP's purpose, can face recognition techniques be repurposed for, um, other body parts recognition? Sometimes the actresses are facing away from camera. There are exposed lips, if that helps.

1 more reply

vorticalbox12d ago

Vision still works perfectly fine in abliterated models.

2 more replies

lifestyleguru12d ago

Last time I tried whisper, it hallucinated an elaborate conversation from sounds of slapping and moaning and it took minutes to spit every single line of it.

3eb7988a166312d ago

Parakeet has been trained to detect non-voice sounds and exclude that from identification, so you might have better luck with that family.

dotancohen12d ago

If I remember correctly, the whisper documentation actually recommends to trim non-speech portions as the models halucinate heavily during those portions.

sarjann12d ago

Asking the important questions

fhdkweig12d ago

The internet is for porn. https://www.youtube.com/watch?v=LTJvdGcb7Fs

nntwozz11d ago

I was meandering through the comments about to leave the topic when my interest suddenly piqued upon reading the word porn.

iliashadOP12d ago

Why it’s always the same question? Hahah. I posted my project over Reddit and I got the same one hahah

fennecfoxy11d ago

Ha ha ha, it's because most humans overlap on a few things - like eating, shitting, sleeping and fucking, ha ha ha.

supertroop12d ago

Not sure if you’re being sarcastic but I think this is an interesting question. Would deep seek be useful here since it is local?

fibers12d ago

just because it is local does not mean it wouldn't reject explicit content. you can definitely try and find abilated models and can attempt to use unsloth or something similar to tune it properly.

1 more reply

okr12d ago

Depends how deep you wanna go.

justinram1112d ago· 9 in thread

Something I've enjoyed more than I expected is Google and Apple photos sending me photo memories and compilations of various things in my life and my kids lives over the last decade.

I'm really bullish on taking more video of my kids, with the thought that it will become easier and easier for AI to put them into little compilations I can enjoy later.

iliashadOP12d ago

mwelpa11d ago

I wish I could connect Apple photos to my Spotify account and have photo memories connected with songs I listened to at the time :)

alias_neo11d ago

Music memories are the best.

theshrike7910d ago

I think the Apple stuff is done 100% on device.

Google loves scanning stuff on in the cloud though.

goodmythical12d ago

You don't mind Google using your kids to train their models and advertising algorithms?

Years from now they'll be getting "hey look at BIKE BRANDS' NEWEST CHEAP BIKE REMEMBER WHEN YOU USED TO RIDE BIKE BRAND BIKES"

satvikpendem11d ago

I think most people really don't care, and/or will just adblock those sorts of things when they do arrive.

2 more replies

JMiao12d ago

do you use android and ios, or is there another benefit to having personal media with both?

dave808811d ago

I run both on my phone as a lazy (but flawed) backup strategy.

iliashadOP12d ago

Can you please elaborate more?

1 more reply

WarOnPrivacy12d ago· 6 in thread

I was surprised to learn that the

    M1 Max CPU is an ARM/SoC, comparable to an 11th gen Intel i9

Do I have it right? Would Windows ARM performance be similar for those cpu?

ref: https://www.cpubenchmark.net/compare/4585vs4245/Apple-M1-Max...

pachouli-please12d ago

It's also a bit apples (heh) to oranges for a handful of reasons, but most impactful

- "unified" ram makes all the system ram available as VRAM - dedicated ai coaccelerator thingy

Both of these reasons allow the apple silicon chips to crush conventional cpus in these kind of AI model workload stuffs

No idea about what the windows arm stuff is capable of. I know they use Qualcomm snapdragon chips though.

owldown12d ago

“Comparable” is maybe true if we are talking about single core performance, but for memory bandwidth, the M1 Max is about 8 times faster. Wider bus, lower latency, not even close.

voidmain000112d ago

No comparison. M1 Max has 400GB/s RAM bandwidth while Snapdragon X2 Elite, the latest and greatest , has 228GB/s RAM bandwidth.

Rohansi11d ago

voidmain000111d ago

1 more reply

iliashadOP12d ago

To your question, I can’t deny or confirm that because I didn’t tried it this project over a Windows machine yet or a machine with this config

tontonius12d ago· 5 in thread

if anyone is interested in searching large video collections local and offline I suggest taking a look at Jumper https://docs.getjumper.io

comes with some nifty features like NLE- integrations, people search, MCP, API etc

Disclaimer: one of the co-founders

dotancohen12d ago

The link just timed out for me. I'm in Israel, connecting via residential WiFi. All other sites that I regularly use connect just fine.

tontonius11d ago

hmm weird works for me.. what about https://getjumper.io/?

dotancohen11d ago

They're both working now.

____tom____11d ago

Your docs say you integrate with Davinci resolve.

Other comments mention davinci resolve has this built in. How would you compare the two?

MaxGL7d ago

Jumper is significantly more accurate

GreenSalem11d ago· 4 in thread

A lawyer I know who specialises in rape, and is excellent at getting the obviously guilty exonerated, lost a case last year because of GoPro videos.

Her client was recording while committing the abhorrent crime. The criminal would otherwise have got off.

From my perspective, the GoPro camera produced a good outcome. Still, one has wonder why anyone to record their criminal actions.

Yiin11d ago

word "her" in this context gave me heavy feelings, what makes one to pick such a career move...

GreenSalem11d ago

Beggars cant be choosers.

She would rather have done corporate law but did not have the academic credentials or the networks needed for a job at the likes of Latham Watkins or White and Case.

Still it is good for society that criminals get the worst lawyers to defend them.

fennecfoxy11d ago

Why? You're being sexist and I hope you can understand why.

djmips11d ago

esjeon12d ago· 3 in thread

> Then, run the frame analysis pipeline, which will divide the video into separate video scenes (1s each, or 1fps) > (…) > Frames analyzed 57,537

Aha, it makes total sense. This number sounds much more reasonable than “669 GB”, since the actual total size of processed frames would be like 10-30 GB.

(Not downplaying anything. Doing-at-home always requires some math on practicality)

> Total compute time 67h 40m 42s

I’m just curious tho — is there any paying options that can accelerate this kind of process? Just spin up GPU instances?

iliashadOP12d ago

> Aha, it makes total sense. This number sounds much more reasonable than “669 GB”, since the actual total size of processed frames would be like 10-30 GB.

> I’m just curious tho — is there any paying options that can accelerate this kind of process? Just spin up GPU instances?

For now, I found that NVIDIA GPU for example RTX 3060 with 12GB Vram was much faster than my M1 Max. (still working on optimizing for speed and accuracy).

ngai_aku11d ago

1 more reply

egorfine11d ago

Yep. Go to vast.ai, spin up a cheap GPU instance, add a bit of code to the project and let it run it finish in just a few hours for like ten bucks.

But it's not as fun as running local model right here on your computer on your own desk. It feels like magic.

asenna12d ago· 2 in thread

Funny this is almost EXACTLY what I did a few days ago on the same machine using very similar techniques and was on the front-page of HN as well:

https://news.ycombinator.com/item?id=48222733 https://blog.simbastack.com/indexed-a-year-of-video-locally/

I wasn't familiar with your project though, interesting stuff.

I'm trying to add more photography related features to Framedex but yeah there's so much we can do locally, exciting times.

iliashadOP12d ago

That's great, I checked your article when it was in front page because someone mentioned my project in the comments.

Good job for the article and the project. That's great, yes local models are getting better and better

pfannl9d ago

Apparently M1 Max video indexing is the new sourdough starter

robrain12d ago· 2 in thread

DaVinci 21 has indexing built-in (AI IntelliSearch). Not to diminish the work you did, but this is now available to many users (probably only Studio users since it has AI in the name)

iliashadOP12d ago

Yes, I didn’t look at it. But does it upload your videos to the cloud or process them locally? And does it allow to provide custom faces data to help labeling faces in your videos ?

I think Adobe premiere pro have it as well but cloud processed

teovall12d ago

The AI features in DaVinci Resolve are all processed locally. It does not currently have face tagging.

insumanth11d ago· 2 in thread

I will be doing these things with local LLMs

Take a fast, small, and powerful LLM running locally to index my personal data (images, videos, documents), enrich it, and tag it with the enriched metadata.

Want to group by people? Search the tagged metadata and group it.

Want to search for an image by description? Tagged metadata.

Want to organize by anything? Tagged metadata.

This should (hopefully) put an end to my file clutter
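A minimal sketch of the "search and group by tagged metadata" idea, assuming a local model has already written `people` and `tags` for each file. The file names, field names, and both helper functions are hypothetical and only illustrate the lookup side, not the tagging itself.

```python
from collections import defaultdict

# Hypothetical metadata a local model might have produced for each file.
catalog = {
    "IMG_0001.jpg": {"people": ["alice"], "tags": ["beach", "sunset"]},
    "IMG_0002.jpg": {"people": ["alice", "bob"], "tags": ["bike", "road"]},
    "VID_0003.mp4": {"people": ["bob"], "tags": ["bike", "dog"]},
}


def group_by_person(catalog):
    """Map each tagged person to the sorted list of files they appear in."""
    groups = defaultdict(list)
    for name, meta in sorted(catalog.items()):
        for person in meta["people"]:
            groups[person].append(name)
    return dict(groups)


def search_by_tag(catalog, tag):
    """Return the sorted names of all files carrying the given tag."""
    return sorted(name for name, meta in catalog.items() if tag in meta["tags"])


print(group_by_person(catalog))
print(search_by_tag(catalog, "bike"))
```

Once everything is reduced to tags like this, grouping and searching are plain dictionary lookups; the hard part stays in producing good tags.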

nitin_flanker11d ago

Local LLMs sound so cool, but I know they won't be easy to set up or use for an average Joe like me.

Mashimo11d ago

Immich can do part of this. For photos it does ML object detection and OCR for text. I think for video it currently only looks at the first frame. It also has face / people detection.

And once set up, it's easy to use even for non-technical people.

asdfasgasdgasdg12d ago· 2 in thread

iliashadOP12d ago

Fair enough. What would you like to see as an example video? I'll make it.

As for the dog-barking videos, those are only the scenes that contain a dog-barking sound.

I'll keep adding more prompts and example videos, so keep an eye out for that.

asdfasgasdgasdg12d ago

Did you ever visit crazyguyonabike.com? A long time ago I had the pleasure of following the journey of a friend of a friend of a friend on that site:

https://www.crazyguyonabike.com/doc/?doc_id=2405

Stuff like that I guess?

rho13812d ago· 2 in thread

This would fit best as a “Show HN:” post :)

culi12d ago

The title should link to the "full article". I wonder if OP's domain name is banned or something and they're doing this to get around it

iliashadOP12d ago

I tried to edit it and add Show HN, but it doesn't show the edited version. Thank you!

Mawr12d ago· 2 in thread

> Many of the videos I captured amazing moments, and sometimes it's kind of hard to watch the full videos to get those moments.

Yep. I had the same problem.

Yeah, uhm... ok :)

If anyone else has a similar problem, the real solution is as follows:

1. When recording, if you witness an interesting moment worth saving later, press the power button — this will mark the current moment in the video as a chapter.

2. Find the chapters later when editing and cut them into clips.

3. You're done :)

This has two main benefits over the insanity above:

1. It's trivially simple instead of insanely complex and inefficient.

2. It will reliably catch all the stuff you find interesting, since you're the one doing the marking.

The downsides:

1. Doesn't work retroactively.

2. It may miss interesting stuff if you miss it at the time as well.

3. Only works for this use case.

4. Nerds won't salivate over your usage of cutting edge tech.

Noumenon7211d ago

What tool has this "press power to mark chapter" feature?

tredre311d ago

The GoPro, it's called HiLight Tag.

m3kw912d ago· 2 in thread

Grab frames, lower the resolution, classify, combine the metadata. Write to SQL.
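A rough sketch of that pipeline using only the Python standard library. Frame grabbing is stubbed out with tiny grayscale grids (in practice something like ffmpeg or OpenCV would supply the frames), and `classify` is a placeholder brightness check standing in for a real model. The table layout, function names, and the file name are all assumptions for illustration, not the project's actual design.

```python
import json
import sqlite3


def downscale(frame, factor):
    """Lower the resolution by keeping every `factor`-th pixel on both axes."""
    return [row[::factor] for row in frame[::factor]]


def classify(frame):
    """Placeholder classifier: label a frame by its mean brightness."""
    pixels = [p for row in frame for p in row]
    return "bright" if sum(pixels) / len(pixels) >= 128 else "dark"


def index_frames(conn, video, frames, factor=2):
    """Store one row per frame; `frames` yields (timestamp_seconds, 2D grid)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS frames "
        "(video TEXT, ts REAL, label TEXT, meta TEXT)"
    )
    for ts, frame in frames:
        small = downscale(frame, factor)
        # Combine whatever metadata is available into one JSON column.
        meta = json.dumps({"width": len(small[0]), "height": len(small)})
        conn.execute(
            "INSERT INTO frames VALUES (?, ?, ?, ?)",
            (video, ts, classify(small), meta),
        )
    conn.commit()


def search(conn, label):
    """Return (video, timestamp) pairs whose frame got the given label."""
    rows = conn.execute(
        "SELECT video, ts FROM frames WHERE label = ? ORDER BY ts", (label,)
    )
    return rows.fetchall()


# Toy data: one dark and one bright 4x4 frame from a made-up file name.
conn = sqlite3.connect(":memory:")
frames = [
    (0.0, [[10] * 4 for _ in range(4)]),
    (1.0, [[200] * 4 for _ in range(4)]),
]
index_frames(conn, "GX010001.MP4", frames)
print(search(conn, "bright"))
```

The shape is the easy part; most of the real work is in the classifier and in deciding which frames are worth sampling.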

iliashadOP12d ago

ingvay712d ago

Really cool work and workflow. I strongly prefer this kind of local, open pipeline that I control over a dependency on Adobe tools and lock-in.

lgats12d ago· 1 in thread

the link https://iliashaddad.com/blog/i-indexed-669-gb-of-my-gopro-vi...

iliashadOP12d ago

Thank you

fl0id12d ago· 1 in thread

It is possible to use the Apple GPU with containers, either with Podman + runkit + a recent Mesa, or with the recent vllm-metal from Docker: https://www.docker.com/blog/docker-model-runner-vllm-metal-m...

iliashadOP12d ago

cake-rusk12d ago· 1 in thread

I have an RTX 5090 card, but it only has 32 GB of VRAM. Can something like this work on my machine?

iliashadOP12d ago

Yes, and it’ll give much faster results than what I got on my computer.

duncangh11d ago· 1 in thread

iliashadOP11d ago

havercosine11d ago· 1 in thread

iliashadOP11d ago

crakhamster0111d ago· 1 in thread

Thanks for sharing! I make videos and often have the same problem as well.

Being able to do semantic search over your library is useful, but does it solve the review problem? I feel like you would still need to watch the footage back before you know what you're working with.

iliashadOP11d ago

WhitneyLand12d ago· 1 in thread

I’d like to see embedding of actual video clips become practical in this type of workflow.

Frame-level embedding covers a lot, but it can miss many action-related searches.

iliashadOP12d ago

PreownedPlaid12d ago· 1 in thread

this is really cool. was looking to do something similar on mbp 64gb

iliashadOP12d ago

That's really great, thank you!

nyxtom12d ago· 1 in thread

Now this ^^ is an awesome use case!

iliashadOP12d ago

Thank you! I'd like to know your use case for this kind of project and which prompts you'd want to generate.

LeonardoTolstoy11d ago

zzsshh11d ago

Related article on indexing videos but with a local text description and using Gemma4: https://blog.simbastack.com/indexed-a-year-of-video-locally/

tj-teej11d ago

I've been screenshotting Twitter since 2016, when Brexit happened, with the goal of one day putting together some kind of art piece.

wferrell12d ago

https://iliashaddad.com/blog/i-indexed-669-gb-of-my-gopro-vi...

____tom____11d ago

I wonder how long it would take on faster hardware. I have ten times that much footage, but 67 * 10 hours is a lot of processing.

I might be better off getting something with a beefy GPU on AWS or Google cloud.
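A back-of-envelope version of that estimate, assuming processing time scales linearly with the amount of footage. The 67 hours and the factor of 10 come from the comment above; the speedup of 4 is purely hypothetical and not a measured number for any GPU.

```python
def estimate_hours(base_hours, scale, speedup=1.0):
    """Linear estimate: more footage multiplies time, faster hardware divides it."""
    return base_hours * scale / speedup


same_hardware = estimate_hours(67, 10)
print(f"{same_hardware:.0f} h = {same_hardware / 24:.1f} days")  # 670 h = 27.9 days

# Hypothetical 4x faster GPU, just to show how the estimate moves.
faster = estimate_hours(67, 10, speedup=4)
print(f"{faster:.1f} h = {faster / 24:.1f} days")
```

At roughly four weeks of continuous processing on the same hardware, renting a faster GPU starts to look reasonable.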

iliashadOP10d ago

iliashadOP12d ago

I would love your feedback and suggestions for improvements or features you'd like to have, whether in the source-available version, the desktop app, or the blog post itself.

synergy2012d ago

Can a VLM be used instead, or is it too heavy and slow?
