I indexed 669 GB of my GoPro videos using my M1 Max computer and local ML models
Full article: https://iliashaddad.com/blog/i-indexed-669-gb-of-my-gopro-videos-using-my-m1-max-computer
Here's what they actually said:
- The Senior Software Engineer said: "LLMs are babies. If you don't understand the architecture behind everything, you won't be able to follow."
- The Google advocate pushed back slightly: "Writing code has become a commodity, like car manufacturing after automation. The question isn't whether to learn to code, it's why you want to."
- The IBM infrastructure engineer had the most actionable take: "Don't treat AI as a ghostwriter. Never commit code you can't explain. Use it as a tutor, not a replacement for your own thinking."
There were many more takes beyond these.
We also reacted to a clip of a Silicon Valley exec telling university graduates that "AI is the next industrial revolution" and getting booed by the crowd.
One of the more sobering parts: at current usage levels, Anthropic is likely losing money on heavy Claude subscribers. No inference company is profitable today. The Google advocate, who works on LLM inference at scale, put it directly: if we don't get significant inference efficiency improvements in the next five years, this entire ecosystem becomes unaffordable. The speaker from NVIDIA went into more detail on AI cost-effectiveness.
I tried the Google Video Intelligence API and got a $400 bill for analysing 4 videos (4K, about 5 minutes each on average). That didn't even include transcription, and I used my GCP startup credits to cover the bill.
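As a rough back-of-the-envelope check, the bill can be broken down per video and per minute. The sketch below uses only the figures from the post (4 videos, about 5 minutes each, $400 total); the resulting rates are simple averages of that one bill, not a published price.

```python
# Figures from the post: 4 videos, ~5 minutes each on average, $400 total.
total_bill_usd = 400
video_count = 4
avg_minutes_per_video = 5

# Derived averages (an approximation, since 5 minutes is only the mean length).
total_minutes = video_count * avg_minutes_per_video
cost_per_video = total_bill_usd / video_count
cost_per_minute = total_bill_usd / total_minutes

print(f"{total_minutes} min analysed, ${cost_per_video:.0f}/video, ${cost_per_minute:.0f}/min")
```

At that average rate, analysing a large personal archive in the cloud gets expensive quickly, which is the motivation for running everything locally.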
I decided to build my own tool with 3 requirements: it transcribes videos, it analyses video frames, and everything runs locally.
I don't wanna deal with storing my videos in the cloud because of two concerns: privacy and storage cost.
I've been working on it for the last couple of months. There's a source-available version that is free for personal use and for commercial use by companies with fewer than 5 people. It's available here (https://github.com/IliasHad/edit-mind), and the project has 1.3k GitHub stars.
Now I'm building a desktop app with direct NLE integration (Final Cut Pro, DaVinci Resolve, and Adobe Premiere Pro). It includes an editing agent that understands your footage and your editing style. (https://edit-mind.com)
Demo video: https://youtu.be/jcctyfVg_34
Happy to answer questions and hear your feedback.
Edit Mind started as a simple CLI that transcribed videos using OpenAI Whisper and let me search across them by text. Nothing fancy.
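To illustrate the "transcribe, then search by text" idea, here is a minimal sketch, not Edit Mind's actual code. It assumes transcripts arrive as a list of timestamped segments with "start", "end", and "text" keys, which is the shape the open-source Whisper package returns under "segments". The function name search_segments and the sample data are hypothetical.

```python
from typing import Dict, List


def search_segments(segments: List[Dict], query: str) -> List[Dict]:
    """Return segments whose text contains every word of the query (case-insensitive)."""
    words = query.lower().split()
    if not words:
        return []
    return [s for s in segments if all(w in s["text"].lower() for w in words)]


# Hypothetical sample data shaped like Whisper's "segments" output.
sample_segments = [
    {"start": 0.0, "end": 3.2, "text": " We just arrived at the beach."},
    {"start": 3.2, "end": 7.5, "text": " Let's set up the GoPro on the bike."},
    {"start": 7.5, "end": 11.0, "text": " The beach is empty this morning."},
]

hits = search_segments(sample_segments, "beach")
for hit in hits:
    # Print the time range so a match can be located in the video.
    print(f'{hit["start"]:.1f}s-{hit["end"]:.1f}s: {hit["text"].strip()}')
```

Because each segment keeps its timestamps, a text match points straight at the moment in the footage where the words were spoken.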
Then I decided to add frame analysis. I built a Python script that takes the full video, divides it into smaller parts 1 to 2 seconds long, and passes frames from each part to another system that recognizes faces, objects, on-screen text, etc.
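The splitting step can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's script: the post only says the parts are 1 to 2 seconds long, so the fixed window length, the choice of one midpoint frame per window, and the names split_into_segments and frame_timestamps are all hypothetical.

```python
from typing import List, Tuple


def split_into_segments(duration_s: float, segment_s: float = 2.0) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) windows of at most segment_s seconds."""
    if duration_s <= 0 or segment_s <= 0:
        return []
    segments = []
    start = 0.0
    while start < duration_s:
        # The last window is shortened so it never runs past the end of the video.
        end = min(start + segment_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


def frame_timestamps(segments: List[Tuple[float, float]]) -> List[float]:
    """Pick one representative timestamp (the midpoint) per window. Assumption, see above."""
    return [(start + end) / 2 for start, end in segments]


segments = split_into_segments(7.0, segment_s=2.0)
print(segments)
print(frame_timestamps(segments))
```

Each timestamp would then be used to grab a frame and hand it to the recognition step (faces, objects, on-screen text), so the analysis cost scales with the number of windows rather than the number of frames in the video.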
After that, I built an Electron desktop app to manage the UI, with search and chat features.
Then I thought, let's open source it and share it with the community on Reddit. People loved it (https://www.reddit.com/r/selfhosted/comments/1ogis3j/i_built...). Many of them requested Docker integration, which was a great suggestion, so I decided to focus on that instead. (https://www.youtube.com/watch?v=YrVaJ33qmtg&t=12s)
Now we have 3 Docker containers: one for the web UI; one for the background jobs, media processing, and local vector integration; and one for the ML service (transcription and frame analysis), which is a Python script that communicates via WebSocket.
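To make the container split more concrete, here is a hedged sketch of what a job message between the background-jobs container and the ML service could look like. The post does not describe the wire format, so the JSON schema, the field names, and the AnalysisRequest class are assumptions for illustration. Only the two task kinds (transcription and frame analysis) come from the post, and the actual WebSocket transport is left out so the example stays runnable with the standard library.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AnalysisRequest:
    """Hypothetical job message sent from the background-jobs container to the ML service."""
    job_id: str
    video_path: str
    task: str  # "transcription" or "frame_analysis"


def encode_request(request: AnalysisRequest) -> str:
    """Serialise a request to the JSON text that would be sent as a WebSocket text frame."""
    return json.dumps(asdict(request))


def decode_request(raw: str) -> AnalysisRequest:
    """Parse a JSON text frame back into a request, rejecting unknown tasks."""
    data = json.loads(raw)
    if data["task"] not in ("transcription", "frame_analysis"):
        raise ValueError(f"unknown task: {data['task']}")
    return AnalysisRequest(**data)


# Hypothetical job: the path and job id are made up for the example.
wire_text = encode_request(AnalysisRequest("job-1", "/media/videos/ride.mp4", "transcription"))
print(wire_text)
print(decode_request(wire_text))
```

Keeping the message a small, explicit structure like this is one way to let the web and jobs containers stay in their own language while the ML service remains a standalone Python process.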
After posting on X and tagging Twelve Labs for inspiring me to add new features and UI enhancements to the project, I had the opportunity to present the project live in their webinar series (if you wanna see Edit Mind in a live demo: https://www.youtube.com/watch?v=k_aesDa3sFw&t=1271s).
Now that the proof of concept has all the features I was hoping to get from Edit Mind, it's time to focus on improving the code quality, refactoring, and implementing best practices. Keep in mind that I'm working solo, with the help of external contributors.
I would love to get your feedback about the project.