I have 2TB+ of personal video footage, and finding specific moments in it was impossible. I wanted something like searching PDFs by content, but for video. Google's Video Intelligence API worked, but it cost me $450+ for just a few videos, and I'd have had to upload all my personal footage to their cloud.
So I built Edit Mind to do it locally.
The core problem:
You can search PDFs by their content in Finder (macOS). Why can't we do the same with videos? Every cloud solution either costs a fortune at scale or requires uploading your personal footage to someone else's storage, and I don't want my raw personal videos in the cloud.
How it works:
- Everything runs on your machine (your raw videos never leave your computer)
- Indexes videos once: transcribes audio, detects objects (YOLO), recognizes faces, analyzes emotions
- Stores scene metadata in ChromaDB, a local vector database (rough sketch below)
- Natural language gets parsed into structured queries, then semantic search finds matches (uses the Gemini API currently, but you can swap in a local LLM via Ollama if you prefer)
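To make the indexing step concrete, here's a minimal sketch of how a scene's metadata could land in a local ChromaDB collection. The scene fields and naming are my own illustration, not Edit Mind's actual schema:

    import chromadb

    # Local, persistent vector store -- nothing leaves the machine.
    client = chromadb.PersistentClient(path="./index")
    scenes = client.get_or_create_collection(name="scenes")

    def index_scene(scene_id, video_path, start, end,
                    transcript, faces, objects, emotions):
        """Store one analyzed scene; ChromaDB embeds the document text with its default local model."""
        scenes.add(
            ids=[scene_id],
            documents=[f"{transcript} faces: {' '.join(faces)} "
                       f"objects: {' '.join(objects)} emotions: {' '.join(emotions)}"],
            metadatas=[{
                "video": video_path,
                "start": start,
                "end": end,
                # ChromaDB metadata values must be scalars, so lists get joined.
                "faces": ",".join(faces),
                "objects": ",".join(objects),
                "emotions": ",".join(emotions),
            }],
        )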
What it actually does:
1. Type: "scenes where @Ilias is looking happy, eating a pizza"
2. Behind the scenes it converts this to: {"faces": ["Ilias"], "emotions": ["happy"], "objects": ["pizza"]}
3. Then it searches your local vector database for matching scenes across 2TB of footage in seconds.
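Steps 2 and 3, sketched with ChromaDB's query API plus a plain Python filter over the structured fields. The function and field names are mine, not the project's real API:

    import chromadb

    # Same local collection as in the indexing sketch above.
    client = chromadb.PersistentClient(path="./index")
    scenes = client.get_or_create_collection(name="scenes")

    def search(query_text, parsed, n_results=20):
        """parsed is the LLM output, e.g. {"faces": ["Ilias"], "emotions": ["happy"], "objects": ["pizza"]}."""
        # Semantic search over the scene descriptions (embedded locally by ChromaDB).
        hits = scenes.query(query_texts=[query_text], n_results=n_results)
        results = []
        for scene_id, meta in zip(hits["ids"][0], hits["metadatas"][0]):
            # Keep a scene only if its metadata mentions every requested face, emotion and object.
            if all(value in meta.get(field, "")
                   for field in ("faces", "emotions", "objects")
                   for value in parsed.get(field, [])):
                results.append({"scene": scene_id, "video": meta["video"],
                                "start": meta["start"], "end": meta["end"]})
        return results

    # e.g. search("Ilias looking happy eating a pizza",
    #             {"faces": ["Ilias"], "emotions": ["happy"], "objects": ["pizza"]})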
Real cost comparison:
* GCP Video Intelligence API: ~$0.10/minute of video (https://cloud.google.com/video-intelligence/pricing). That's $100+ for 200 five-minute videos; I have over 3,000 videos, so analyzing everything would cost me over $1,500.
* Edit Mind: free after the initial setup, runs on your own hardware (you need a reasonably powerful machine to handle the heavy video processing).
Technical choices:
Built with Electron because I needed real filesystem access and didn't want browser storage limits. A Python backend handles the heavy ML work (face_recognition, YOLOv8, FER for emotions) and communicates with the Electron app over WebSockets. The analysis pipeline is plugin-based; adding dominant-color detection as a separate plugin took me less than a day.
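To give a feel for what "plugin-based" means here, a stripped-down version of the idea (the real interface in the repo is more involved, and the names below are mine):

    from abc import ABC, abstractmethod
    import numpy as np

    class Analyzer(ABC):
        """One plugin contributes one kind of scene metadata."""
        name: str

        @abstractmethod
        def analyze(self, frames, audio_path):
            """Return a dict of metadata for one scene."""

    class DominantColorAnalyzer(Analyzer):
        name = "dominant_color"

        def analyze(self, frames, audio_path):
            # Per-channel mean over the sampled frames -- a crude stand-in for dominant color.
            mean = np.mean([frame.mean(axis=(0, 1)) for frame in frames], axis=0)
            return {"dominant_color": [int(c) for c in mean]}

    def run_scene_analysis(frames, audio_path, plugins):
        """Merge the output of every registered plugin into one metadata dict."""
        metadata = {}
        for plugin in plugins:
            metadata.update(plugin.analyze(frames, audio_path))
        return metadata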
Current limitations:
* Needs decent hardware (GPU recommended but not required)
* Face recognition requires some manual training (adding known faces)
* UI is functional but could be prettier
* Query parsing uses the Gemini API by default (but you can configure it to use local alternatives; a rough sketch of that idea follows this list)
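For the local-alternatives point, this is roughly what a query parser backed by a local model could look like, using the ollama Python package. It's purely illustrative; Edit Mind's actual configuration hook may differ:

    import json
    import ollama

    PROMPT = ("Convert this video search query into JSON with the keys faces, emotions and "
              "objects, each a list of strings. Return only the JSON.\nQuery: {query}")

    def parse_query_locally(query, model="llama3.1"):
        # Runs entirely against a local Ollama server instead of the Gemini API.
        response = ollama.chat(model=model, messages=[
            {"role": "user", "content": PROMPT.format(query=query)},
        ])
        return json.loads(response["message"]["content"])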
Why I'm sharing this:
I can't be the only person with this problem. Videographers, parents with years of family footage, documentary filmmakers, anyone with large video libraries. The code is MIT licensed and designed to be extended.
Would love feedback on:
1. Is $450+ for cloud video analysis a common pain point, or am I an outlier?
2. What other analyzers would be useful? (I'm thinking about camera movement analysis and scene type classification like POV/vlog/interview.)
3. Should I prioritize making it 100% offline by default, or is the Gemini API option fine for most use cases?
GitHub: https://github.com/iliashad/edit-mind
Demo: https://youtu.be/Ky9v85Mk6aY
Built this over a few weekends because I was tired of paying Google to search my own videos. Happy to discuss architecture decisions or the ML pipeline!