This is an excellent thing to do. Especially that LLMs excel at batching thus you can index multiple photos and videos in parallel for no performance penalty.
You know what I REALLY want? Just point this beast at the folders and it tell me which 150 shots are good to process from these 1,500 images. That's the dream!
Although the technology is getting there, it's still a very difficult problem to solve. Taste and art is subjective. Also me as a photographer will always be concerned - "what if my best shot was in one of these rejected shots".
But yeah, I think I'll try to do some more of these experiments soon.
—-
“Models scored all 4,487 photos. NIMA rewards technical craft (sharpness, composition), LAION rewards emotional/aesthetic appeal, MUSIQ is more general quality. Combined: 0.4 NIMA + 0.3 LAION + 0.3 MUSIQ, deduped at 0.85 CLIP similarity.
Interesting: the models wildly disagreed on some shots — one photo ranked NIMA #2 globally but LAION #4313.”
Bear in mind that ttft on MLX is much much faster on M5 Pro as compared to M4 Pro.
Also bear in mind that those figures are with NO optimizations whatsoever: no MCP, no DFlash. I am waiting for both to be released for the Qwen models.
For Qwen 35B enabling native MCP on MLX models slows it down by 10%.
For Qwen 27B enabling native MCP on MLX models speeds token generation up almost exactly 1.5x.
(all tested on M5 pro).
When your Claude wrote this post they might not have selected the right URL to share, unless your home folder is exposed. Care to share the skill files?
1: To get an Android app working that has been delisted and requires a 'key' app that you purchase. We did purchase it, but didn't think to make any backups.
PS - I just put this together in the last few mins, removed my personal files and references. So it's not tested properly, please let me know if any issues.
It's still an early hack, but I have thousands of still images as well from my camera which I've not processed and I need to do the same analysis for those.
So I'll continue working on it, but happy to receive any PRs if anyone finds any use for it.
I'm tired of having a backlog of thousands of images and videos, leaving it for later.
The idea of capable local models could be a huge unlock here if they are able to do the bottom-up context collection research / tagging / etc. at scale.
So if you give it a bunch of screenshots it will try and intelligently name them based upon what is in the screenshot. Same for videos, PDFs, etc.
But to your point I haven't even tried charging money as it feels like something Apple is just going to bake in as a feature.
Are you planning to open source it? Or maintain it in the future?
But I can tell it's only a matter of time before agents become smart enough to let my non-tech friends be able to just say "Make sense of all these videos in my folder" and it just does it.
Using API to analyze even a subset of this would've been painful imo.
There are other options that are good too. Gemini 3.1 Flash Lite is great for this kind of thing (NOT Gemini 3.5 Flash though - the pricing for that is bad).
But yeah, how it picks the frame is the weak-point here. Scene detection would definitely help - this is #1 on the Roadmap.
Could you share how your scene-detection picks the frames?
---
For the vector search, I went for the trade-off of not having it but keeping it simple with plain Markdown files for more portability. The knowledge travels with the files when an SSD moves, no index to keep in sync, and plain text that outlives the tool. But the other path you mentioned is interesting as well to explore.
You could also just use FFmpeg as it can do scene detection too. I tested both but liked the results from the histogram analyzer more.
Yeah, markdown works well if you're going to search through it with Claude Code or something like that. I built ClipScape as an Electron app with a local SQLite database, as I wanted an interface I could search and chat in and see the relevant thumbnails.
Not gonna lie, llama.cpp had the fans spinning at max speed. But it worked and I got the job done.
This always confuses me - don't people want their computations to run as fast as possible and thus inevitably produce more heat that needs to be vented?
I suppose sometimes it is just an analogy for "its utilizing 100% of my resources" (which I'm guessing it is here), but I've definitely had people say it as an actual complaint in different contexts
I think fan loudness is an outgrowth of conspicuous consumption because a certain OEM decided to make it a marketing bullet-point.
I was equally disappointed by by people - especially device reviewers - banging on the drum that phones made of plastic "didn't feel premium", and we got phones with glass backs that have to be shoved into plastic cases (because plastic is the near-perfect material to protect fragile phones screens and innards)
I was vaguely aware of all these pieces existing (except for running a facial recognition database at home o_o), but it's really neat to put them all together like that.
Still blows my mind I can do all this from my 2021 MBP.
I'll try to do a post once I have the next steps working (helping with planning and editing videos with Davinci Resolve).
Great job. Long live the M1 Max!
You can make AI-generated content without it being slop. Slop, to me at least, is content that's wrong, padded, or generic.
I see the cadence / short-sentence issues but if there's something else beyond those, I'd actually want to know what made it feel bad.
I would've put off documenting what I did over the weekend but instead, I did document everything, spent quite some time (several iterations) and effort to make sure it does not hallucinate and writes in my own tone and voice. I'm sure it could be better but the content is not made-up.
At a time where most of us software engineers have changed our workflows to let AI write 80+% of our code using agents, I feel writing is heading the same way. It then becomes a matter of taste, whether it's done well or not.
If you're looking clues and signs for whether a content has used AI, you're going to be disappointed over the next 12 months.
If it feels jarring right now, I'll work harder on the workflow so it feels more natural next time (someone shared this project with me - https://github.com/blader/humanizer).
But this clearly allows me to make content which I wouldn't have done earlier.
But because of the fear of non-perfection, I used to put away things like creating this article or even posting it anywhere. And I do think the article has real value that HN would appreciate (I am myself an HN-enthusiast).
I'll try more. Someone else shared this project which would be really helpful - https://github.com/blader/humanizer
Also a side note, the blog is posted on my self-created Slopit.io platform which is purely meant for your personal agents (working along with you) to post content - I recommend trying it out. https://blog.slopit.io/this-blog-post-is-slop/
I know, things are getting difficult with all the slop around, but my personal opinion is, as the agents get better at writing, the "annoying-ness" factor reduces and pieces of substance will still be appreciated, even if it was written by agents. This and the fact that agents aren't going away.
If I've automated a lot of my coding, I feel like engineers like me would naturally progress to also taking agents' help to write useful content.
PS - this comment was 100% hand-typed.
The tells were unmistakable but it still had a human touch, so I for one am glad you published anyway.
We ban accounts that do this and I don't want to ban you, so please write everything that you post to HN by hand.
Of course, it's impossible to know for sure what was LLM processed or not, but we're getting complaints about some of your posts and, upon inspection, the complaints seem justified.
The activity monitor does show all kinds of Electron apps active, on top of a presumably model-loaded Handy and a virtual machine for Claude Code, so I guess that's the real root cause for all the swapping. If your laptop starts trashing I can't imagine you have any use for those apps, which will grind to a halt.
[1] https://huggingface.co/mlx-community/gemma-4-31b-it-4bit
Although slightly laggy, I was impressed by the fact that I was still able to work on other things and have a bunch of tabs open on my Brave browser.
1. What is the search index?
2. The "description.md" example has things like "faces -> cluster_id". Is this from Davinci Resolve's face index? Things like faces+names and locations are really important with photo collections, but general LLMs don't handle them so well.
Something which I can query later - Like when brainstorming with Claude "I wanna make some videos of the Luxury rooms in the lodge" and it knows what all videos could help here (going through the files).
There's also a folder root level files that aggregates the text descriptions to make it easier to find.
I've just attached an image in the blog showing an example - https://blog.simbastack.com/_media/gvcycx2n.png
2) No - nothing from DaVinci Resolve. Framedex is a standalone pipeline. Resolve isn't involved.
Faces come from insightface (the open-source buffalo_l pack - RetinaFace for detection), running locally on CPU. For each clip it detects faces in the sampled frames, embeds them, and writes rows to ~/.framedex/faces.db.
Tbh, this part I know it's building up in my local DB but I haven't tested how good is it. Will check them out properly soon.
But yeah, on your broader point that's why framedex deliberately does not ask the LLM to handle faces or locations.
----
Faces → insightface / ArcFace embeddings. Deterministic, comparable across clips. The vision model only contributes a rough people_count; it never tries to identify anyone.
Locations → EXIF GPS via exiftool, reverse-geocoded through Nominatim/OpenStreetMap. Hard metadata, not a guess.
The LLM only does what it's good at: scene description, mood, shot type, keywords, keep/review/cull rating (this last part is also debatable though).
I am pretty sure that the vast majority of Airbnb hosts would not agree with you.
> equals TripAdvisor crucifixion
I have no idea how the Airbnb hosts with fake listings survive, really.
But on the other hand, genuine videos do take time and slows down the process.
>“I bought it for Chrome. It's running a model that didn't exist when I bought it.”
Well duh, personal computers run new software. That’s literally the whole point. The Apple II didn’t sell on the strength of the preinstalled apps.
But I've mentioned elsewhere - if it wasn't for all the AI-assistance, I would've put-off documenting everything that I did and not even get to the writing part.
But yeah, I'll be working on the workflow to make the next write-up better, more humanized.
(Also email is in my profile).
It's not tested properly after I genericized it. Will try to go through it properly and add more updates.
Two big things on my TODO: 1) Make use of this indexing and using Claude's help, make video editing faster with Davinci Resolve (now that I have a good index of all the content)
2) I currently did this for videos, but I want to add more things to this for my thousands of still images of my camera - need to make sense of them. So I'll be working on this as well.
I can sell this as a service to people who can't even run an LLM, or don't want to cook their hardware.
Waitlist open:
"Catalog, search, preview, and generate production-ready prompts & scripts from your entire archive — on your existing hardware. Then render in the cloud."
-31b It's a dense model
-how many tokens/s is it running at
-What temps are the M1 max GPU/CPU running at
-Is it mlx or gguf
-Why 31b and not 26b which is moe and much more efficient on the m1 max at 50tokens/s & low temps.
I personally use (MLX) qwen3.6-35b-8bit mostly, but use Gemma-4-26b-4bit for image analysis, its mind blowing how fast it is at identifying the scene in a photograph.
Shameless plug: I'm the founder of Chat Octopus, an AI media assistant, and it actually 'looks' at the videos to understand them before creating a cut.
I have a Claude max sub and plenty of OpenRouter credit, but I don’t feel good about uploading my family’s private videos