Nice job. I have a similar automated cron job that runs overnight and does the following when it finds new pictures in a folder:
- Qwen3-VL 8b creates a verbose description + keywords
- Simpler CLIP encoder builds another set of tags
- Description is placed into an image RAG
- Keywords are written into the file name itself, joined with underscores
- Description/Tags/Keywords are all embedded in EXIF data on the image
I've got close to 30k images, so doing this gives me several ways to search (natural language, keywords, etc.) and quickly retrieve images.
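
For anyone curious about the file-name step, here is a rough sketch of how it could look in Python. This is not my exact script: the function names (`slugify`, `keyworded_name`), the 120-character cap, and the choice to use hyphens inside multi-word keywords (so underscores only separate keywords) are all assumptions made for illustration.

```python
import re
from pathlib import Path


def slugify(keyword: str) -> str:
    """Lowercase a keyword and reduce it to filesystem-safe characters.

    Hypothetical helper: runs of anything other than a-z and 0-9 become a
    single hyphen, so underscores stay reserved as keyword separators.
    """
    return re.sub(r"[^a-z0-9]+", "-", keyword.lower()).strip("-")


def keyworded_name(path: Path, keywords: list[str], max_len: int = 120) -> Path:
    """Return a new path with keywords appended to the stem, joined by underscores.

    max_len is an assumed cap on the file name length (stem + suffix);
    keywords that would push the name past it are dropped.
    """
    # De-duplicate while preserving the order the tagger produced.
    slugs: list[str] = []
    for kw in keywords:
        slug = slugify(kw)
        if slug and slug not in slugs:
            slugs.append(slug)

    stem = path.stem
    for slug in slugs:
        candidate = f"{stem}_{slug}"
        if len(candidate) + len(path.suffix) > max_len:
            break  # stop before the name gets too long
        stem = candidate
    return path.with_name(stem + path.suffix)


if __name__ == "__main__":
    src = Path("photos/IMG_0042.jpg")
    print(keyworded_name(src, ["Golden Retriever", "beach", "Beach", "sunset!"]))
```

The function only computes the new name; the actual rename would be something like `src.rename(keyworded_name(src, keywords))` once you are happy with the result.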