
Eisenstein · 1y ago

I'm curious why you find descriptions of images useful for searching. I developed a similar flow and ended up embedding keywords into the image metadata instead. It makes them easily searchable and not tied to any databases, and it is faster (dealing with tens of thousands of images personally).

* https://github.com/jabberjabberjabber/LLavaImageTagger
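For anyone wondering what "embedding keywords into the image metadata" can look like, here is a minimal sketch. It assumes the `exiftool` command-line tool is installed and only builds the argument list; the helper name `exiftool_keyword_args` is made up for illustration, and this is not necessarily how the linked project does it.

```python
def exiftool_keyword_args(image_path, keywords):
    """Build an exiftool command that appends keywords to an image.

    Hypothetical helper: writes each keyword to both the IPTC Keywords
    and XMP dc:Subject list tags, which many photo tools read.
    """
    # -overwrite_original skips the "_original" backup copy exiftool
    # would otherwise leave next to the file.
    args = ["exiftool", "-overwrite_original"]
    for kw in keywords:
        # "+=" appends to a list tag instead of replacing its contents.
        args.append(f"-IPTC:Keywords+={kw}")
        args.append(f"-XMP-dc:Subject+={kw}")
    args.append(image_path)
    return args
```

The returned list can be passed to `subprocess.run(args, check=True)`; because it is a list rather than a shell string, keywords containing spaces need no extra quoting.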


4 comments · 2 top-level

vunderba · 1y ago

I can't speak to the OP's decision, but I have a similar script set up that runs a combination of YOLO, bakllava, tesseract, etc. over each image and stores the output in a database along with a URI reference to the image file.

I actually store the data in the EXIF as well, but the nice thing about having a database is that it's significantly faster than attempting to search hundreds of thousands of images across a nested file structure, particularly since I store a great deal of media on a NAS.
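A rough sketch of the database side, assuming SQLite and a single table keyed by URI. The table layout and the function names (`open_index`, `add_image`, `search`) are illustrative assumptions, not the actual script:

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the index; the schema here is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "uri TEXT PRIMARY KEY, description TEXT NOT NULL)"
    )
    return conn


def add_image(conn, uri, description):
    """Store or refresh the generated text for one image URI."""
    conn.execute(
        "INSERT OR REPLACE INTO images (uri, description) VALUES (?, ?)",
        (uri, description),
    )
    conn.commit()


def search(conn, term):
    """Return URIs whose description contains term.

    LIKE is case-insensitive for ASCII by default; '%' and '_' inside
    term act as wildcards because they are not escaped here.
    """
    rows = conn.execute(
        "SELECT uri FROM images WHERE description LIKE ? ORDER BY uri",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

A query like this only touches the database file, so the NAS is not walked at search time; the URIs it returns point back to the images.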

Eisenstein (OP) · 1y ago

You wouldn't happen to have this on GitHub or have some other way to share it? I am interested in seeing how you implemented it.

tracerbulletx · 1y ago

It's not as good as tags, but it does pretty OK for now, especially since searching for specific text in an image is something I want to do a lot. I'm trying to get llama to output according to a user-defined tagging vocabulary/taxonomy and ideally learn from manual classifications. Kind of a work in progress there.

This is the prompt I've been using.

"Create a structured list of all of the people and things in the image and their main properties. Include a section transcribing any text. Include a section describing if the image is a photo, comic, art, or screenshot. Do not try to interpret, infer, or give subjective opinions. Only give direct, literal, objective descriptions of what you see."

Eisenstein (OP) · 1y ago

> I'm trying to work on getting llama to output according to a user defined tagging vocabulary/taxonomy and ideally learn from manual classifications. Kind of a work in progress there.

Good luck with that. The only thing I've found that works is using a GBNF grammar to force it, which slows inference down considerably.
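As a minimal sketch of the GBNF approach, assuming llama.cpp's grammar syntax and a flat tag list: the helper below (the name `gbnf_for_tags` is hypothetical) only generates the grammar text, which would then be handed to whatever runtime accepts GBNF. It constrains output to a comma-separated list of known tags.

```python
def gbnf_for_tags(tags):
    """Build a GBNF grammar that only admits tags from the vocabulary.

    The output format (tags joined by ", ") is an assumption made for
    this example.
    """
    tags = list(tags)
    if not tags:
        raise ValueError("need at least one tag")

    def quote(tag):
        # Escape backslashes and double quotes for a GBNF string literal.
        escaped = tag.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    alternatives = " | ".join(quote(t) for t in tags)
    # root: one tag, followed by zero or more ", tag" repetitions.
    return 'root ::= tag (", " tag)*\n' f"tag ::= {alternatives}\n"
```

This grammar does not stop the model from repeating a tag, so deduplicating afterwards is still needed.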
