
gwern · 4mo ago · 0 points

> My siblings are very much not developers. That's a lot of data for them to download, store, and figure out a way to view.

I wasn't suggesting anything about your siblings, but about you, who are a developer. I was just talking about the actual download step, not what you did after that. (Obviously you were going to host them somewhere else in some other form. Probably not DVDs but a little quickie website or maybe just a flash drive with an HTML file index, say, I don't know, lots of options here to make it user-friendly for your siblings on Christmas Day. The hard drive or flash drive idea has the benefit of LOCKSS, especially if you use up the spare space providing PAR2 FEC.)

> I doubt any model could effectively label locations and people over 20 years of video.

Actually, Gemini is highly promptable and has a large context window, and a single still image only takes up ~300 tokens IIRC, so I think that you could probably do so! Just include, say, 3 photos of each person over time with a natural language description, and 1 photo of each location, and that might be enough to get back useful labels. Gemini can even do bounding boxes. (Google is quite proud of its vision and video analysis capabilities.) And you can run multiple passes or split up videos etc.
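To put rough numbers on that, here is a back-of-the-envelope sketch of the reference-set budget. The ~300 tokens per image is the figure recalled above; the 1,000,000-token window, the 100 tokens of description per person, and the counts of people and locations are all illustrative assumptions, so check them against whatever model and family you actually have.

```python
# Rough token-budget arithmetic for a reference set of photos and descriptions.
TOKENS_PER_IMAGE = 300      # the "~300 tokens IIRC" figure; verify for your model
CONTEXT_WINDOW = 1_000_000  # assumption: a long-context model; check the real limit


def reference_budget(num_people, photos_per_person, num_locations,
                     description_tokens_per_person=100):
    """Estimate tokens used by reference photos plus short text descriptions."""
    num_images = num_people * photos_per_person + num_locations
    image_tokens = num_images * TOKENS_PER_IMAGE
    text_tokens = num_people * description_tokens_per_person  # assumed size
    return image_tokens + text_tokens


# Hypothetical family: 10 people, 3 photos each, 20 recurring locations.
used = reference_budget(num_people=10, photos_per_person=3, num_locations=20)
fraction = used / CONTEXT_WINDOW
print(f"{used} tokens, {fraction:.1%} of the window")
```

Under those assumptions the whole reference set comes to 16,000 tokens, under 2% of the window, which leaves nearly all of it for the video frames themselves.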


SamPatt · 4mo ago

Ah, I understand you now. Yes, I could have had a service do the digitizing and then only done the delivery myself. And given the time investment, that probably would have been more sound. I don't think I'd do it all myself if I did it again.

I didn't know Gemini models were that capable. I admit I'm still skeptical about this approach though - even if it were capable of accurately labeling people and locations across decades, there's no way it could know when a scene is of personal interest. I kept a running log for each sibling as I was manually doing the labeling, knowing what they'd want to see, which presumably is only possible for me and my siblings to do with any accuracy.

If AI could ever do that then we've definitely hit ASI!

gwern (OP) · 4mo ago

> I kept a running log for each sibling as I was manually doing the labeling, knowing what they'd want to see, which presumably is only possible for me and my siblings to do with any accuracy.

But you could feed that back in! Just write it down. It's all tokens. As you read over descriptions and note down key pieces of family history or per-sibling details, that provides information for better annotating the next video for possible points of interest. And you can chat with the LLM and write down more general principles. It's not like an LLM like Gemini doesn't know an enormous amount about family life and things of sentimental value, or can't make good initial guesses. And when you do this, you still haven't used up more than a small fraction of the context window with these image references and text profiles and principles...
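A minimal sketch of what "writing it down" could look like in practice: keep the running notes as plain data and flatten them into a text preamble that gets prepended to each annotation request. The function name, the data layout, and the sample entries are all hypothetical, and the actual model call is deliberately left out because it depends on whichever API you use.

```python
def build_annotation_context(sibling_profiles, principles, family_history):
    """Flatten accumulated notes into one text block to prepend to each request."""
    lines = ["Family history notes:"]
    lines += [f"- {note}" for note in family_history]
    lines.append("Per-sibling interests:")
    for name, interests in sorted(sibling_profiles.items()):
        lines.append(f"- {name}: {'; '.join(interests)}")
    lines.append("General principles for flagging scenes:")
    lines += [f"- {p}" for p in principles]
    return "\n".join(lines)


# Made-up example notes; in practice these grow as each video is reviewed.
context = build_annotation_context(
    sibling_profiles={"Alex": ["school plays", "the old family dog"]},
    principles=["Flag birthdays and holidays", "Flag any scene with grandparents"],
    family_history=["Moved house in the late 1990s"],
)
print(context)
```

After each reviewed video, you would append whatever you learned to these lists and rebuild the preamble, so later passes start from everything noted so far.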
