undefined | Better HN

story

0 pointswmf1y ago0 comments

I don't think the LoRAs are fine-tuned locally at all. It sounds like they use RAG to access data.

0 comments

Consider a feature from earlier in the keynote: the thing Notes (and Math Notes) does now where it fixes up your handwriting into a facsimile of your handwriting, with the resulting letters then acting semantically as text (snapping to a baseline grid; being reflowable; being interpretable as math equations) but still having the kind of long-distance context-dependent variations that can't be accomplished by just generating a "handwriting font" with glyph variations selected by ligature.

They didn't say that this is an "AI thing", but I can't honestly see how else you'd do it other than by fine-tuning a vision model on the user's own handwriting.

Hugsun1y ago

I didn't see the presentation but judging by your description, this is achievable using in-context learning.

wmfOP1y ago

For everything other than handwriting I don't think the LoRAs are fine-tuned locally.

derefr1y ago

Well, here's another one: they promised that your local (non-iCloud) photos don't leave the device. Yet they will now — among many other things they mentioned doing with your photos — allow you to generate "Memoji" that look like the people in your photos. Which includes the non-iCloud photos.

I can't picture any way to use a RAG to do that.

I can picture a way to do that that doesn't involve any model fine-tuning, but it'd be pretty ridiculous, and the results would probably not be very good either. (Load a static image2text LoRA tuned to describe the subjects of photos; run that once over each photo as it's imported/taken, and save the resulting descriptions. Later, whenever a photo is classified as a particular subject, load up a static LLM fine-tune that summarizes down all the descriptions of photos classified as subject X so far, into a single description of the platonic ideal of subject X's appearance. Finally, when asked for a "memoji", load up a static "memoji" diffusion LoRA, and prompt it with the that subject-platonic-appearance description.)

But really, isn't it easier to just fine-tune a regular diffusion base-model — one that's been pre-trained on photos of people — by feeding it your photos and their corresponding metadata (incl. the names of subjects in each photo); and then load up that LoRA and the (static) memoji-style LoRA, and prompt the model with those same people's names plus the "memoji" DreamBooth-keyword?

(Okay, admittedly, you don't need to do this with a locally-trained LoRA. You could also do it by activating the static memoji-style LoRA, and then training to produce a textual-inversion embedding that locates the subject in the memoji LoRA's latent space. But the "hard part" of that is still the training, and it's just as costly!)

2 more replies

j / k navigate · click thread line to collapse

0 comments

derefr1y ago

They didn't say that this is an "AI thing", but I can't honestly see how else you'd do it other than by fine-tuning a vision model on the user's own handwriting.

Hugsun1y ago

I didn't see the presentation but judging by your description, this is achievable using in-context learning.

wmfOP1y ago

For everything other than handwriting I don't think the LoRAs are fine-tuned locally.

derefr1y ago

I can't picture any way to use a RAG to do that.

2 more replies

j / k navigate · click thread line to collapse