For the second question: yeah, exactly. As long as you've trained the rest of the system to a certain degree, you can already do one-shot training on top of it for object recognition, for example, and I think you'll soon be able to do the same for style acquisition in diffusion models (you can already do quick overfit training on them at home, in a couple of minutes, with 10-20 images).
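To make the "training on top" part concrete, here's a minimal toy sketch (my own illustration, not anyone's actual pipeline) of the standard frozen-backbone approach: take a network pretrained on general data, freeze it, and fit only a tiny head on a handful of images of the new concept. The class count, data, and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn
from torchvision import models

# Backbone pretrained on general public data (ImageNet here); this is
# the "rest of the system" that stays fixed.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in backbone.parameters():
    p.requires_grad = False

# Replace the final layer: only this tiny head is trained for the new
# concept (e.g. new-sign vs. everything-else; the 2 classes are made up).
backbone.fc = nn.Linear(backbone.fc.in_features, 2)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder stand-in for 10-20 preprocessed 224x224 images; in practice
# you'd load and normalize real photos of the new concept.
few_shot_batch = torch.randn(16, 3, 224, 224)
labels = torch.randint(0, 2, (16,))

# eval() keeps the frozen BatchNorm statistics fixed; gradients still
# flow to the new head during backward().
backbone.eval()
for _ in range(20):  # a couple of minutes, even on CPU
    optimizer.zero_grad()
    loss = loss_fn(backbone(few_shot_batch), labels)
    loss.backward()
    optimizer.step()
```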
Essentially this is what the brain does when you do one-shot learning of traffic signs, or of characters when learning a new alphabet, etc. (yeah, sometimes it's not that easy, but it's still "theoretically" possible :). The rest of the recognition pipeline is so general that styles, objects, etc. are just icing on the cake to learn on top; you don't need to retrain all the areas of the brain when adding a road sign to your driving skill set.
But my point was that you could train the rest of the network on more general public data and leave Greg Rutkowski out entirely. Hooray. Then someone shows it a single Greg image, the style comes right back, and you're back to square one.
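To show how cheap that "back to square one" step is: with diffusers you can train a tiny textual-inversion embedding (a single learned token vector) on 10-20 style images, e.g. with the library's textual-inversion example script, and drop it into an otherwise artist-free base model. A hedged sketch of the loading side; the embedding file, token name, and prompt are placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

# A public base model that (hypothetically) never saw the artist.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The embedding file is assumed to have been trained separately on a
# handful of style images; "<that-style>" is a made-up placeholder token.
pipe.load_textual_inversion("./style_embedding.bin", token="<that-style>")

# One prompt token and the excluded style is available again.
image = pipe("a castle at dawn, in <that-style> style").images[0]
image.save("castle.png")
```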