There don’t seem to be any code or runtime examples.
Of course there is no 100kb text-to-image model
“This allows runtime-efficient balancing of visual-fidelity and textual-alignment with a single 100KB trained model, which is five orders of magnitude smaller than the current state of the art.”
https://arxiv.org/abs/2305.01644
It's one of those sentences that if you know what it means, you know what it means. That said, the title needs the word "personalization" inserted before the word "model", e.g.:
Nvidia intros 100kb text-to-image personalization model called Perfusion
This is probably what future voice models will look like once they start capturing prosody and other fine characteristics in a few hundred KB.
When you fine-tune a model on new inputs, you can save just the weights that changed to a separate file instead of rewriting the main file.
In other words, you can think of these small fine-tuned models as updates or patches that get applied selectively.
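
To make the "patch" idea concrete, here is a minimal sketch in plain Python. It is only an illustration of the comment above, not a claim about how Perfusion stores or applies its weights. The layer names, the `extract_patch` / `apply_patch` helpers, and the `scale` parameter are all hypothetical, and real models would use tensors rather than lists.

```python
import json


def extract_patch(base, tuned, tol=0.0):
    """Return only the layers whose weights differ, stored as deltas."""
    patch = {}
    for name, base_w in base.items():
        delta = [t - b for b, t in zip(base_w, tuned[name])]
        # Keep a layer only if at least one weight actually moved.
        if any(abs(d) > tol for d in delta):
            patch[name] = delta
    return patch


def apply_patch(base, patch, scale=1.0):
    """Return a new weight dict with the patch added; base is left untouched."""
    patched = {name: list(w) for name, w in base.items()}
    for name, delta in patch.items():
        patched[name] = [w + scale * d for w, d in zip(patched[name], delta)]
    return patched


# Toy "model": three layers, of which fine-tuning changed only one (hypothetical values).
base = {"encoder": [0.5, -1.0, 2.0], "attention": [1.0, 0.0], "decoder": [3.0, 3.0, 3.0]}
tuned = {"encoder": [0.5, -1.0, 2.0], "attention": [1.5, -0.5], "decoder": [3.0, 3.0, 3.0]}

patch = extract_patch(base, tuned)          # contains only the "attention" layer
patch_file = json.dumps(patch)              # the small file you would ship separately
restored = apply_patch(base, json.loads(patch_file))
```

The base weights stay in the big file and are never modified. You ship only `patch_file`, and whoever has the base model can apply it, skip it, or apply a different patch instead.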