I don't think I have a core misunderstanding. I've seen the abysmal tps you get when an entire model can't be held in actual system RAM (not swap space) at once, no matter how fast your NVMe storage's sequential read speed is.
Yes, doing that would avoid destroying an SSD by using disk space as swap, but it won't be a good experience, or even usable. Note that the original post I was replying to referenced "swapping", which generally means using system swap space as RAM.
The standard term for loading only portions of a model from disk as needed is memory mapping, not "swapping". See https://www.google.com/search?client=firefox-b-d&q=llama-ser... , or you get the same thing if you google "safetensors file memory mapping".
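To make the distinction concrete, here is a minimal sketch of memory mapping using Python's standard `mmap` module. The dummy file and the helper names (`make_dummy_file`, `read_slice_via_mmap`) are made up for illustration and stand in for a real weights file; this is not how any particular inference engine does it.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

def make_dummy_file(n_pages=64):
    """Write a hypothetical stand-in for a weights file: page i is filled with byte i."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(n_pages):
            f.write(bytes([i % 256]) * PAGE)
    return path

def read_slice_via_mmap(path, offset, length):
    """Return `length` bytes at `offset` without reading the whole file up front."""
    with open(path, "rb") as f:
        # Mapping only reserves address space; the OS pages data in from
        # disk when it is first touched. The mapping is read-only and
        # file-backed, so under memory pressure the kernel can drop these
        # pages and re-read them later instead of writing them to swap.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return bytes(mm[offset:offset + length])

path = make_dummy_file()
chunk = read_slice_via_mmap(path, 10 * PAGE, 16)
print(chunk)
os.remove(path)
```

The pages are never written back to disk, which is why this doesn't wear out an SSD the way swap does. It does nothing for read throughput, though.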
With a model this large that you can't hold in RAM? Even at the most aggressive quantization you'd be looking at 1 tps or worse.
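A back-of-envelope sketch of where a number like that comes from. Everything here is an assumption for illustration: a dense model where every weight is read once per generated token, the part that fits in RAM stays cached, and the rest is streamed from disk at the drive's full sequential read speed. The sizes and speeds are hypothetical, not measurements of any specific model or drive.

```python
def max_tokens_per_second(model_gb, ram_gb, disk_read_gb_per_s):
    """Optimistic upper bound on tokens/s when the weights that don't
    fit in RAM have to be re-read from disk for every token."""
    spill_gb = max(model_gb - ram_gb, 0.0)
    if spill_gb == 0:
        # Whole model fits in RAM, so disk speed is not the limit.
        return float("inf")
    # Each token needs spill_gb read from disk; bandwidth caps the rate.
    return disk_read_gb_per_s / spill_gb

# Hypothetical numbers: 200 GB of quantized weights, 64 GB of RAM,
# and an NVMe drive sustaining 7 GB/s sequential reads.
tps = max_tokens_per_second(model_gb=200, ram_gb=64, disk_read_gb_per_s=7)
print(f"{tps:.3f} tokens/s at best")
```

Under these assumptions the bound lands well below 1 tps, and real access patterns are not perfectly sequential, so actual results would likely be worse.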