At $work CGI assets sometimes grow pretty big and throwing more VRAM at the problem would be easier than optimizing the scenes in the middle of the workflow. They can be optimized, but that often makes it less ergonomic to work with them.
Perhaps asset-streaming (nanite&co) will make this less of an issue, but that's also fairly new.
Do LLM implementations already stream the weights layer by layer or in whichever order they're doing the evaluation or is PCIe bandwidth too limited for that?