You're correct that this purchase doesn't optimize inference speed per dollar.
What it does optimize is the ability to fine-tune medium-sized models (~200GB) per dollar.
You just can't get 500GB of VRAM for less than $100k. Even with $9k Blackwell cards, you're looking at $10k for a barebones GPU server. You can't cluster commodity hardware instead, because you need fast interconnects. I'm talking 200-400GB/s interconnects. And those take up yet another PCIe slot and require expensive InfiniBand switches.
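For anyone wondering where a number like 500GB comes from, here's the rough back-of-envelope math. This is just a sketch under assumed settings (full fine-tuning in bf16 with an 8-bit Adam optimizer, ignoring activations and framework overhead; LoRA-style methods need considerably less):

```python
# Back-of-envelope VRAM estimate for fine-tuning.
# Assumptions (not gospel): bf16 weights and gradients, 8-bit Adam
# (two moments at 1 byte each); activations and overhead ignored.

GB = 1024**3

def finetune_vram_gb(n_params: float,
                     weight_bytes: int = 2,   # bf16 weights
                     grad_bytes: int = 2,     # bf16 gradients
                     optim_bytes: int = 2) -> float:  # 8-bit Adam moments
    """Estimate VRAM in GB for weights + gradients + optimizer state."""
    return n_params * (weight_bytes + grad_bytes + optim_bytes) / GB

# A ~200GB bf16 checkpoint is roughly 100B parameters.
n = 100e9
print(f"full fine-tune, 8-bit Adam: ~{finetune_vram_gb(n):.0f} GB")        # ~559 GB
print(f"weights only (inference):   ~{finetune_vram_gb(n, 2, 0, 0):.0f} GB")  # ~186 GB
```

So even with an aggressive optimizer setup you land in the 500-600GB range before activations, which is why the VRAM pool, not raw compute, drives the cost.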
Shit gets costly fast. I agonized over this purchase for weeks, eventually deciding it's the easiest path to success for my purposes. Not for everyone's, but for mine.