Per this calculator, for training, only gpt2-large and gpt2-medium would work with those two top-of-the-line GPUs.
For inference it's certainly a bit better: only Llama-2-70b-hf and Llama-2-13b-hf don't fit in that much VRAM; all the other models do.
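For anyone who wants to sanity-check numbers like these by hand, here is a rough back-of-envelope sketch. It is not the calculator's method. The bytes-per-parameter figures (2 for fp16 inference, 16 for mixed-precision Adam training) and the rounded parameter counts are my assumptions, and activations and KV cache are ignored, so treat the results as lower bounds.

```python
# Rough bytes-per-parameter figures (assumptions, not taken from the calculator):
#   fp16 inference: 2 bytes for the weights
#   mixed-precision Adam training: 2 (fp16 weights) + 2 (fp16 grads)
#     + 12 (fp32 master weights, momentum, variance) = 16
BYTES_INFERENCE_FP16 = 2
BYTES_TRAINING_ADAM = 16

# Approximate, rounded parameter counts (assumed).
PARAMS = {
    "gpt2-medium": 355_000_000,
    "gpt2-large": 774_000_000,
    "Llama-2-13b-hf": 13_000_000_000,
    "Llama-2-70b-hf": 70_000_000_000,
}


def estimate_gb(num_params: int, bytes_per_param: int) -> float:
    """Memory for weights (and optimizer state, if included) in GB (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def fits(model: str, vram_gb: float, bytes_per_param: int) -> bool:
    """True if the estimate fits in vram_gb; ignores activations and KV cache."""
    return estimate_gb(PARAMS[model], bytes_per_param) <= vram_gb


if __name__ == "__main__":
    for name, n in PARAMS.items():
        infer = estimate_gb(n, BYTES_INFERENCE_FP16)
        train = estimate_gb(n, BYTES_TRAINING_ADAM)
        print(f"{name}: ~{infer:.1f} GB inference, ~{train:.1f} GB training")
```

Plug your own VRAM budget into `fits`. The 8x gap between the two bytes-per-parameter figures is the main reason far fewer models are trainable than runnable on the same hardware.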
Very large models have to be distributed across multiple GPUs though, even if you’re using datacenter chips like H100s.
A single $6800 RTX 6000 Ada with 48GB of VRAM vs 6x 7900XTX with a combined total of 144GB of VRAM honestly makes this seem like a no-brainer to me.
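The arithmetic behind that comparison, as a quick sketch. Only the $6800 price, the 48GB, the 6x count, and the 144GB total come from the comment. No 7900XTX price is given there, so the code computes the break-even per-card price rather than assuming one.

```python
# Figures quoted in the comment above.
rtx_price_usd = 6800
rtx_vram_gb = 48
xtx_count = 6
xtx_total_vram_gb = 144

# Derived values.
xtx_vram_each_gb = xtx_total_vram_gb / xtx_count   # 24.0 GB per card
vram_ratio = xtx_total_vram_gb / rtx_vram_gb       # 3.0x the VRAM
rtx_usd_per_gb = rtx_price_usd / rtx_vram_gb       # ~141.67 USD per GB
# Six cards cost no more than the single RTX 6000 Ada if each is at or below this:
breakeven_xtx_price_usd = rtx_price_usd / xtx_count  # ~1133.33 USD

print(f"{xtx_vram_each_gb:.0f} GB per 7900XTX, {vram_ratio:.0f}x total VRAM")
print(f"RTX 6000 Ada: ${rtx_usd_per_gb:.2f}/GB")
print(f"Break-even 7900XTX price: ${breakeven_xtx_price_usd:.2f}")
```

The 144GB is split across six devices rather than being one pool, so a model larger than 24GB still has to be sharded across cards.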
Nvidia’s play seems obvious. Game graphics don’t move that fast these days. A used market flush with 3090s and down is fine by them while they focus on extracting top dollar from fast-moving AI researchers/VCs.
IIRC you want micro-batching though, to overlap pipeline phases.
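To put a number on why micro-batching helps, here is a minimal sketch assuming a simple GPipe-style schedule where every stage takes one time slot per micro-batch. The function names are made up for illustration. With p stages and m micro-batches, a pass takes m + p - 1 slots and each stage is idle for p - 1 of them.

```python
from typing import List, Optional


def bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Idle fraction per stage in a simple GPipe-style schedule (assumed model)."""
    if num_stages < 1 or num_microbatches < 1:
        raise ValueError("need at least one stage and one micro-batch")
    total_slots = num_microbatches + num_stages - 1
    return (num_stages - 1) / total_slots


def forward_schedule(num_stages: int, num_microbatches: int) -> List[List[Optional[int]]]:
    """schedule[t][s] is the micro-batch stage s works on in slot t, or None if idle."""
    total_slots = num_microbatches + num_stages - 1
    return [
        [t - s if 0 <= t - s < num_microbatches else None for s in range(num_stages)]
        for t in range(total_slots)
    ]


if __name__ == "__main__":
    for m in (1, 4, 16):
        print(f"4 stages, {m} micro-batches: {bubble_fraction(4, m):.0%} idle")
    for row in forward_schedule(3, 4):
        print(row)
```

With a single batch and 4 stages, each GPU sits idle 75% of the time. Splitting the batch into more micro-batches shrinks that fraction, at the cost of smaller per-step work on each GPU.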