I suppose I could be getting a biased impression though, as of course many more people are in a position to recommend the more accessible models.
What sort of things are you running that take full advantage of that 24GB?
> If you’re interested in ML training
Training - at least the setup I tried - has to run in fp16 mode. So a 7B net needs 14 GB for the model weights alone, plus some extra for the context and the stuff I didn't really understand (some gradient values; oh, that makes sense now that I've written it).
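For what it's worth, the arithmetic above can be sketched in a few lines. This is a rough back-of-the-envelope estimate, not a measurement: `training_memory_gb` is a made-up helper name, it assumes gradients are stored at the same 2 bytes per value as the fp16 weights, and it ignores optimizer state, activations and context entirely.

```python
def training_memory_gb(n_params, bytes_per_weight=2, bytes_per_grad=2):
    """Rough memory estimate in GB (1 GB = 1e9 bytes).

    Assumption: one gradient value per parameter, stored in fp16 like the weights.
    Optimizer state and activations are deliberately left out.
    """
    weights = n_params * bytes_per_weight / 1e9
    grads = n_params * bytes_per_grad / 1e9
    return weights, grads, weights + grads


# 7B parameters in fp16 (2 bytes each)
weights_gb, grads_gb, total_gb = training_memory_gb(7e9)
print(f"weights: {weights_gb:.1f} GB, gradients: {grads_gb:.1f} GB, total: {total_gb:.1f} GB")
```

Under those assumptions the gradients alone double the footprint of the weights, which is where the "some extra" comes from.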
(Which is of course how CUDA built its success more generally, vs the "you have to buy the $5k workstation card to get started" strategy from ROCm.)
More generally you'd call this optimization and targeting the hardware that's available. No sense releasing Crysis when everyone is running a Commodore 64, after all.
AMD and Intel GPUs do not have the software ecosystem for AI workloads that Nvidia does, though AMD is rapidly improving. Nvidia has had an effective monopoly on the AI hardware space for the last year or so and still has a near-monopoly, but that won't last forever as AMD and Intel catch up.
VRAM is one of the largest differentiators between their cards. Sufficient VRAM lets you run huge LLMs like a 65B model entirely in memory, which is orders of magnitude faster than system RAM + CPU. With less VRAM you have to swap between VRAM and system RAM, which incurs a major performance penalty.
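To make the "sufficient VRAM" point concrete, here is a minimal sketch that checks whether the weights alone fit on a card. The function names are hypothetical, it assumes fp16 weights at 2 bytes per parameter, and it ignores context and any other runtime overhead, so a real run needs more headroom than this suggests.

```python
def weights_gb(n_params, bytes_per_weight=2):
    """Size of the model weights in GB (1 GB = 1e9 bytes), assuming fp16 by default."""
    return n_params * bytes_per_weight / 1e9


def fits_in_vram(n_params, vram_gb, bytes_per_weight=2):
    """True if the weights alone fit in VRAM; context and overhead are not counted."""
    return weights_gb(n_params, bytes_per_weight) <= vram_gb


for name, n_params in [("7B", 7e9), ("65B", 65e9)]:
    size = weights_gb(n_params)
    verdict = "fits" if fits_in_vram(n_params, vram_gb=24) else "does not fit"
    print(f"{name}: {size:.0f} GB of fp16 weights, {verdict} in 24 GB")
```

Anything that does not fit ends up in the swapping regime described above.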
Businesses are fighting to fork over $50k+/card for 40/80GB cards with the same processor as the 24GB consumer cards - it doesn't make economic sense for Nvidia to offer more on the consumer cards, lest they start cannibalizing demand for the enterprise cards.
The real reason is that it doesn't deteriorate with input length in the case of text, or with distant neighbourhoods in the case of vision. It's just a universal new building block that lets shallower neural networks perform more like their bigger versions.