So, this is a bit misleading. For whatever reason, models tend to be released at certain parameter sizes: 7B is popular, the next step up is 13B (with a few in between, like 11B), and from 13B the jump goes straight to 33B.
You can run finetunes of a 33B model that have been quantized (cut down) a little and fit them on a 24GB card. Likewise, those 13B models running on 16GB cards have a lot of headroom: you don't need as heavily cut-down a model, and you can run it with more context (i.e. the amount of your chat it can hold in memory).
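To see why those sizes land on those cards, here's a rough back-of-the-envelope sketch (my own illustration, not any loader's actual formula) of how much VRAM the weights alone take at a given quantization bit width; real usage is higher because of context/KV cache and loader overhead:

```python
# Rough weights-only VRAM estimate for a quantized model.
# Actual usage is larger: context (KV cache) and runtime overhead add on top.
def vram_gb(params_billion: float, bits_per_weight: float) -> float:
    # params * bits / 8 gives bytes; divide by 1024^3 for GiB
    return params_billion * 1e9 * bits_per_weight / 8 / 1024**3

# 33B at 4-bit: ~15.4 GiB of weights, so it squeezes onto a 24GB card
print(round(vram_gb(33, 4), 1))  # → 15.4
# 13B at 4-bit: ~6.1 GiB, leaving lots of headroom on a 16GB card
print(round(vram_gb(13, 4), 1))  # → 6.1
```

The leftover space is what you spend on longer context or a less aggressive quantization.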
I hope that helps. It's not a 1:1 mapping, and it's a bit confusing.