Better HN

0 points · terafo · 3y ago · 0 comments

As far as I can tell, you can do it right now, at least for the small 13B model. I'm not sure about bigger models.

5 comments · 2 top-level

eightysixfour · 3y ago · 3 in thread

I don't believe they could; you need CUDA and more VRAM...

terafo (OP) · 3y ago

128 GB is more than enough to load the 13B model. PyTorch has had M1 support for some time now, so CUDA isn't required.
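For what it's worth, here is a rough sketch of how a script could pick the Apple GPU when CUDA is missing. It assumes a PyTorch build that ships the MPS backend; `pick_device` is just a name made up for this example, and it falls back to the CPU if PyTorch or the backend isn't there.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what this machine offers."""
    # Hypothetical helper for illustration; not part of PyTorch itself.
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed at all

    import torch

    if torch.cuda.is_available():
        return "cuda"

    # Older PyTorch builds have no torch.backends.mps, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


print(pick_device())
```

The returned string can be passed to `torch.device(...)` or `.to(...)`. Whether the LLaMA inference code itself runs unmodified on that device is a separate question this sketch doesn't answer.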

eightysixfour · 3y ago

Does M1 use system memory as VRAM as well?

MacsHeadroom · 3y ago

13B fits on a single 3090 (24GB) in int8.

128 gigs might even be enough for 65B, if slowly.
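Back-of-the-envelope version of that arithmetic, as a sketch: it counts the weights only (parameter count times bytes per parameter), uses the nominal 7B/13B/65B sizes, and ignores activations, the attention cache, and framework overhead, so real usage will be higher. `weight_gb` is a made-up helper name.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_gb(params_billion: float, dtype: str) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    # (params_billion * 1e9 params) * bytes / 1e9 bytes-per-GB simplifies to this.
    return params_billion * BYTES_PER_PARAM[dtype]


for size in (7, 13, 65):
    for dtype in ("fp16", "int8"):
        print(f"{size}B {dtype}: ~{weight_gb(size, dtype):.0f} GB")
```

By this estimate 13B in int8 is about 13 GB, which is under 24 GB, and 65B is about 65 GB in int8 but about 130 GB in fp16, so 128 GB would only work for 65B with some quantization.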

TaylorAlexander · 3y ago

The 13B parameter model is two 13.5GB chunks, but the 7B parameter model is one 13.5GB chunk so that one might be possible.

EDIT: There seems to be a lot of overhead. Here someone struggles to fit the 7B parameter model (13.5GB checkpoint) on a 24GB card: https://github.com/facebookresearch/llama/issues/55

EDIT2: Actually, using the tip in that link I got it to run on a P5000 with 16GB of RAM! It just barely fits, so I had to log out of GNOME and run it from a terminal only.
