msp26 · 2y ago

"We have also trained 34B variants, which we report on in this paper but are not releasing."

"We are delaying the release of the 34B model due to a lack of time to sufficiently red team."

From the Llama 2 paper


6 comments · 1 top-level

swyx · 2y ago

if you red team the 13B and the 70B and they pass, how likely is it that the 34B is significantly more dangerous?

edit: turns out I should RTFP. there was a ~2x spike in safety violations for 34B https://twitter.com/yacineMTB/status/1681358362057883680?s=2...

DebtDeflation · 2y ago

A 34B model is probably about the largest you can run on a consumer GPU with 24GB VRAM. 70B will require A100s or a cloud host. 13B models are everywhere already. I'm sure this was a very deliberate choice: let people play with the 13B model locally to whet their appetite, and then they can pay to run the 70B model on Azure.

bloaf · 2y ago

I'm running a 30B model on an AMD 5600X CPU at 2-3 tokens/s, which is just under a "read-aloud" pace. I'd wager that you can run a 70B model at about the same speed with a 7900X and a bit more RAM.

fmajid · 2y ago

Or a $5000 128GB Mac Studio, which you can get for 1/2 the price of a 40GB A100 or 1/7 the price of an 80GB H100.

brucethemoose2 · 2y ago

70B should work reasonably well on 64GB CPU RAM + any decent GPU, or maybe a 24GB GPU + 32GB CPU RAM.

swyx · 2y ago

do you mind teaching how to do CPU/GPU RAM math? all i know is 34B 16bit = 68GB total RAM needed (because 1B params at 8 bits = 1GB by definition), but i don't know how it splits between CPU/GPU and whether the tradeoff in tok/s is acceptable
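
For anyone following the RAM math, here is a rough back-of-envelope sketch in Python. It counts weights only (parameters × bytes per parameter) and ignores the KV cache, activations, and runtime overhead, so real usage will be somewhat higher. The function names, the 2GB GPU reserve, and the "fill the GPU first, spill the rest to CPU RAM" split are illustrative assumptions, not how any particular runtime decides placement, and the sketch says nothing about the resulting tok/s.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes).

    1B parameters at 8 bits (1 byte) each is about 1 GB, so the estimate is
    just billions of parameters times bytes per parameter.
    """
    bytes_per_param = bits_per_param / 8
    return params_billions * bytes_per_param


def split_across_devices(total_gb: float, gpu_vram_gb: float,
                         gpu_reserve_gb: float = 2.0) -> tuple[float, float]:
    """Hypothetical split: fill the GPU first, spill the remainder to CPU RAM.

    gpu_reserve_gb is an assumed allowance for cache and overhead on the GPU.
    Returns (gb_on_gpu, gb_on_cpu).
    """
    usable = max(gpu_vram_gb - gpu_reserve_gb, 0.0)
    on_gpu = min(total_gb, usable)
    return on_gpu, total_gb - on_gpu


if __name__ == "__main__":
    for params, bits in [(13, 16), (34, 16), (34, 4), (70, 16), (70, 4)]:
        total = weight_memory_gb(params, bits)
        gpu, cpu = split_across_devices(total, gpu_vram_gb=24.0)
        print(f"{params}B @ {bits}-bit: {total:.1f} GB weights "
              f"-> {gpu:.1f} GB on a 24GB GPU, {cpu:.1f} GB in CPU RAM")
```

Under these assumptions, 34B at 16-bit needs about 68GB for weights alone, while a 4-bit quantization of the same model is about 17GB and fits within a single 24GB card. Whatever lands in CPU RAM runs at CPU memory bandwidth, which is usually why tok/s drops as the CPU share grows.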
