I'm currently using reflection:70b_q4, which does a very good job in my opinion. It generates the response at 5.5 tokens/s, which is just about my reading speed.
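As a rough sanity check on that claim (assuming ~0.75 words per token, a commonly cited average for English tokenizers, which is not from the comment itself):

```python
# Back-of-envelope: does 5.5 tokens/s really match reading speed?
TOKENS_PER_SEC = 5.5
WORDS_PER_TOKEN = 0.75  # assumption: rough English average, varies by tokenizer

words_per_min = TOKENS_PER_SEC * WORDS_PER_TOKEN * 60
print(f"{words_per_min:.0f} words/min")  # ~248 words/min
```

That lands right around the ~200-300 words/min range typically quoted for adult reading speed, so the claim checks out.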
edit: I usually don't run larger quantizations (q6) because of the speed. I'd guess a 405B model would just be awfully slow.