> Doing everything on-device would result in a horrible user experience. They might as well not participate in this generative AI rush at all if they hoped to keep it on-device.

On the contrary, I've been shocked over the last few months at how plausibly "on-device" inference on a MacBook Pro or Mac Studio competes with last year's early GPT-4, running Llama 3 70B or Qwen2 72B.
There are surprisingly few things you "need" 128 GB of so-called "unified RAM" for, but between the M-series chips and their memory bandwidth, local LLM inference is one use case where it shines.
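Some back-of-envelope math shows why the headroom matters. A quick sketch in Python; the bits-per-weight figures are rough approximations of common GGUF quants, and real file sizes vary by quantization scheme:

```python
# Why 70B-class models fit in 128 GB of unified memory (approximate).
def quantized_size_gb(n_params_billions: float, bits_per_weight: float) -> float:
    """Approximate in-RAM model size in GB for a given quantization."""
    return n_params_billions * 1e9 * bits_per_weight / 8 / 1e9

for label, bits in (("~Q4_K_M", 4.5), ("~Q8_0", 8.5), ("fp16", 16.0)):
    print(f"70B @ {label} ({bits} bits/weight): {quantized_size_gb(70, bits):.0f} GB")
# 70B @ ~Q4_K_M (4.5 bits/weight): 39 GB
# 70B @ ~Q8_0 (8.5 bits/weight): 74 GB
# 70B @ fp16 (16.0 bits/weight): 140 GB
```

At ~4.5 bits/weight, a 70B model lands around 40 GB, leaving plenty of the 128 GB for KV cache, longer contexts, and everything else the machine is doing; fp16 is the one configuration that wouldn't fit.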
From this thread covering llama.cpp performance across Apple Silicon M-series chips …
https://github.com/ggerganov/llama.cpp/discussions/4167
… "Buy as much memory as you can afford would be my bottom line!"
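For anyone who wants to try it, here's a minimal sketch using the llama-cpp-python bindings rather than the llama.cpp CLI itself; the model filename is a placeholder, and any GGUF quant of a 70B-class model works:

```python
# Minimal local-inference sketch with llama-cpp-python (pip install llama-cpp-python).
# On Apple Silicon, llama.cpp's Metal backend keeps the whole model in unified memory.
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3-70B-Instruct.Q4_K_M.gguf",  # placeholder filename
    n_gpu_layers=-1,  # offload every layer to the GPU (Metal)
    n_ctx=8192,       # context window; larger contexts cost more RAM for KV cache
)

out = llm("Summarize the tradeoffs of on-device LLM inference.", max_tokens=256)
print(out["choices"][0]["text"])
```

Because the CPU and GPU share the same physical memory on M-series machines, the weights sit in unified memory once, with no copy across a PCIe bus, which is exactly why "buy as much memory as you can afford" is the operative advice.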