For me, none really, just as a toy. I don't get much use out of the online ones either. There was a Kaggle competition to find issues with OpenAI's open-weights model, but because my RTX GPU didn't have enough memory, I had to run it very slowly with partial offload to CPU/RAM.
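For reference, a minimal sketch of what that partial offload looks like with llama-cpp-python; the model filename and layer count are placeholders, and it assumes you have a quantized GGUF build of the model:

```python
# Minimal sketch: partial GPU offload with llama-cpp-python.
# The model path and n_gpu_layers value are placeholders -- tune the
# layer count to whatever fits in your card's VRAM; the remaining
# layers run (slowly) on CPU/RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="gpt-oss-20b.Q4_K_M.gguf",  # hypothetical quantized GGUF file
    n_gpu_layers=20,  # offload as many layers as VRAM allows
    n_ctx=4096,       # context window
)

out = llm("Explain what partial GPU offload does.", max_tokens=128)
print(out["choices"][0]["text"])
```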
Maybe other people have actual uses, but I don't.
No, they can run quantized versions of those models, which are dumber than the base 30B models, which in turn are much dumber than the 400B+ models (in my experience).
> They are a little bit dumber than the big cloud models but not by much.
If this were true, we wouldn't see people paying the premiums for the bigger models (like Claude).
For every use case I've thrown at them, it's not a question of being "a little dumber": the smaller models are simply incapable of doing what I need with any sort of consistency, and they hallucinate at extreme rates.
What's the actual use case for these local models?
If anyone has a gaming GPU with gobs of VRAM, I highly encourage them to experiment with creating long-running local-LLM apps. We need more independent tinkering in this space.
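As a starting point, here's a minimal sketch of one such long-running app: a loop that watches a directory and summarizes new files via a locally served model. It assumes an Ollama server on its default port; the model tag and the watched directory are placeholders:

```python
# Minimal sketch of a long-running local-LLM app: poll a directory for
# new text files and summarize each one with a locally served model.
# Assumes an Ollama server at its default endpoint; the model tag and
# the watched directory are placeholders.
import json
import time
import urllib.request
from pathlib import Path

WATCH_DIR = Path("inbox")  # hypothetical directory to watch
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama default endpoint

def generate(prompt: str) -> str:
    payload = json.dumps({
        "model": "llama3.1:8b",  # placeholder model tag
        "prompt": prompt,
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

seen: set[Path] = set()
while True:
    for f in WATCH_DIR.glob("*.txt"):
        if f not in seen:
            seen.add(f)
            summary = generate(f"Summarize this:\n\n{f.read_text()}")
            print(f"{f.name}: {summary}")
    time.sleep(30)  # poll every 30s; a local model costs nothing per call
```

The nice part of local models for this kind of app is exactly the long-running aspect: you can let it poll all day without worrying about per-token API costs.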