Many people find Mistral 7B to be excellent, roughly GPT-3.5 level.
Mistral 7B normally requires around 20 GB of VRAM at fp16, but with llama.cpp and quantization you can even run it on a phone (albeit at reduced quality).
Quantizations at q4_K_M or above seem to produce responses nearly as good as the unquantized model, and q4_K_M only needs ~7 GB of VRAM.
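A quick back-of-envelope sketch of where those VRAM numbers come from (the parameter count and the ~4.85 effective bits per weight for q4_K_M are approximate assumptions; the rest of the footprint is activations, KV cache, and runtime overhead):

```python
# Rough VRAM estimate for Mistral 7B weights (all figures approximate)
params = 7.24e9                        # assumed Mistral 7B parameter count

fp16_gb = params * 2 / 1e9             # fp16 = 2 bytes per weight
q4_bpw = 4.85                          # assumed effective bits/weight for q4_K_M
q4_gb = params * q4_bpw / 8 / 1e9      # quantized weight size in GB

print(f"fp16 weights:   ~{fp16_gb:.1f} GB")  # plus KV cache/overhead -> ~20 GB in practice
print(f"q4_K_M weights: ~{q4_gb:.1f} GB")    # plus KV cache/overhead -> ~7 GB in practice
```

So the weights alone drop from ~14.5 GB to ~4.4 GB, and the quoted totals follow once you add inference overhead.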
See the table here:
https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGU...
Using ollama you can get up and running even a bit faster than with llama.cpp directly (ollama uses llama.cpp under the hood).
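For the impatient, a minimal sketch of both routes (assumes ollama is installed and that "mistral" is its tag for Mistral 7B; the GGUF filename in the llama.cpp invocation is an assumed example matching TheBloke's naming):

```shell
# ollama pulls the quantized model on first run
ollama run mistral "Explain quantization in one sentence."

# roughly equivalent with llama.cpp directly, assuming you downloaded a GGUF yourself
./llama-cli -m mistral-7b-instruct-v0.2.Q4_K_M.gguf -p "Explain quantization in one sentence."
```

Both end up running the same llama.cpp inference code; ollama just automates the download and model management.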