undefined | Better HN

0 pointsCasteil2y ago0 comments

70b is probably going to be a bit slow for most on M-series MBPs (even with enough RAM), but Mixtral 8x7b does really well. Very usable @ 25-30T/s (64GB M1 Max), whereas 70b tends to run more like 3.5-5T/s.

'llama.cpp-based' generally seems like the norm.

Ollama is just really easy to set up & get going on MacOS. Integral support like this means one less thing to wire up or worry about when using a local LLM as a drop-in replacement for OpenAI's remote API. Ollama also has a model library[1] you can browse & easily retrieve models from.

Another project, Ollama-webui[2] is a nice webui/frontend for local LLM models in Ollama - it supports the latest LLaVA for multimodal image/prompt input, too.

[1] https://ollama.ai/library/mixtral

[2] https://github.com/ollama-webui/ollama-webui

0 comments

2 comments · 1 top-level

visarga2y ago· 1 in thread

Yeah, ollama-webui is an excellent front end and the team was responsive in fixing a bug I reported in a couple of days

It's also possible to connect to OpenAI API and use GPT-4 on per token plan. I cancelled my chatGPT subscription since. But 90% of the usage for me is Mistral 7B fine-tunes, I rarely use OpenAI

mark_l_watson2y ago

Thanks for that idea, I use Ollama as my main LLM driver, but I still use OpenAI, Anthropic, and Mistral commercial API plans. I access Ollama via a REST API and my own client code, but I will try their UI.

re: cancelling ChatGPT subscription: I am tempted to do this also except I suspect that when they release GPT-5 there may be a waiting list, and I don’t want any delays in trying it out.

j / k navigate · click thread line to collapse