What LM Studio is today is an IDE / explorer for local LLMs, with a focus on format universality (e.g. GGUF) and data portability (you can go to file explorer and edit everything). The main aim is to give you an accessible way to work with LLMs and make them useful for your purposes.
Folks point out that the product is not open source. However, I think we facilitate distribution and usage of openly available AI and empower many people to partake in it, while protecting (in my mind) the business viability of the company. LM Studio is free for personal experimentation and we ask businesses to get in touch to buy a business license.
At the end of the day LM Studio is intended to be an easy yet powerful tool for doing things with AI without giving up personal sovereignty over your data. Our computers are super capable machines, and everything that can happen locally w/o the internet, should. The app has no telemetry whatsoever (you’re welcome to monitor network connections yourself) and it can operate offline after you download or sideload some models.
0.3.0 is a huge release for us. We added (naïve) RAG, internationalization, UI themes, and set up foundations for major releases to come. Everything underneath the UI layer is now built using our SDK which is open source (Apache 2.0): https://github.com/lmstudio-ai/lmstudio.js. Check out specifics under packages/.
Cheers!
-Yagil
Has anyone found the same thing, or was that a fluke and I should try LM Studio again?
By default LM Studio doesn't fully use your GPU. I have no idea why. Under the settings pane on the right, turn the slider under "GPU Offload" all the way to 100%.
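The slider roughly controls how many model layers get offloaded to the GPU (the same knob as llama.cpp's `n_gpu_layers`). As a back-of-envelope illustration of why 100% is usually safe when the model fits in VRAM, here's a sketch; all sizes (model size, layer count, overhead) are illustrative assumptions, not measured values:

```python
def offloadable_layers(vram_gb: float, n_layers: int = 32,
                       model_size_gb: float = 4.7,
                       overhead_gb: float = 1.0) -> int:
    """Estimate how many of n_layers fit in VRAM after reserving some
    headroom for the KV cache, activations, and the runtime itself."""
    per_layer_gb = model_size_gb / n_layers  # assume layers are roughly equal
    usable_gb = max(vram_gb - overhead_gb, 0.0)
    return min(n_layers, int(usable_gb / per_layer_gb))

# An ~4.7 GB 8B Q4_0 model (32 layers) fits entirely on a 24 GB GPU,
# so the slider can safely sit at 100%; on a small GPU you'd only
# offload a fraction of the layers.
print(offloadable_layers(24.0))  # 32 -> offload everything
print(offloadable_layers(4.0))   # partial offload
```

If the estimate comes back below the layer count, dialing the slider below 100% avoids spilling into shared memory, which is where things get slow.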
The model is Dolphin 2.9.1 Llama 3 8B Q4_0.
I set it to 100% and wrote this: "hi, which model are you?"
The reply was a slow output of these characters, a mouse cursor that barely moved, and I couldn't click on the trackpad: "G06-5(D&?=4>,.))G?7E-5)GAG+2;BEB,%F=#+="6;?";/H/01#2%4F1"!F#E<6C9+#"5E-<!CGE;>;E(74F=')FE2=HC7#B87!#/C?!?,?-%-09."92G+!>E';'GAF?08<F5<:&%<831578',%9>.='"0&=6225A?.8,#8<H?.'%?)-<0&+,+D+<?0>3/;HG%-=D,+G4.C8#FE<%=4))22'*"EG-0&68</"G%(2("
Help?
“Some of us are well versed in the nitty gritty of LLM load and inference parameters. But many of us, understandably, can't be bothered. LM Studio 0.3.0 auto-configures everything based on the hardware you are running it on.”
So parent should expect it to work.
I find the same issue: using a MBP with 96GB (M2 Max with 38‑core GPU), it seems to tune by default for a base machine.
Between images, LLMs, and ML in general, this all feels like the DOS days of config.sys, autoexec.bat, and QEMM.
[1] https://discord.gg/aPQfnNkxGC [2] https://lmstudio.ai/blog
How is it possible that there's still no way to search through your conversations?
Why they won't enable search for their main web user crowd is beyond me.
Perhaps they are just afraid of scale. Even with all their resources, it's possible they can't estimate the scale and complexity of the queries they'd receive.
I think it might be in their interest for you to just ask the LLM again: old answers might not be up to their current standards, and they don't gain feedback from you looking at old answers.
Does anyone have a recommendation?
For context: I have almost ten years of experience with deep learning, but I want something easy to set up on my home M2 Mac; Google Colab would also be OK.
https://github.com/danny-avila/LibreChat
Jan's probably the closest thing to an open-source LLM chat interface that's relatively easy to get started with.
I personally prefer LibreChat (which supports integration with image generation), but it does have to spin up some Docker stuff, and that can make it a bit more complicated.
You can already do this? https://i.imgur.com/BpF3K9t.png
It allows both local and cloud models.
* Not associated with them in any way. Am a happy user.
Interacting with a local LLM develops one's intuitions about how LLMs work, what they're good for (appropriately scaled to model size) and how they break, and gives you ideas about how to use them as a tool in bigger applications without getting bogged down in API billing etc.
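As a concrete illustration of the "no API billing" point: LM Studio (like several other local runners) exposes an OpenAI-compatible HTTP server, so you can poke at a local model with a few lines of stdlib Python. This is a minimal sketch; the URL assumes LM Studio's default port 1234, and the model name is a placeholder (the server uses whatever model you have loaded):

```python
import json
import urllib.request

def build_request(prompt: str, model: str = "local-model") -> dict:
    """Build an OpenAI-style chat completions payload."""
    return {
        "model": model,  # placeholder; the locally loaded model is used
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def chat(prompt: str,
         url: str = "http://localhost:1234/v1/chat/completions") -> str:
    """Send one chat turn to a locally served model. No API key, no billing."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage (with a model loaded and the local server running):
#   print(chat("hi, which model are you?"))
```

Because the endpoint mimics the OpenAI API shape, code you prototype against it tends to port straight over to hosted providers later.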
There's just a lot of great stuff you're missing out on if you're waiting on products while ignoring the very accessible, freely available tools they're built on top of — and are often reductions of.
I'm not against overlays like Ollama and LM Studio, but I'm confused about why they exist when there's no additional barrier to going on Hugging Face or using kcpp, ooba, etc.
I just assume it's an awareness issue, but I'm probably wrong.
Doing so will, at the very least, not help us with our interviews. It will also restrict our mindset of how one can make use of LLMs, through the distraction of sleek, heavily abstracted interfaces. That makes it harder, if not impossible, for us to come up with bright new ideas that undermine models in various novel ways — ideas which are almost always derived from a deep understanding of how things actually work under the hood.
CPU inference is incredibly slow versus my RTX 3090, but technically it will work.