Ollama Turbo (opens in new tab)

(ollama.com)

430 pointsamram_art10mo ago243 comments

243 comments

172 comments · 41 top-level

smlacy10mo ago· 23 in thread

Watching ollama pivot from a somewhat scrappy yet amazingly important and well designed open source project to a regular "for-profit company" is going to be sad.

Thankfully, this may just leave more room for other open source local inference engines.

mchiang10mo ago

we have always been building in the open, and so is Ollama. All the core pieces of Ollama are open. There are areas where we want to be opinionated on the design to build the world we want to see.

There are areas we will make money, and I wholly believe if we follow our conscious we can create something amazing for the world while making sure we can keep it fueled to keep it going for the long term.

Some of the ideas in Turbo mode (completely optional) is to serve the users who want a faster GPU, and adding in additional capabilities like web search. We loved the experience so much that we decided to give web search to non-paid users too. (Again, it's fully optional). Now to prevent abuse and make sure our costs don't go out of hand, we require login.

Can't we all just work together and create a better world? Or does it have to be so zero sum?

xiphias210mo ago

I wanted to try web search to increase my privacy but it wanted to do login.

For Turbo mode I understand the need for paying but the main poing of running a local model with web search is browsing from my computer without using any LLM provider. Also I want to get rid of the latency to US servers from Europe.

If ollama can't do it, maybe a fork.

1 more reply

dcreater10mo ago

I'm sorry but your words don't match your actions.

shepardrtc10mo ago

I think this offering is a perfectly reasonable option for them to make money. We all have bills to pay, and this isn't interfering with their open source project, so I don't see anything wrong with it.

Aeolun10mo ago

> this isn't interfering with their open source project

Wait until it makes significant amounts of money. Suddenly the priorities will be different.

I don’t begrudge them wanting to make some money off it though.

1 more reply

smeeth10mo ago

Their FOSS local inference service didn't go anywhere.

This isn't Anaconda, they didn't do a bait and switch to screw their core users. It isn't sinful for devs to try and earn a living.

kermatt10mo ago

Another perspective:

If you earn a living using something someone else built, and expect them not to earn a living, your paycheck has a limited lifetime.

“Someone” in this context could be a person, a team, or a corporate entity. Free may be temporary.

blitzar10mo ago

Yet. Their FOSS local inference service hasn't go anywhere ... yet.

dcreater10mo ago

You can build this and go build something else as well. You don't need to morph the thing you built. That's underhanded

TuringNYC10mo ago

>> Watching ollama pivot from a somewhat scrappy yet amazingly important and well designed open source project to a regular "for-profit company" is going to be sad.

if i could have consistent and seamless local-cloud dev that would be a nice win. everyone has to write things 3x over these days depending on your garden of choice, even with langchain/llamaindex

mark_l_watson10mo ago

I don't blame them. As soon as they offer a few more models available with the Turbo mode I plan on subscribing to their Turbo plan for a couple of months - a buying them a coffee, or keeping the lights on kind of thing.

The Ollama app using the signed-in-only web search tool is really pretty good.

satvikpendem10mo ago

> important and well designed open source project

It was always just a wrapper around the real well designed OSS, llama.cpp. Ollama even messes up the names of models by calling distilled models the name of the actual one, such as DeepSeek.

Ollama's engineers created Docker Desktop, and you can see how that turned out, so I don't have much faith in them to continue to stay open given what a rugpull Docker Desktop became.

Philpax10mo ago

I wouldn't go as far as to say that llama.cpp is "well designed" (there be demons there), but I otherwise agree with the sentiment.

user-10mo ago

I remember them pivoting from being infra.hq

dangoodmanUT10mo ago

It was always a company

mythz10mo ago

Same, was just after a small lightweight solution where I can download, manage and run local models. Really not a fan of boarding the enshittification train ride with them.

Always had a bad feeling when they didn't give ggerganov/llama.cpp their deserved credit for making Ollama possible in the first place, if it were a true OSS project they would have, but now makes more sense through the lens of a VC-funded project looking to grab as much marketshare as possible to avoid raising awareness for alternatives in OSS projects they depend on.

Together with their new closed-source UI [1] it's time for me to switch back to llama.cpp's cli/server.

[1] https://www.reddit.com/r/LocalLLaMA/comments/1meeyee/ollamas...

colesantiago10mo ago

ollama is YC and VC backed, this was inevitable and not surprising.

All companies that raise outside investment follow this route.

No exceptions.

And yes this is how ollama will fall due to enshittification, for lack of a better word.

otabdeveloper410mo ago

[flagged]

dang10mo ago

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

1 more reply

api10mo ago

> Repackaging existing software while literally adding no useful functionality was always their gig.

Developers continue to be blind to usability and UI/UX. Ollama lets you just install it, just install models, and go. The only other thing really like that is LM-Studio.

It's not surprising that the people behind it are Docker people. Yes you can do everything Docker does with Linux kernel and shell commands, but do you want to?

Making software usable is often many orders of magnitude more work than making software work.

1 more reply

llmtosser10mo ago

This is not true.

No inference engine does all of:

- Model switching

- Unload after idle

- Dynamic layer offload to CPU to avoid OOM

1 more reply

mchiang10mo ago

sorry that you feel the way you feel. :(

I'm not sure which package we use that is triggering this. My guess is llama.cpp based on what I see on social? Ollama has long shifted to using our own engine. We do use llama.cpp for legacy and backwards compatibility. I want to be clear it's not a knock on the llama.cpp project either.

There are certain features we want to build into Ollama, and we want to be opinionated on the experience we want to build.

Have you supported our past gigs before? Why not be more happy and optimistic in seeing everyone build their dreams (success or not).

If you go build a project of your dreams, I'd be supportive of it too.

1 more reply

dangoodmanUT10mo ago

Yes everyone should just write cpp to call local LLMs obviously

1 more reply

dcreater10mo ago· 22 in thread

Called it.

It's very unfortunate that the local inference community has aggregated around Ollama when it's clear that's not their long term priority or strategy.

Its imperative we move away ASAP

tarruda10mo ago

Llama.cpp (library which ollama uses under the hoods) has its own server, and it is fully compatible with open-webui.

I moved away from ollama in favor of llama-server a couple of months ago and never missed anything, since I'm still using the same UI.

mchiang10mo ago

totally respect your choice, and it's a great project too. Of course as a maintainer of Ollama, my preference is to win you over with Ollama. If it doesn't meet your needs, it's okay. We are more energized than ever to keep improving Ollama. Hopefully one day we will win you back.

Ollama does not use llama.cpp anymore; we do still keep it and occasionally update it to remain compatible for older models for when we used it. The team is great, we just have features we want to build, and want to implement the models directly in Ollama. (We do use GGML and ask partners to help it. This is a project that also powers llama.cpp and is maintained by that same team)

4 more replies

halJordan10mo ago

Fully compatible is a stretch, it's important we dont fall into a celebrity "my guy is perfect" trap. They implement a few endpoints.

1 more reply

benreesman10mo ago

I won't use `ollama` on principle. I use `llama-cli` and `llama-server` if I'm not linking `ggml`/`gguf` directly. It's like, two extra commands to use the one by the genius that wrote it and not the one that the guys just jacked it.

The models are on HuggingFace and downloading them is `uvx huggingface-cli`, the `GGUF` quants were `TheBloke` (with a grant from pmarca IIRC) for ages and now everyone does them (`unsloth` does a bunch of them).

Maybe I've got it twisted, but it seems to be that the people who actually do `ggml` aren't happy about it, and I've got their back on this.

om810mo ago

It’s unfortunate that llama.cpp’s code is a mess. It’s impossible to make any meaningful contributions to it.

1 more reply

A4ET8a8uTh0_v210mo ago

Interesting, admittedly, I am slowly getting to the point, where ollama's defaults get a little restrictive. If the setup is not too onerous, I would not mind trying. Where did you start?

1 more reply

theshrike7910mo ago

Isn't the open-webui maintainer heavily against MCP support and tool calling?

mchiang10mo ago

hmm, how so? Ollama is open and the pricing is completely optional for users who want additional GPUs.

Is it bad to fairly charge money for selling GPUs that cost us money too, and use that money to grow the core open-source project?

At one point, it just has to be reasonable. I'd like to believe by having a conscientious, we can create something great.

dcreater10mo ago

First, I must say I appreciate you taking the time to be engaged on this thread and responding to so many of us.

What I'm referring to is a broader pattern that I (and several) others have been seeing. Of the top of my head: not crediting llama.cpp previously, still not crediting llama.cpp now and saying you are using your own inference engine when you are still using ggml and the core of what Georgi made, most importantly why even create your own version - is it not better for the community to just contribute to llama.cpp?, making your own propreitary model storage platform disallowing using weights with other local engines requiring people to duplicate downloads and more.

I dont know how to regard these other than being largely motivated out of self interest.

I think what Jeff and you have built have been enormously helpful to us - Ollama is how I got started running models locally and have enjoyed using it for years now. For that, I think you guys should be paid millions. But what I fear is going to happen is you guys will go the way of the current dogma of capturing users (at least in mindshare) and then continually squeezing more. I would love to be wrong, but I am not going to stick around to find out as its risk I cannot take.

tomrod10mo ago

Everyone just wants to solarpunk this up.

1 more reply

sitkack10mo ago

I believe that is what https://github.com/containers/ramalama set out to do.

janalsncm10mo ago

Huggingface also offers a cloud product, but that doesn’t take away from downloading weights and running them locally.

idiotsecant10mo ago

Oh no this is a positively diabolical development, offering...hosting services tailored to a specific use case at a reasonable price ...

SV_BubbleTime10mo ago

They can’t keep getting away with this.

mrcwinn10mo ago

Yes, better to get free sh*t unsustainably. By the way, you're free to create an open source alternative and pour your time into that so we can all benefit. But when you don't — remember I called it!

rpdillon10mo ago

What? The obvious move is to never have switched to Ollama and just use Llama.cpp directly, which I've been doing for years. Llama.cpp was created first, is the foundation for this product, and is actually open source.

1 more reply

Aurornis10mo ago

> Its imperative we move away ASAP

Why? If the tool works then use it. They’re not forcing you to use the cloud.

dcreater10mo ago

There are many, many FOSS apps that use Ollama as a dependency. If Ollama rugs, then all those projects suffer.

Its a tale we seen played out many times. Redis is the most recent example.

1 more reply

prettyblocks10mo ago

Local inference is becoming completely commoditized imo. These days even docker has a local models you can launch with a single click (or command).

fud10110mo ago

i was trying to remove it but noticed they've hidden the uninstall away. It amounts to doing a rm - which is a joke.

jcelerier10mo ago

happy sglang user here :)

cchance10mo ago

I stopped using them when they started doing the weird model naming bullshit stuck with lmstudio since

jacekm10mo ago· 13 in thread

What could be the benefit of paying $20 to Ollama to run inferior models instead of paying the same amount of money to e.g. OpenAI for access to sota models?

daft_pink10mo ago

I feel the primary benefit of this Ollama Turbo is that you can quickly test and run different models in the cloud that you could run locally if you had the correct hardware.

This allows you to try out some open models and better assess if you could buy a dgx box or Mac Studio with a lot of unified memory and build out what you want to do locally without actually investing in very expensive hardware.

Certain applications require good privacy control and on-prem and local are something certain financial/medical/law developers want. This allows you to build something and test it on non-private data and then drop in real local hardware later in the process.

jerieljan10mo ago

> quickly test and run different models in the cloud that you could run locally if you had the correct hardware.

I feel like they're competing against Hugging Face or even Colaboratory then if this is the case.

And for cases that require strict privacy control, I don't think I'd run it on emergent models or if I really have to, I would prefer doing so on an existing cloud setup already that has the necessary trust / compliance barriers addressed. (does Ollama Turbo even have their Trust center up?)

I can see its potential once it gets rolling, since there's a lot of ollama installations out there.

fluidcruft10mo ago

Me at home: $20/mo while I wait for a card that can run this or dgx box? Decisions, decisions.

dawnerd10mo ago

Quickly test… the two models they support? This is just another subscription to quantized models.

1 more reply

rapind10mo ago

I'm not sure the major models will remain at $20. Regardless, I support any and all efforts to keep the space crowded and competitive.

adrr10mo ago

Running models without a filter on it. OpenAI has an overzealous filter and won’t even tell you what you violated. So you have to do a dance with prompts to see if it’s copyright, trademark or whatever. Recently it just refused to answer my questions and said it wasn’t true that a civil servant would get fired for releasing a report per their job duties. Another dance sending it links to stories that it was true so it could answer my question. I want a LLMs without training wheels.

michelsedgh10mo ago

I think its the data privacy is the main point and probably more usage before you hit limits? But mainly data privacy i guess

ibejoeb10mo ago

I run a lot of mundane jobs that work fine with less capable models, so I can see the potential benefit. It all depends on the limits though.

_--__--__10mo ago

Groq seems to do okay with a similar service but I think their pricing is probably better.

woadwarrior0110mo ago

Groq's moat is speed, using their custom hardware.

Geezus_4210mo ago

Yeah, the NAZI sex not will be great for business!

2 more replies

AndroTux10mo ago

Privacy, I guess. But at this point it’s just believing that they won’t log your data.

vanillax10mo ago

nothing lmao. this is just ollama trying to make money.

jnmandal10mo ago· 10 in thread

I see a lot of hate for ollama doing this kind of thing but also they remain one of the easiest to use solutions for developing and testing against a model locally.

Sure, llama.cpp is the real thing, ollama is a wrapper... I would never want to use something like ollama in a production setting. But if I want to quickly get someone less technical up to speed to develop an LLM-enabled system and run qwen or w/e locally, well then its pretty nice that they have a GUI and a .dmg to install.

mchiang10mo ago

Thanks for the kind words.

Since the new multimodal engine, Ollama has moved off of llama.cpp as a wrapper. We do continue to use the GGML library, and ask hardware partners to help optimize it.

Ollama might look like a toy and what looks trivial to build. I can say, to keep its simplicity, we go through a deep amount of struggles to make it work with the experience we want.

Simplicity is often overlooked, but we want to build the world we want to see.

dcreater10mo ago

But Ollama is a toy, it's meaningful for hobbyists and individuals to use locally like myself. Why would it be the right choice for anything more? AWS, vLLM, SGLang etc would be the solutions for enterprise

I knew a startup that deployed ollama on a customers premises and when I asked them why, they had absolutely no good reason. Likely they did it because it was easy. That's not the "easy to use" case you want to solve for.

3 more replies

leopoldj10mo ago

> Ollama has moved off of llama.cpp as a wrapper. We do continue to use the GGML library

Where can I learn more about this? llama.cpp is an inference application built using the ggml library. Does this mean, Ollama now has it's own code for what llama.cpp does?

1 more reply

buyucu10mo ago

This kind of gaslighting is exactly why I stopped using Ollama.

GGML library is llama.cpp. They are one and the same.

Ollama made sense when llama.cpp was hard to use. Ollama does not have value preposition anymore.

2 more replies

steren10mo ago

> I would never want to use something like ollama in a production setting.

We benchmarked vLLM and Ollama on both startup time and tokens per seconds. Ollama comes at the top. We hope to be able to publish these results soon.

ekianjo10mo ago

you need to benchmark against llama.cpp as well.

apitman10mo ago

Did you test multi-user cases?

1 more reply

sbinnee10mo ago

vllm and ollama assume different settings and hardware. Vllm backed by the paged attention expect a lot of requests from multiple users whereas ollama is usually for single user on a local machine.

romperstomper10mo ago

It is weird but when I tried new gpt-oss:b20 model locally llama.cpp just failed instantly for me. At the same time under ollama it worked (very slow but anyway). I didn't find how to deal with llama.cpp but ollama definitely doing something under the hood to make models work.

miki12321110mo ago

> I would never want to use something like ollama in a production setting

If you can't get access to "real" datacenter GPUs for any reason and essentially do desktop, clientside deploys, it's your best bet.

It's not a common scenario, but a desktop with a 4090 or two is all you can get in some organizations.

moralestapia10mo ago· 9 in thread

Ollama is great but I feel like Georgi Gerganov deserves way more credit for llama.cpp.

He (almost) single-handedly brought LLMs to the masses.

With the latest news of some AI engineers' compensation reaching up to a billion dollars, feels a bit unfair that Georgi is not getting a much larger slice of the pie.

mrs696910mo ago

Agreed. Ollama itself is kind a wrapper around llamacpp anyway. Feel like the real guy is not included to the process.

Now I am going to go and write a wrapper around llamacpp, that is only open source, truly local.

How can I trust ollama to not to sell my data.

Patrick_Devine10mo ago

Ollama only uses llamacpp for running legacy models. gpt-oss runs entirely in the ollama engine.

You don't need to use Turbo mode; it's just there for people who don't have capable enough GPUs.

rafram10mo ago

Ollama is not a wrapper around llama.cpp anymore, at least for multimodal models (not sure about others). They have their own engine: https://ollama.com/blog/multimodal-models

1 more reply

benreesman10mo ago

`ggerganov` is one of the most under-rated and under-appreciated hackers maybe ever. His name belongs next to like Carmack and other people who made a new thing happen on PCs. And don't forget the shout out to `TheBloke` who like single-handedly bootstrapped the GGUF ecosystem of useful model quants (I think he had a grant from pmarca or something like that, so props to that too).

freedomben10mo ago

Is Georgi landing any of those big-time money jobs? I could see a conflict-of-interest given his involvment with llama.cpp, but I would think he'd be well positioned for something like that

apwell2310mo ago

https://ggml.ai/

> ggml.ai is a company founded by Georgi Gerganov to support the development of ggml. Nat Friedman and Daniel Gross provided the pre-seed funding.

moralestapia10mo ago

(This is mere speculation)

I think he's happy doing his own thing.

But then, if someone came in with a billion ... who wouldn't give it a thought?

1 more reply

am17an10mo ago

Seriously, people astroturfing this thread by saying ollama has a new engine. It literally is the same engine that llama.cpp uses and georgi and slaren maintain! VC funding will make people so dishonest and just plain grifters

guipsp10mo ago

No one is astroturfing. You cannot run any model with just GGML. It's a tensor library. Yes, it adds value, but I don't think that saying that ollama also does is unfair.

satellite210mo ago· 7 in thread

"All hardware is located in the United States."

If I use local/OSS models it's specifically to avoid running in a country with no data protection laws. It's a big close miss here.

bangaladore10mo ago

I think what matters more here is "All hardware is located outside of China". Located in the US means little because that's not good enough for many regulated industries even within the US.

All things considered though, Europe is getting confusing. They have GDPR but now pushing to backdoor encryption within the EU? [1]

At least there isn't a strong movement in the US trying to outlaw E2E encryption.

[1] https://www.eff.org/deeplinks/2025/06/eus-encryption-roadmap...

Which brings up the point are truly private LLMs possible? Where the input I provide is only meaningful to me, but the LLM can still transform it without gaining any contextual value out of it? Without sharing a key? If this can be done, can it be done performantly?

blitzar10mo ago

I would feel safer if the hardware was located in China than in the US.

bangaladore10mo ago

Maybe I hit a nerve with the EU part? I thought it was a fair observation, but I'm open to being corrected if there's more nuance I missed.

1 more reply

wkat424210mo ago

Even the backdoor is an American lobby. Ashton Kutcher and Demi Moore's Thorn.

impulser_10mo ago

Then don't use it and keep using models locally?

riazrizvi10mo ago

No I think the point is to choose the best jurisdiction to have cloud hosted data where your data is best protected from access by very wealthy entities via intelligence services bribery. That’s still hands down the USA.

pphysch10mo ago

Any evidence for this claim that e.g. Mossad has less penetration into digital systems of USA than it does RF or PRC?

1 more reply

polarbear6710mo ago· 7 in thread

Why does everything AI-related have to be $20? Why can't there be tiers? OpenAI setting the standard of $20/m for every AI application is one of the worst things to ever happen.

paxys10mo ago

https://openai.com/chatgpt/pricing/ - $0 / $20 / $200 / $25 (team) / custom enterprise pricing / on-demand API pricing

https://www.anthropic.com/pricing - $0 / $17 (if billed annually) / $20 (if billed monthly) / $100 / $25 (team) / custom enterprise pricing / on-demand API pricing

Sounds like tiers to me.

polarbear6710mo ago

I should have specified less expensive tiers (below the $20 standard). A tier <= $10 would be great. Anything over $10 for casual use seems excessive (or at least from my perspective)

colesantiago10mo ago

Tokens are expensive and nobody is making any money.

senectus110mo ago

yep. this is the 2nd half of why the AI bubble is going to pop.

thimabi10mo ago

My guess is that’s the lowest price point that provides a modicum of profitability — LLMs are quite expensive to run, and even more so for providers like Ollama, which are entering the market and don’t have idle capacity.

furyofantares10mo ago

Claude has $20, $100 and $200, ChatGPT $20, and $200, Google has $20 and $250. Those all have free tiers as well, and metered APIs. Grok has $30 and $300 it looks like, the list probably goes on and on.

joecot10mo ago

I strongly recommend together.ai, which allows you to use a lot of different open source models and charges for usage, not a monthly fee.

liuliu10mo ago· 6 in thread

Any more information on "Privacy first"? It seems pretty thin if just not retaining data.

For Draw Things provided "Cloud Compute", we don't retain any data too (everything is done in RAM per request). But that is still unsatisfactory personally. We will soon add "privacy pass" support, but still not to the satisfactory. Transparency log that can be attested on the hardware would be nice (since we run our open-source gRPCServerCLI too), but I just don't know where to start.

pagekicker10mo ago

I see no privacy advantage to working with Ollama, which can sell your data or have it subpoenaed just like anyone else.

liuliu10mo ago

In theory, "privacy pass" should help, as you can subpoena content, but cannot know who made these. But that is still thin (and Ollama not doing that too anyway).

jmort10mo ago

I don't see a privacy policy and their desktop app is closed source. So, not encouraging.

[full disclosure I am working on something with actual privacy guarantees for LLM calls that does use a transparency log, etc.]

pbronez10mo ago

I’d love to learn more about your project. I’m using socialized cloud regions for AI security and they really lag the mainstream. Definitely need more options here.

Edit: emailed the address on the site in your profile, got an inbox does not exist error.

pogue10mo ago

I would pay more if they let you run the models in Switzerland or some other GDPR respecting country, even if there was extra latency. I would also hope everything is being sent over SSL or something similar.

seanmcdirmid10mo ago

I had to do a double take here. Switzerland surely isn’t in the GDPR, so you mean their own privacy laws or GDPR in the EU?

jasonjmcghee10mo ago· 5 in thread

Interested to see how this plays out - I feel like Ollama is synonymous with "local".

Aurornis10mo ago

There's a small but vocal minority of users who don't trust big companies, but don't mind paying small companies for a similar service.

I'm also interested to see if that small minority of people are willing to pay for a service like this.

jillesvangurp10mo ago

The issue is not companies but governance. OSS licenses and companies are fine. Companies have a natural conflict of interest that can lead them to take software projects they control in a direction that suits their revenue goals but not necessarily the needs/wants of its users. That happens over and over again. It's their nature. This can means changes in direction/focus or worst case license changes that limit what you can do.

The solution is having proper governance for OSS projects that matter with independent organizations made up of developers, companies, and users taking care of the governance. A lot of projects that have that have last for decades and will likely survive for decades more.

And part of that solution is to also steer clear of projects without that. I've been burned a couple of times now getting stuck with OSS components where the license was changed and the companies behind it had their little IPOs and started serving share holders instead of users (elastic, redis, mongo, etc). I only briefly used Mongo and I got a whiff of where things were going and just cut loose from it. With Elastic the license shenenigans started shortly after their IPO and things have been very disruptive to the community (with half using Opensearch now). With Redis I planned the switch to Valkey the second it was announced. Clear cut case of cutting loose. Valkey looks like it has proper governance. Redis never had that.

Ollama seems relatively OK by this benchmark. The software (ollama server) is MIT licensed and there appears to be no contributor license agreement in place. But it's a small group of people that do most of the coding and they all work for the same vc funded company behind ollama. That's not proper governance. They could fail. They could relicense. They could decide that they don't like open source after all. Etc. Worth considering before you bet your company on making this a foundational piece of your tech stack.

recursivegirth10mo ago

Ollama, run by Facebook. Small company, huh.

1 more reply

threetonesun10mo ago

I view it a bit like I do cloud gaming, 90% of the time I'm fine with local use, but sometimes it's just more cost effective to offload the cost of hardware to someone else. But it's not an all-or-nothing decision.

theshrike7910mo ago

Yep, if you just want to play one or two games at 4k HDR etc. it's a lot cheaper to pay 22€ for GeForce Now Ultimate vs. getting a whole-ass gaming PC capable of the same.

decide100010mo ago· 5 in thread

It was fun because it was open. Now it's just another brand seeking dollars.

mchiang10mo ago

Ollama at its core will always be open. Not all users have the computer to run models locally, and it is only fair if we provide GPUs that cost us money and let the users who optionally want it to pay for it.

ciaranmca10mo ago

I think it’s the logical move to ensure Ollama can continue to fund development. I think you will probably end up having to add more tiers or some way for users to buy more credits/gpu time. See anthropic’s recent move with Claude code due to the usage of a number of 24/7 users.

thimabi10mo ago

I’m not throwing the towel on Ollama yet. They do need dollars to operate, but still provide excellent software for running models locally and without paying them a dime.

recursivegirth10mo ago

^ this. As a developer, Ollama has been my go-to for serving offline models. I then use cloudflare tunnels to make them available where I need them.

DiabloD310mo ago

Although it is open, its really just all code borrowed from llama.cpp.

If you want to see where the actual developers do the actual hard work, go use llama.cpp instead.

extr10mo ago· 4 in thread

Nice release. Part of the problem right now with OSS models (at least for enterprise users) is the diversity of offerings in terms of:

- Speed

- Cost

- Reliability

- Feature Parity (eg: context caching)

- Performance (What quant level is being used...really?)

- Host region/data privacy guarantees

- LTS

And that's not even including the decision of what model you want to use!

Realistically if you want to use an OSS model instead of the big 3, you're faced with evalutating models/providers across all these axes, which can require a fair amount of expertise to discern. You may even have to write your own custom evaluations. Meanwhile Anthropic/OAI/Google "just work" and you get what it says on the tin, to the best of their ability. Even if they're more expensive (and they're not that much more expensive), you are basically paying for the priviledge of "we'll handle everything for you".

I think until providers start standardizing OSS offerings, we're going to continue to exist in this in-between world where OSS models theoretically are at performance parity with closed source, but in practice aren't really even in the running for serious large scale deployments.

coderatlarge10mo ago

true but ignores handing over all your prompt traffic without any real legal protections as sama has pointed out:

[1] https://californiarecorder.com/sam-altman-requires-ai-privil...

I_am_tiberius10mo ago

I wouldn't be surprised if those undeleted chats or some inferred data that is based on it is part of the gpt-5 training data. Somehow I don't trust this sama guy at all.

supermatt10mo ago

> OpenAI confirmed it has been preserving deleted and non permanent person chat logs since mid-Might 2025 in response to a federal court docket order

> The order, embedded under and issued on Might 13, 2025, by U.S. Justice of the Peace Decide Ona T. Wang

Is this some meme where “may” is being replaced with “might”, or some word substitution gone awry? I don’t get it.

5 more replies

wkat424210mo ago

Gpt-oss comes only in 4.5 bit quant. This is the native model, so there's no fp16 original

timmg10mo ago· 3 in thread

It says “usage-based pricing” is coming soon. I think that is the sweet spot for a service like this.

I pay $20 to Anthropic, so I don’t think I’d get enough use out of this for the $20 fee. But being able to spin up any of these models and use as needed (and compare) seems extremely useful to me.

I hope this works out well for the team.

ac2910mo ago

> It says “usage-based pricing” is coming soon. I think that is the sweet spot for a service like this.

Agreed, though there are already several providers of these new OpenAI models available, so I'm not sure what ollama's value add is there (there are plenty of good chat/code/etc interfaces available if you are bringing your own API keys).

wongarsu10mo ago

A flat fee service for open-source LLMs is somewhat unique, even if I don't see myself paying for it.

Usage-based pricing would put them in competition with established services like deepinfra.com, novita.ai, and ultimately openrouter.ai. They would go in with more name-recognition, but the established competition is already very competitive on pricing

Aeolun10mo ago

I mean $20/month for API access is definitely new.

captainregex10mo ago· 2 in thread

I am so so so confused as to why Ollama of all companies did this other than an emblematic stab at making money-perhaps to appease someone putting pressure on them to do so. Their stuff does a wonderful job of enabling local for those who want it. So many things to explore there but instead they stand up yet another cloud thing? Love Ollama and hope it stays awesome

janalsncm10mo ago

The problem is that OSS is free to use but it is not free to create or maintain. If you want it to remain free to use and also up to date, Ollama will need someone to address issues on GitHub. Usually people want to be paid money for that.

captainregex10mo ago

money is great! I like money! but if this is their version of buy me a coffee I think there’s room to run elsewhere for their skillset/area of expertise

1 more reply

turnsout10mo ago· 2 in thread

Man, busy day in the world of AI announcements! This looks coordinated with OpenAI, as it launches with `gpt-oss-20b` and `gpt-oss-120b`

sambaumann10mo ago

Yep, on the ollama home page (https://ollama.com/) it says

> OpenAI and Ollama partner to launch gpt-oss

hobofan10mo ago

I do hope Ollama got a good paycheck from that, as they are essentially help OpenAI to oss-wash their image with the goodwill that Ollama has built up.

llmtosser10mo ago· 2 in thread

Distractions like this probably the reason they still, over a year now, do not support sharded GGUF.

https://github.com/ollama/ollama/issues/5245

If any of the major inference engines - vLLM, Sglang, llama.cpp - incorporated api driven model switching, automatic model unload after idle and automatic CPU layer offloading to avoid OOM it would avoid the need for ollama.

jychang10mo ago

That’s just llama-swap and llama.cpp

llmtosser10mo ago

Interesting - it does indeed seem like llama-server has the needed endpoints to do the model swapping and llama.cpp as of recently also has a new flag for the dynamic CPU offload now.

However the approach to model swapping is not 'ollama compatible' which means all the OSS tools supporting 'ollama' Ex Openwebui, Openhands, Bolt.diy, n8n, flowise, browser-use etc.. aren't able to take advantage of this particularly useful capability as best I can tell.

buyucu10mo ago· 2 in thread

More than one year in and Ollama still doesn't support Vulkan inference. Vulkan is essential for consumer hardware. Ollama is a failed project at this point: https://news.ycombinator.com/item?id=42886680

zozbot23410mo ago

There's an open pull request https://github.com/ollama/ollama/pull/9650 but it needs to be forward ported/rebased to the current version before the maintainers can even consider merging it.

Also realistically, Vulkan Compute support mostly helps iGPU's and older/lower-end dGPU's, which can only bring a modest performance speed up in the compute-bound preprocessing phase (because modern CPU inference wins in the text-generation phase due to better memory bandwidth). There are exceptions such as modern Intel dGPU's or perhaps Macs running Asahi where Vulkan Compute can be more broadly useful, but these are also quite rare.

buyucu10mo ago

That pull request has been open for more than a year. The owner rebased multiple times but eventually gave up because Ollama devs just don't care.

1 more reply

irthomasthomas10mo ago· 2 in thread

If these are FP4 like the other ollama models then I'm not very interested. If I'm using an API anyway I'd rather use the full weights.

mchiang10mo ago

OpenAI has only provided MXFP4 weights. These are the same weights used by other cloud providers.

irthomasthomas10mo ago

Oh, I didn't know that. Weird!

1 more reply

paxys10mo ago· 1 in thread

A subscription fee for API usage is definitely an interesting offering, though the actual value will depend on usage limits (which are kept hidden).

mchiang10mo ago

we are learning the usage patterns to be able to price this more properly.

Havoc10mo ago· 1 in thread

That'll be an uphill battle on value proposition tbh. $20 a month for access to a widely available MoE 120B with ~5B active parameters at unspecified usage limits?

I guess their target audience values convenience and easy of use above all else so that could play well there maybe.

selcuka10mo ago

> Turbo includes hourly and daily limits to avoid capacity issues. Usage-based pricing will soon be available to consume models in a metered fashion.

Doesn't look that much better than a ChatGPT Plus subscription.

santa_boy10mo ago· 1 in thread

Is there an evaluation of such services available anywhere. Looking for recommendations for similar services with usage based pricing and pro-and-cons.

ps: looking for most economic one to play around with as long as it a decent enough experience (minimal learning curve). buy, happy to pay too

splittydev10mo ago

OpenRouter is great. Less privacy I guess, but you pay for usage and you have access to hundreds of models. They have free models too, albeit rate-limited.

rohansood1510mo ago· 1 in thread

The 'Sign In' link on the Ollama Mac App when you click Turbo doesn't work...

jmorgan10mo ago

It should open ollama.com/connect – sorry about that. Feel free to message me jeff@ollama.com if you keep seeing issues

orliesaurus10mo ago· 1 in thread

Does anyone know if this is like like OpenRouter?

ivape10mo ago

Often the math works out that you get a lot more for $20 a month if you settle for smaller sized but capable models (8b-30b). I don’t see how it’s better other than Ollama can “promise” they don’t store your data where as OpenRouter is dependent on which host you choose (and there’s no indicator on OpenRouter exposing which ones do or don’t).

In a universe where everything you say can be taken out of context, things like OpenAi will be a data leak nightmare.

Need this soon:

https://arxiv.org/abs/2410.02486

agnishom10mo ago· 1 in thread

> What is Turbo?

> Turbo is a new way to run open models using datacenter-grade hardware.

What? Why not just say that it is a cloud-based service for running models? Why this language?

owebmaster10mo ago

Why use meaningful words in place of allegories like clouds, you ask?

hanifbbz10mo ago· 1 in thread

I like how the landing page (and even this HN page until this point) completely miss any reference to Meta and Facebook. The landing page promises privacy but anyone who knows how FB used VPN software to spy on people, knows that as long as the current leadership is in place, we shouldn't assume they've all of a sudden became fans of our privacy.

tuckerman10mo ago

Ollama isn’t connected to Meta besides offering Llama as one of the potential models you can run.

There is obviously some connection to Llama (the original models giving rise to llama.cpp which Ollama was built on) but the companies have no affiliation.

ahmedhawas12310mo ago

So much that is interesting about this

For one of the top local open model inference engines of choice - only supporting OSS out of the gate feels like an angle to just ride the hype knowing OSS is announced today "oh OSS came out and you can use Ollama Turbo to use it"

The subscription based pricing is really interesting. Other players offer this but not for API type services. I always imagine that there will be a real pricing war with LLMs with time / as capabilities mature, and going monthly pricing on API services is possibly a symptom of that

What does this mean for the local inference engine? Does Ollama have enough resources to maintain both?

factorialboy10mo ago

In case the website isn't clear, this seems to be a paid-hosted service for models.

zacian10mo ago

Does this mean we can access Ollama APIs for $20/mo and test them without running the model locally? I'm not hardware-rich, but for some projects, I'd like a reliable pricing.

leopoldj10mo ago

For production use of open weight models I'd use something like Amazon Bedrock, Google Vertex AI (which uses vLLM), or on-prem vLLM/SGLang. But for a quick assessment of a model as a developer, Ollama Turbo looks appealing. I find Google GCP incredibly user hostile and a nightmare to navigate quotas and stuff.

domatic110mo ago

Open router competition?

aglazer10mo ago

This is super exciting. Congratulations on the launch!

radioradioradio10mo ago

Looks like Docker's "offload" product, but with less functionality and more vendor lock-in, the simple pricing both excites and worries me.

philip120910mo ago

Seems like an easy way to run gpt-oss for development environments on laptops. Probably necessary if you plan to self-host in production.

_giorgio_10mo ago

Can anyone explain why this is a bad thing?

Is it because they developed s new ollama which isn't open and which doesn't use llama.cpp?

scosman10mo ago

I build an app against the Ollama API. If this will let me test all Ollama models, I'm so in.

st3fan10mo ago

Does anyone know who or what ollama is in terms of people and company?

jp101610mo ago

at this point, can i purchase the subscription directly from the model provider or hugging face and use it? or is this ollama attempt to become a provider like them.

cchance10mo ago

20$ ... for the openai opensource models in preview only?

yahoozoo10mo ago

Daily limits yawn

ochronus10mo ago

Ah, vague "limits". Hard pass.

fud10110mo ago

No thanks, Ollama. I'd rather give the money to anyone but you grifters.

colesantiago10mo ago

No matter if a project is "open source" as long as they announce that they have raised millions amount of dollars from investors...

It is completely compromised, especially if it is an AI company.

How do you think ollama was able to provide the open source AI models to everyone for free?

I am pretty sure ollama was losing money on every pull of those images from their infrastructure.

Those that are now angry at ollama charging money or not focusing on privacy should have been angry when they raised money from investors.

j / k navigate · click thread line to collapse

243 comments

172 comments · 41 top-level

smlacy10mo ago· 23 in thread

Watching ollama pivot from a somewhat scrappy yet amazingly important and well designed open source project to a regular "for-profit company" is going to be sad.

Thankfully, this may just leave more room for other open source local inference engines.

mchiang10mo ago

we have always been building in the open, and so is Ollama. All the core pieces of Ollama are open. There are areas where we want to be opinionated on the design to build the world we want to see.

Can't we all just work together and create a better world? Or does it have to be so zero sum?

xiphias210mo ago

I wanted to try web search to increase my privacy but it wanted to do login.

If ollama can't do it, maybe a fork.

1 more reply

dcreater10mo ago

I'm sorry but your words don't match your actions.

shepardrtc10mo ago

Aeolun10mo ago

> this isn't interfering with their open source project

Wait until it makes significant amounts of money. Suddenly the priorities will be different.

I don’t begrudge them wanting to make some money off it though.

1 more reply

smeeth10mo ago

Their FOSS local inference service didn't go anywhere.

This isn't Anaconda, they didn't do a bait and switch to screw their core users. It isn't sinful for devs to try and earn a living.

kermatt10mo ago

Another perspective:

If you earn a living using something someone else built, and expect them not to earn a living, your paycheck has a limited lifetime.

“Someone” in this context could be a person, a team, or a corporate entity. Free may be temporary.

blitzar10mo ago

Yet. Their FOSS local inference service hasn't go anywhere ... yet.

dcreater10mo ago

You can build this and go build something else as well. You don't need to morph the thing you built. That's underhanded

TuringNYC10mo ago

>> Watching ollama pivot from a somewhat scrappy yet amazingly important and well designed open source project to a regular "for-profit company" is going to be sad.

if i could have consistent and seamless local-cloud dev that would be a nice win. everyone has to write things 3x over these days depending on your garden of choice, even with langchain/llamaindex

mark_l_watson10mo ago

The Ollama app using the signed-in-only web search tool is really pretty good.

satvikpendem10mo ago

> important and well designed open source project

It was always just a wrapper around the real well designed OSS, llama.cpp. Ollama even messes up the names of models by calling distilled models the name of the actual one, such as DeepSeek.

Ollama's engineers created Docker Desktop, and you can see how that turned out, so I don't have much faith in them to continue to stay open given what a rugpull Docker Desktop became.

Philpax10mo ago

I wouldn't go as far as to say that llama.cpp is "well designed" (there be demons there), but I otherwise agree with the sentiment.

user-10mo ago

I remember them pivoting from being infra.hq

dangoodmanUT10mo ago

It was always a company

mythz10mo ago

Same, was just after a small lightweight solution where I can download, manage and run local models. Really not a fan of boarding the enshittification train ride with them.

Together with their new closed-source UI [1] it's time for me to switch back to llama.cpp's cli/server.

[1] https://www.reddit.com/r/LocalLLaMA/comments/1meeyee/ollamas...

colesantiago10mo ago

ollama is YC and VC backed, this was inevitable and not surprising.

All companies that raise outside investment follow this route.

No exceptions.

And yes this is how ollama will fall due to enshittification, for lack of a better word.

otabdeveloper410mo ago

[flagged]

dang10mo ago

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

1 more reply

api10mo ago

> Repackaging existing software while literally adding no useful functionality was always their gig.

Developers continue to be blind to usability and UI/UX. Ollama lets you just install it, just install models, and go. The only other thing really like that is LM-Studio.

It's not surprising that the people behind it are Docker people. Yes you can do everything Docker does with Linux kernel and shell commands, but do you want to?

Making software usable is often many orders of magnitude more work than making software work.

1 more reply

llmtosser10mo ago

This is not true.

No inference engine does all of:

- Model switching

- Unload after idle

- Dynamic layer offload to CPU to avoid OOM

1 more reply

mchiang10mo ago

sorry that you feel the way you feel. :(

There are certain features we want to build into Ollama, and we want to be opinionated on the experience we want to build.

Have you supported our past gigs before? Why not be more happy and optimistic in seeing everyone build their dreams (success or not).

If you go build a project of your dreams, I'd be supportive of it too.

1 more reply

dangoodmanUT10mo ago

Yes everyone should just write cpp to call local LLMs obviously

1 more reply

dcreater10mo ago· 22 in thread

Called it.

It's very unfortunate that the local inference community has aggregated around Ollama when it's clear that's not their long term priority or strategy.

Its imperative we move away ASAP

tarruda10mo ago

Llama.cpp (library which ollama uses under the hoods) has its own server, and it is fully compatible with open-webui.

I moved away from ollama in favor of llama-server a couple of months ago and never missed anything, since I'm still using the same UI.

mchiang10mo ago

4 more replies

halJordan10mo ago

Fully compatible is a stretch, it's important we dont fall into a celebrity "my guy is perfect" trap. They implement a few endpoints.

1 more reply

benreesman10mo ago

Maybe I've got it twisted, but it seems to be that the people who actually do `ggml` aren't happy about it, and I've got their back on this.

om810mo ago

It’s unfortunate that llama.cpp’s code is a mess. It’s impossible to make any meaningful contributions to it.

1 more reply

A4ET8a8uTh0_v210mo ago

Interesting, admittedly, I am slowly getting to the point, where ollama's defaults get a little restrictive. If the setup is not too onerous, I would not mind trying. Where did you start?

1 more reply

theshrike7910mo ago

Isn't the open-webui maintainer heavily against MCP support and tool calling?

mchiang10mo ago

hmm, how so? Ollama is open and the pricing is completely optional for users who want additional GPUs.

Is it bad to fairly charge money for selling GPUs that cost us money too, and use that money to grow the core open-source project?

At one point, it just has to be reasonable. I'd like to believe by having a conscientious, we can create something great.

dcreater10mo ago

First, I must say I appreciate you taking the time to be engaged on this thread and responding to so many of us.

I dont know how to regard these other than being largely motivated out of self interest.

tomrod10mo ago

Everyone just wants to solarpunk this up.

1 more reply

sitkack10mo ago

I believe that is what https://github.com/containers/ramalama set out to do.

janalsncm10mo ago

Huggingface also offers a cloud product, but that doesn’t take away from downloading weights and running them locally.

idiotsecant10mo ago

Oh no this is a positively diabolical development, offering...hosting services tailored to a specific use case at a reasonable price ...

SV_BubbleTime10mo ago

They can’t keep getting away with this.

mrcwinn10mo ago

rpdillon10mo ago

1 more reply

Aurornis10mo ago

> Its imperative we move away ASAP

Why? If the tool works then use it. They’re not forcing you to use the cloud.

dcreater10mo ago

There are many, many FOSS apps that use Ollama as a dependency. If Ollama rugs, then all those projects suffer.

Its a tale we seen played out many times. Redis is the most recent example.

1 more reply

prettyblocks10mo ago

Local inference is becoming completely commoditized imo. These days even docker has a local models you can launch with a single click (or command).

fud10110mo ago

i was trying to remove it but noticed they've hidden the uninstall away. It amounts to doing a rm - which is a joke.

jcelerier10mo ago

happy sglang user here :)

cchance10mo ago

I stopped using them when they started doing the weird model naming bullshit stuck with lmstudio since

jacekm10mo ago· 13 in thread

What could be the benefit of paying $20 to Ollama to run inferior models instead of paying the same amount of money to e.g. OpenAI for access to sota models?

daft_pink10mo ago

I feel the primary benefit of this Ollama Turbo is that you can quickly test and run different models in the cloud that you could run locally if you had the correct hardware.

jerieljan10mo ago

> quickly test and run different models in the cloud that you could run locally if you had the correct hardware.

I feel like they're competing against Hugging Face or even Colaboratory then if this is the case.

I can see its potential once it gets rolling, since there's a lot of ollama installations out there.

fluidcruft10mo ago

Me at home: $20/mo while I wait for a card that can run this or dgx box? Decisions, decisions.

dawnerd10mo ago

Quickly test… the two models they support? This is just another subscription to quantized models.

1 more reply

rapind10mo ago

I'm not sure the major models will remain at $20. Regardless, I support any and all efforts to keep the space crowded and competitive.

adrr10mo ago

michelsedgh10mo ago

I think its the data privacy is the main point and probably more usage before you hit limits? But mainly data privacy i guess

ibejoeb10mo ago

I run a lot of mundane jobs that work fine with less capable models, so I can see the potential benefit. It all depends on the limits though.

_--__--__10mo ago

Groq seems to do okay with a similar service but I think their pricing is probably better.

woadwarrior0110mo ago

Groq's moat is speed, using their custom hardware.

Geezus_4210mo ago

Yeah, the NAZI sex not will be great for business!

2 more replies

AndroTux10mo ago

Privacy, I guess. But at this point it’s just believing that they won’t log your data.

vanillax10mo ago

nothing lmao. this is just ollama trying to make money.

jnmandal10mo ago· 10 in thread

I see a lot of hate for ollama doing this kind of thing but also they remain one of the easiest to use solutions for developing and testing against a model locally.

mchiang10mo ago

Thanks for the kind words.

Since the new multimodal engine, Ollama has moved off of llama.cpp as a wrapper. We do continue to use the GGML library, and ask hardware partners to help optimize it.

Ollama might look like a toy and what looks trivial to build. I can say, to keep its simplicity, we go through a deep amount of struggles to make it work with the experience we want.

Simplicity is often overlooked, but we want to build the world we want to see.

dcreater10mo ago

3 more replies

leopoldj10mo ago

> Ollama has moved off of llama.cpp as a wrapper. We do continue to use the GGML library

Where can I learn more about this? llama.cpp is an inference application built using the ggml library. Does this mean, Ollama now has it's own code for what llama.cpp does?

1 more reply

buyucu10mo ago

This kind of gaslighting is exactly why I stopped using Ollama.

GGML library is llama.cpp. They are one and the same.

Ollama made sense when llama.cpp was hard to use. Ollama does not have value preposition anymore.

2 more replies

steren10mo ago

> I would never want to use something like ollama in a production setting.

We benchmarked vLLM and Ollama on both startup time and tokens per seconds. Ollama comes at the top. We hope to be able to publish these results soon.

ekianjo10mo ago

you need to benchmark against llama.cpp as well.

apitman10mo ago

Did you test multi-user cases?

1 more reply

sbinnee10mo ago

vllm and ollama assume different settings and hardware. Vllm backed by the paged attention expect a lot of requests from multiple users whereas ollama is usually for single user on a local machine.

romperstomper10mo ago

miki12321110mo ago

> I would never want to use something like ollama in a production setting

If you can't get access to "real" datacenter GPUs for any reason and essentially do desktop, clientside deploys, it's your best bet.

It's not a common scenario, but a desktop with a 4090 or two is all you can get in some organizations.

moralestapia10mo ago· 9 in thread

Ollama is great but I feel like Georgi Gerganov deserves way more credit for llama.cpp.

He (almost) single-handedly brought LLMs to the masses.

With the latest news of some AI engineers' compensation reaching up to a billion dollars, feels a bit unfair that Georgi is not getting a much larger slice of the pie.

mrs696910mo ago

Agreed. Ollama itself is kind a wrapper around llamacpp anyway. Feel like the real guy is not included to the process.

Now I am going to go and write a wrapper around llamacpp, that is only open source, truly local.

How can I trust ollama to not to sell my data.

Patrick_Devine10mo ago

Ollama only uses llamacpp for running legacy models. gpt-oss runs entirely in the ollama engine.

You don't need to use Turbo mode; it's just there for people who don't have capable enough GPUs.

rafram10mo ago

Ollama is not a wrapper around llama.cpp anymore, at least for multimodal models (not sure about others). They have their own engine: https://ollama.com/blog/multimodal-models

1 more reply

benreesman10mo ago

freedomben10mo ago

Is Georgi landing any of those big-time money jobs? I could see a conflict-of-interest given his involvment with llama.cpp, but I would think he'd be well positioned for something like that

apwell2310mo ago

https://ggml.ai/

> ggml.ai is a company founded by Georgi Gerganov to support the development of ggml. Nat Friedman and Daniel Gross provided the pre-seed funding.

moralestapia10mo ago

(This is mere speculation)

I think he's happy doing his own thing.

But then, if someone came in with a billion ... who wouldn't give it a thought?

1 more reply

am17an10mo ago

guipsp10mo ago

No one is astroturfing. You cannot run any model with just GGML. It's a tensor library. Yes, it adds value, but I don't think that saying that ollama also does is unfair.

satellite210mo ago· 7 in thread

"All hardware is located in the United States."

If I use local/OSS models it's specifically to avoid running in a country with no data protection laws. It's a big close miss here.

bangaladore10mo ago

I think what matters more here is "All hardware is located outside of China". Located in the US means little because that's not good enough for many regulated industries even within the US.

All things considered though, Europe is getting confusing. They have GDPR but now pushing to backdoor encryption within the EU? [1]

At least there isn't a strong movement in the US trying to outlaw E2E encryption.

[1] https://www.eff.org/deeplinks/2025/06/eus-encryption-roadmap...

blitzar10mo ago

I would feel safer if the hardware was located in China than in the US.

bangaladore10mo ago

Maybe I hit a nerve with the EU part? I thought it was a fair observation, but I'm open to being corrected if there's more nuance I missed.

1 more reply

wkat424210mo ago

Even the backdoor is an American lobby. Ashton Kutcher and Demi Moore's Thorn.

impulser_10mo ago

Then don't use it and keep using models locally?

riazrizvi10mo ago

pphysch10mo ago

Any evidence for this claim that e.g. Mossad has less penetration into digital systems of USA than it does RF or PRC?

1 more reply

polarbear6710mo ago· 7 in thread

Why does everything AI-related have to be $20? Why can't there be tiers? OpenAI setting the standard of $20/m for every AI application is one of the worst things to ever happen.

paxys10mo ago

https://openai.com/chatgpt/pricing/ - $0 / $20 / $200 / $25 (team) / custom enterprise pricing / on-demand API pricing

https://www.anthropic.com/pricing - $0 / $17 (if billed annually) / $20 (if billed monthly) / $100 / $25 (team) / custom enterprise pricing / on-demand API pricing

Sounds like tiers to me.

polarbear6710mo ago

I should have specified less expensive tiers (below the $20 standard). A tier <= $10 would be great. Anything over $10 for casual use seems excessive (or at least from my perspective)

colesantiago10mo ago

Tokens are expensive and nobody is making any money.

senectus110mo ago

yep. this is the 2nd half of why the AI bubble is going to pop.

thimabi10mo ago

furyofantares10mo ago

joecot10mo ago

I strongly recommend together.ai, which allows you to use a lot of different open source models and charges for usage, not a monthly fee.

liuliu10mo ago· 6 in thread

Any more information on "Privacy first"? It seems pretty thin if just not retaining data.

pagekicker10mo ago

I see no privacy advantage to working with Ollama, which can sell your data or have it subpoenaed just like anyone else.

liuliu10mo ago

In theory, "privacy pass" should help, as you can subpoena content, but cannot know who made these. But that is still thin (and Ollama not doing that too anyway).

jmort10mo ago

I don't see a privacy policy and their desktop app is closed source. So, not encouraging.

[full disclosure I am working on something with actual privacy guarantees for LLM calls that does use a transparency log, etc.]

pbronez10mo ago

I’d love to learn more about your project. I’m using socialized cloud regions for AI security and they really lag the mainstream. Definitely need more options here.

Edit: emailed the address on the site in your profile, got an inbox does not exist error.

pogue10mo ago

seanmcdirmid10mo ago

I had to do a double take here. Switzerland surely isn’t in the GDPR, so you mean their own privacy laws or GDPR in the EU?

jasonjmcghee10mo ago· 5 in thread

Interested to see how this plays out - I feel like Ollama is synonymous with "local".

Aurornis10mo ago

There's a small but vocal minority of users who don't trust big companies, but don't mind paying small companies for a similar service.

I'm also interested to see if that small minority of people are willing to pay for a service like this.

jillesvangurp10mo ago

recursivegirth10mo ago

Ollama, run by Facebook. Small company, huh.

1 more reply

threetonesun10mo ago

theshrike7910mo ago

Yep, if you just want to play one or two games at 4k HDR etc. it's a lot cheaper to pay 22€ for GeForce Now Ultimate vs. getting a whole-ass gaming PC capable of the same.

decide100010mo ago· 5 in thread

It was fun because it was open. Now it's just another brand seeking dollars.

mchiang10mo ago

ciaranmca10mo ago

thimabi10mo ago

I’m not throwing the towel on Ollama yet. They do need dollars to operate, but still provide excellent software for running models locally and without paying them a dime.

recursivegirth10mo ago

^ this. As a developer, Ollama has been my go-to for serving offline models. I then use cloudflare tunnels to make them available where I need them.

DiabloD310mo ago

Although it is open, its really just all code borrowed from llama.cpp.

If you want to see where the actual developers do the actual hard work, go use llama.cpp instead.

extr10mo ago· 4 in thread

Nice release. Part of the problem right now with OSS models (at least for enterprise users) is the diversity of offerings in terms of:

- Speed

- Cost

- Reliability

- Feature Parity (eg: context caching)

- Performance (What quant level is being used...really?)

- Host region/data privacy guarantees

- LTS

And that's not even including the decision of what model you want to use!

coderatlarge10mo ago

true but ignores handing over all your prompt traffic without any real legal protections as sama has pointed out:

[1] https://californiarecorder.com/sam-altman-requires-ai-privil...

I_am_tiberius10mo ago

I wouldn't be surprised if those undeleted chats or some inferred data that is based on it is part of the gpt-5 training data. Somehow I don't trust this sama guy at all.

supermatt10mo ago

> OpenAI confirmed it has been preserving deleted and non permanent person chat logs since mid-Might 2025 in response to a federal court docket order

> The order, embedded under and issued on Might 13, 2025, by U.S. Justice of the Peace Decide Ona T. Wang

Is this some meme where “may” is being replaced with “might”, or some word substitution gone awry? I don’t get it.

5 more replies

wkat424210mo ago

Gpt-oss comes only in 4.5 bit quant. This is the native model, so there's no fp16 original

timmg10mo ago· 3 in thread

It says “usage-based pricing” is coming soon. I think that is the sweet spot for a service like this.

I hope this works out well for the team.

ac2910mo ago

> It says “usage-based pricing” is coming soon. I think that is the sweet spot for a service like this.

wongarsu10mo ago

A flat fee service for open-source LLMs is somewhat unique, even if I don't see myself paying for it.

Aeolun10mo ago

I mean $20/month for API access is definitely new.

captainregex10mo ago· 2 in thread

janalsncm10mo ago

captainregex10mo ago

money is great! I like money! but if this is their version of buy me a coffee I think there’s room to run elsewhere for their skillset/area of expertise

1 more reply

turnsout10mo ago· 2 in thread

Man, busy day in the world of AI announcements! This looks coordinated with OpenAI, as it launches with `gpt-oss-20b` and `gpt-oss-120b`

sambaumann10mo ago

Yep, on the ollama home page (https://ollama.com/) it says

> OpenAI and Ollama partner to launch gpt-oss

hobofan10mo ago

I do hope Ollama got a good paycheck from that, as they are essentially help OpenAI to oss-wash their image with the goodwill that Ollama has built up.

llmtosser10mo ago· 2 in thread

Distractions like this probably the reason they still, over a year now, do not support sharded GGUF.

https://github.com/ollama/ollama/issues/5245

jychang10mo ago

That’s just llama-swap and llama.cpp

llmtosser10mo ago

Interesting - it does indeed seem like llama-server has the needed endpoints to do the model swapping and llama.cpp as of recently also has a new flag for the dynamic CPU offload now.

buyucu10mo ago· 2 in thread

zozbot23410mo ago

There's an open pull request https://github.com/ollama/ollama/pull/9650 but it needs to be forward ported/rebased to the current version before the maintainers can even consider merging it.

buyucu10mo ago

That pull request has been open for more than a year. The owner rebased multiple times but eventually gave up because Ollama devs just don't care.

1 more reply

irthomasthomas10mo ago· 2 in thread

If these are FP4 like the other ollama models then I'm not very interested. If I'm using an API anyway I'd rather use the full weights.

mchiang10mo ago

OpenAI has only provided MXFP4 weights. These are the same weights used by other cloud providers.

irthomasthomas10mo ago

Oh, I didn't know that. Weird!

1 more reply

paxys10mo ago· 1 in thread

A subscription fee for API usage is definitely an interesting offering, though the actual value will depend on usage limits (which are kept hidden).

mchiang10mo ago

we are learning the usage patterns to be able to price this more properly.

Havoc10mo ago· 1 in thread

That'll be an uphill battle on value proposition tbh. $20 a month for access to a widely available MoE 120B with ~5B active parameters at unspecified usage limits?

I guess their target audience values convenience and easy of use above all else so that could play well there maybe.

selcuka10mo ago

> Turbo includes hourly and daily limits to avoid capacity issues. Usage-based pricing will soon be available to consume models in a metered fashion.

Doesn't look that much better than a ChatGPT Plus subscription.

santa_boy10mo ago· 1 in thread

Is there an evaluation of such services available anywhere. Looking for recommendations for similar services with usage based pricing and pro-and-cons.

ps: looking for most economic one to play around with as long as it a decent enough experience (minimal learning curve). buy, happy to pay too

splittydev10mo ago

OpenRouter is great. Less privacy I guess, but you pay for usage and you have access to hundreds of models. They have free models too, albeit rate-limited.

rohansood1510mo ago· 1 in thread

The 'Sign In' link on the Ollama Mac App when you click Turbo doesn't work...

jmorgan10mo ago

It should open ollama.com/connect – sorry about that. Feel free to message me jeff@ollama.com if you keep seeing issues

orliesaurus10mo ago· 1 in thread

Does anyone know if this is like like OpenRouter?

ivape10mo ago

In a universe where everything you say can be taken out of context, things like OpenAi will be a data leak nightmare.

Need this soon:

https://arxiv.org/abs/2410.02486

agnishom10mo ago· 1 in thread

> What is Turbo?

> Turbo is a new way to run open models using datacenter-grade hardware.

What? Why not just say that it is a cloud-based service for running models? Why this language?

owebmaster10mo ago

Why use meaningful words in place of allegories like clouds, you ask?

hanifbbz10mo ago· 1 in thread

tuckerman10mo ago

Ollama isn’t connected to Meta besides offering Llama as one of the potential models you can run.

There is obviously some connection to Llama (the original models giving rise to llama.cpp which Ollama was built on) but the companies have no affiliation.

ahmedhawas12310mo ago

So much that is interesting about this

What does this mean for the local inference engine? Does Ollama have enough resources to maintain both?

factorialboy10mo ago

In case the website isn't clear, this seems to be a paid-hosted service for models.

zacian10mo ago

Does this mean we can access Ollama APIs for $20/mo and test them without running the model locally? I'm not hardware-rich, but for some projects, I'd like a reliable pricing.

leopoldj10mo ago

domatic110mo ago

Open router competition?

aglazer10mo ago

This is super exciting. Congratulations on the launch!

radioradioradio10mo ago

Looks like Docker's "offload" product, but with less functionality and more vendor lock-in, the simple pricing both excites and worries me.

philip120910mo ago

Seems like an easy way to run gpt-oss for development environments on laptops. Probably necessary if you plan to self-host in production.

_giorgio_10mo ago

Can anyone explain why this is a bad thing?

Is it because they developed s new ollama which isn't open and which doesn't use llama.cpp?

scosman10mo ago

I build an app against the Ollama API. If this will let me test all Ollama models, I'm so in.

st3fan10mo ago

Does anyone know who or what ollama is in terms of people and company?

jp101610mo ago

at this point, can i purchase the subscription directly from the model provider or hugging face and use it? or is this ollama attempt to become a provider like them.

cchance10mo ago

20$ ... for the openai opensource models in preview only?

yahoozoo10mo ago

Daily limits yawn

ochronus10mo ago

Ah, vague "limits". Hard pass.

fud10110mo ago

No thanks, Ollama. I'd rather give the money to anyone but you grifters.

colesantiago10mo ago

No matter if a project is "open source" as long as they announce that they have raised millions amount of dollars from investors...

It is completely compromised, especially if it is an AI company.

How do you think ollama was able to provide the open source AI models to everyone for free?

I am pretty sure ollama was losing money on every pull of those images from their infrastructure.

Those that are now angry at ollama charging money or not focusing on privacy should have been angry when they raised money from investors.

j / k navigate · click thread line to collapse