undefined | Better HN

0 pointsSimianSci4d ago0 comments

More signal that the open-weight models should be our destiny as an industry. These proprietary models are being used to usher in more surveillance and gatekeeping across the industry.

0 comments

69 comments · 7 top-level

herodoturtle4d ago· 16 in thread

I’m curious (and please forgive my ignorance if it’s obvious), are open weight models practically feasible?

I mean from a financial and sustainability standpoint, assuming they’re equally powerful as their proprietary counterparts.

I guess I’m trying to understand the economics of it.

SimianSciOP4d ago

There is an understandable gap between the capabilities of closed models and those of open models. The current difference is primarily expressed in the cost of hardware necessary to sufficiently run a exactly comparable model. A single higher end graphics card running on your average gaming computer, is capable of running small to medium models that compare with those of their lab-born counterparts in the small-medium range. But the heavyweight models are still outside the realm of possibility for all but the most well-funded individual.

However, I would highly suggest more people experiment with these smaller models. They are incredibly capable in many ways that many people dont realize.

The perceived capabilities of the larger models are also much less the result of the model having more parameters/training cycles, but rather that they are being run through well-made harnesses, something which the open-source community is rapidly approaching with near-peer solutions of their own.

In short, much of the gap between between open-weight models and the larger proprietary models can be considered more of an issue of perception and not an issue of capability. There is a fundamental gap economically, but not so much in capability. The open source community is rapidly closing the gap on these larger labs, especially thanks to the amazing research being freely given openly by well funded chinese labs.

herodoturtle3d ago

Thank you for taking the time to reply with a really insightful comment.

I wonder if open source / open weight models will reach the point where we can run them locally on our mobile devices (for free), even if they're slightly inferior to the proprietary pay-as-you-use online models.

I know very little about this stuff. My inner optimist kinda hopes that the tech will continue to advance and become increasingly commoditised, to the point that open source locally run models are as good as the advanced proprietary models of a year ago. So that even if the open source models lag the proprietary models, they're still pretty great. Perhaps we're already there but I wouldn't know.

Anyhow thank you for the insights :-)

anigbrowl4d ago

Sort of. A full trillion-parameter model needs about $300k of server hardware to run in and a lot of electricity, making it feasible only for very wealthy individuals, but quite practical for businesses and institutions above a certain size...although they in turn would typically gatekeep access.

You can drastically reduce the requirements by running models at a lower bitrate, which somewhat reduces accuracy but not that much - think of the difference between an MP3 vs uncompressed audio. With this and other tricks, you can get high end models down to a size where they can be run on a high spec desktop workstation affordable by an individual or small business.

Obviously I'm heavily oversimplifying here. I think a useful parallel is to consider situations from the past where you would once have required corporate budgets equivalent to the price of a house to run a large database, but over time it became accessible to anyone with the requisite expertise and relatively affordable hardware.

sosodev4d ago

You can run a trillion parameter model with decent quality for far less than $300k. A cluster of 4 AMD AI Max 395+ boards with 128GB unified memory each can be had for around $15k. That would run the 4-bit quant of a trillion param model well enough for personal use. At full use the cluster would only be consuming around 400-500W of power too. That's about the same as one high end graphics card.

That's still a lot of money, but most people don't really need a trillion parameter model. If privacy is more valuable than the frontier capabilities then they could almost certainly get by with much less.

nijave3d ago

Which model? I see a suspiciously similar post on amd.com running 2 bit Kimi quant on a four node cluster over 5Gbps Ethernet

Assuming math works here although I think there's some caveats depending on the model architecture, 1T 4 bit is 465Gi just for the weights so you wouldn't be able to fit kv cache.

It's showing about 8-9 tk/sec which seems quite slow for something like a web search with result aggregate although maybe bareable for smaller context stuff

The thing I've been running into with z.ai hosted GLM-5.2 is the 2024 knowledge cutoff. Anything recent requires web augmentation which is more token intensive so low tk/sec hurts even more than a "smarter" model

It seems (somewhat unsurprisingly) open weight models have older knowledge cutoffs.

1 more reply

anigbrowl3d ago

I literally wrote about running quantized models and how much more affordable it could be in the very next sentence. Please don't reply if you can't be bothered to read the entire comment, it's not that long.

1 more reply

roadside_picnic4d ago

See my comment to parent. I've been using local LLMs for practical, personal tasks for a few months now very successfuly.

You can run fantastic local models if you have either:

- M-series Apple device with ideally >= 24GB of VRAM

- RTX [345]090 GPU

I'm fortunate enough to have both and use an M-series laptop as basically a persistent server (I don't use it much and when traveling typically just use my work laptop). My desktop doesn't act as a persitent server but I fire up llama.cpp on it all time for quick chat sessions.

If you have one of the above devices and can dedicate it as server there are additional layers of tooling you can use that dramatically improve the experience. In particular Open WebUI allows you to add tons of useful tools (image gen, web search, code eval, etc), and agent harnesses like Hermes can make the current gen small models very capable. I have an agent in chat on my phone that basically handles all the sys-admin for the server it runs on.

hn_acc14d ago

What about RTX 3080? Too little VRAM?

roadside_picnic4d ago

In addition to models getting better, the quantization methods have also got much better. If you already have an RTX 3080 it's absolutely worth the time to just mess around and see how it does, experiment with different quants that fit in your VRAM. If you're purchasing I would recommend coughing up the extra cash for the 3090.

If you are experimenting it's worth mentioning that the harness/tooling is very important to getting a solid experience. Herme's agent is great for running helpful agents and OpenWeb UI can get really make the experience feel on par with paid chat interfaced.

A reasonable halfway step is to pay for an open model through the provider or open router. You'll get many of the benefits (especially around pricing) without needing to shell out on hardware before deciding if you like the way these models work.

upboundspiral2d ago

you can run Qwen 3.6 35B-A3B (3 billion active parameters + some GB for context) that can easily fit into 10 GB of ram while the not currently active experts are offloaded to the cpu ram with llama-cpp.

KronisLV4d ago

> I mean from a financial and sustainability standpoint, assuming they’re equally powerful as their proprietary counterparts.

Presently they trail SOTA by about 6-12 months, not on par (average across everything they do).

DeepSeek V4 Pro with Max reasoning is very affordable even if you pay per-token, this month I pushed about 486 million tokens through it (I will admit that >95% was cache hits, for agentic development pretty typical) and it cost me about 8 USD in total. Meanwhile with Opus or even Sonnet if I had to pay API prices, I would be a more sad camper. That model makes a lot of stupid things though, so not ideal.

Meanwhile GLM-5.2 that came out is also quote capable and is near Opus in many tasks, all while their coding plan is more cost effective than Anthropic's: https://z.ai/subscribe

I will still stick with Anthropic but consider downgrading from Max 5x to Pro which will change the monthly expenses from around 108 EUR down to <20 EUR (they have a discount too if you pay for a year up front), and probably get the yearly GLM Pro plan which should decrease my yearly expenses from around 1300 EUR total to about 750 total EUR while still giving me a fairly decent setup.

For the consumer, that is doable and practical.

For the people actually running these models, who knows - at least DeepSeek and others are trying to make the models more efficient so the numbers are more feasible.

Also have run Qwen3.6 35B A3B on prem and it kinda sucks. Way better than models that size a year ago, but still lags behind Sonnet and also DeepSeek V4 Flash due to the size limits. Plus to even run myself I'd need a pretty beefy setup, most likely a pair of Intel Arc Pro B70s with 32 GB of VRAM each that I could still run off of my PSU but the actual model output would be kinda bullshit and I'd have to spend an unpleasant amount of time fixing it.

hatthew4d ago

I'm also curious, specifically about the cost of training vs inference, and comparing that to other industries that can have high R&D costs. My instinct says that open weights aren't feasible because of the obvious issue where there is no incentive to develop your own model rather than just taking someone else's model. However, I could see a scenario where a hardware company designs a model that is open weights but optimized strongly for their own proprietary hardware, cutting their costs of inference low enough to be competitive with a hypothetical other company that doesn't have any R&D expediture.

waffletower4d ago

If attractive, cloud providers could develop open models with their own investment, and sell hosted access as a business model. While Google checks these boxes, I haven't seen a Google much marketing focus upon their open models (Gemma) coupled with hosting. groq could conceivably train its own models, but groq's business model hosts open models (GPT OSS, Qwen 3, Llama 4 are currently their prominently advertised models on their site... which seems out of date to me) trained by other organizations.

andrewstuart24d ago

I hope/wonder if it will go the way computers did. We may learn to more effectively build RAM or parallel compute, and use it more effectively, in the coming decade in such a way that we can democratize more and more like we did with processors to the point that they're ubiquitous.

sosodev4d ago

It depends entirely on what you want to do and think is feasible. Small models can almost certainly run on the computer that you already have. They can do good tool calling.

epolanski4d ago

Yes they are you can use Qwen, DS4 Pro and GLM 5.2 if you have the hardware to do so.

They are not SOTA in various ways but they have better economics.

roadside_picnic4d ago· 15 in thread

I have a home server that runs Qwen3.6-35B-A3B through llama.cpp with Open WebUI for the user facing interface.

My teen isn't super interested in AI, but whenever they do feel curious they have their own account they can use on our home network. As far as chatting goes local models are more than capable for handling standard chat questions, doing research, helping troubleshoot problems etc. In fact it was an agent powered by the same model that setup the open webui server and took care of all the account management features through my phone (using Hermes agent).

If you're building AI powered features and using sophisticated agent setups for coding for work, then it make sense to use SoTA from these providers. But I've been using local models increasingly for personal use and am starting to find them preferable (I run an uncensored, ephemeral model for my own use and it's an entirely different experience than anything you can pay for).

Still haven't cancelled my personal Anthropic subscription, but considering it soon.

rvnx4d ago

From a privacy perspective, your objective is to stay away from people who have interest to snoop on your conversations.

So from the perspective of your teen, they would benefit from using z.ai or ChatGPT or Claude, etc, rather than the local server where you can see all the conversations.

What uncensored model do you recommend using ?

panny4d ago

>From a privacy perspective, your objective is to stay away from people who have interest to snoop on your conversations.

>So from the perspective of your teen, they would benefit from using z.ai or ChatGPT or Claude, etc, rather than the local server where you can see all the conversations.

That is bonkers. If I were a parent, I would hope my child would trust me more than systems monitored by FBI/NSA/etc. Like, what sort of sick relationship do you have to have with your own family to trust them less than strangers who would sell you into prison slavery for a buck.

rvnx4d ago

Private conversations of a teen have low value for FBI/NSA. They have infinite value to their parents.

The state isn't going to ground them, shame them at dinner, out them, or pull them out of a relationship, punish them.

Parents reading your browsing history and private conversations when you are 14-18 years old (the age of teenagers) is very very creepy, unless there is a specific danger to avoid. It's like if you read their private journal.

Adolescents need a private inner world to form an identity, and heavy parental intrusion ("psychological control") is the real distrust. Trust them, they are people, not possessions.

You can guide them, but do not store their private messages locally under your control using the excuse of protecting them from NSA.

If they trust you, they will tend to tell you upfront the things they have questions about, there is really no need to spy on their thoughts.

Same with husband/wife btw.

jrochkind14d ago

What about local models do you find preferable?

I guess "starting to find them preferable" suggests to me you think they work better, but this is surprising to me so I think I may have misunderstood, so I ask!

Like you're saying they work better than the proprietary models (in what ways?), or you find them mostly good enough and prefer the privacy or cost, or what?

roadside_picnic4d ago

There are a couple of things, but basically it boils down to the same reason people prefer Linux to Windows/MacOs: customization, control and privacy (arguably all of these are really subsets of 'control').

Having full control over how your data is retained, what the system prompt is, which version of the model you're running, etc leads to much a more consistent experience. For example, for chat sessions, I can't stand the new "let me push back" version of Claude. For my home models I never have to worry about that.

There's never a mystery as to whether the model secretly degraded performance, I always know exactly which model I'm using and how well it's utilizing resources etc. Open models also give you full visibility into the reasoning steps, so you never have to guess what the model is thinking.

Then when you start getting into things like uncensored/abliterated models we're talking about something you can't even pay for. In case you're unfamiliar, even open local models have guardrails built in. But people in the community have found ways to remove these. One of the things I've found most concerning about AI, which is under discussed, is the combination of people having personal chats with an agent that both monitors the conversation and refuses to discuss certain topics. This leads to a very deep level of self-censoring I find dystopian.

I also have multiple hermes agents setup, some with local backends other with open but non-local backends (e.g. Kimi through the API). For some tasks, I've just started to find the local agent tends to work better for the type of tasks I want (maybe it just over thinks less?). I don't use it for coding so much as research tasks and sysadmin stuff, but I've been really happy with the results.

Oh, and let's not forget, especially running on a Mac, these local models are basically free to run.

jauntywundrkind3d ago

The local models are willing to share their thinking. The Big AI models don't share their thinking, leaving only vague summaries. Having an AI that deliberately cloaks it's reasoning, that goes out of it's way to act like a Searls Chinese Room Experiment, that deliberately conceals information is incredibly gross.

I love what I get from Opus or GPT, but mainly I use GLM and it's so starkly apparent how much better it is that it let's me work together with it, that I can nudge it as it works by correcting bad assumptions or clarifying for it, as it works. And... it just doesn't feel icky. It's not a quasi-mystical alien intelligence, which, honestly, gives me strong "this should be destroyed, is unsafe, and feels outright impermissible" vibes. As a coder, seeing thinking saves time and prevents errors. As a civilization, seeing thinking let's people understand what the AI is working with and grounds society in an appreciation for what is happening, keeps us a little moored. Personally, if I were a government, I would not allow it.

Recent submission on this, The text in Claude Code’s “Extended Thinking” output is not authentic. https://patrickmccanna.net/the-text-in-claude-codes-extended... https://news.ycombinator.com/item?id=48630535

drusepth4d ago

What is an "ephemeral" model in this context?

roadside_picnic4d ago

Just running it through `llama-cli` so that there's absolutely no persistent state related to the chat (and least I believe this to be the case).

agumonkey4d ago

What kind of machine is it running on ?

bakies3d ago

I just started using this model on my Framework Desktop and it's very smart and fast.

agumonkey3d ago

that's still affordable, interesting

1 more reply

rib3ye4d ago

How many tokens /sec?

roadside_picnic4d ago

M3-Max laptop: ~55 token/sec

RTX 4090: ~190 token/sec

I don't have the number around but there is a notable latency for pre-fill on the M3, but once it's running the delay is negligible.

The RTX, unsurprisingly, is all around superior performance wise, but: I use that computer for gaming and image gen work so I can't dedicate it as a server, and, especially when it's warmer, the heat generated under heavy loads is noticable.

ai_fry_ur_brain4d ago

> I run an uncensored, ephemeral model for my own use and it's an entirely different experience than anything you can pay for.

Dont. Goon. To. LLMs

fyltr4d ago

Wasn't the parent post referring to 'legitimate' demands? I often use them to get a broad overview of a technical field before reading human stuff on it, and it might be me but those clankers tend to spend half their reasoning on whether they are allowed to reply to my request. Censorship is an annoying waste of capacity for certain use cases, although it certainly has its boons when shipping commercial models.

1 more reply

mrits4d ago· 13 in thread

Either way I don't think this will end well for humanity.

scottyah4d ago

How could it not? I get the whole fear of AI making robots and going anti-human, but after using the tech for a few months now that seems too absurd.

thewebguyd4d ago

Because (collective) we don't own the tech. Frontier models are proprietary, their reasoning logic is hidden, and as seen with Fable the government giveth and taketh away on a whim.

Capabilities can be gated behind certification programs, or by money, or any other numerous corrupt and non-corrupt means. Model capabilities can be segregated by pricing tiers, creating an economic underclass that cannot afford access to frontier intelligence.

For humanity to benefit, the tech needs to be open and equally available to all.

jrockway4d ago

I agree with this. Computing as a field is the way it is because there is a low barrier to entry. My dad gave me a Tandy 1000 and some programming books, and now I have a very lucrative career. I never took any classes. I never had to beg anyone for permission. I could just get started making things with the minimal investment of a cheap personal computer. (And eventually, an Internet connection. Working with other people is fun!)

In a world where everyone is a Claude controller (something I honestly enjoy!), that goes away. I use hundreds of dollars of tokens a month. Suddenly, the kid in her basement with an unloved computer can't get in on the ground floor. You have to be rich to even get started. That worries me deeply. It's a big change for our field, and I don't think it's a good one.

1 more reply

axus4d ago

AI isn't the problem, concentration of power is the problem. I think we agree!

1 more reply

scottyah4d ago

Do you hate all lessons from humanity's past or just the most important ones? If it takes work from a specific subset of the population and isn't compensated, then my friend, what you advocate for is slavery...

1 more reply

petre4d ago

It's would probably just burn more gas and make the climate even worse. Some assholes will get richer in the process.

scottyah4d ago

But "some assholes" is an extremely large, growing group of people. Do you have any idea how much more productive small business owners are now? It's an insane boost for people who didn't want to spend their time on things that are extremely critical for business but not the focus of the business.

1 more reply

munk-a4d ago

There are two rationale objections, I think...

One is the potential for skill rot where AI grows a heavy dependence in new employees and once the real price per token cost is settled on and discoverable (post massive IPOs and probably a while post - not immediately after) we, as a society, are left with a bunch of people dependent on a deeply inefficient technology to maintain software we now view as vital that might severely impede our ability to actually deal with climate change (press X to doubt Bezos).

The second is that the psychological damage of interacting with models in a social context during your formative years is deeply damaging and we've essentially destroyed the ability for a generation or two to actually interact as productive members of society.

Addressing the second issue doesn't necessarily exclude our ability to leverage models for business productivity but it seems unlikely to happen in the current climate without that also happening. I am hesitant to believe in a sudden outbreak of common sense at this point. The first point, could really be a systems collapse trigger - we can argue about the likelihood but denying it as a possibility is excessively naive.

scottyah4d ago

Both seem to just point at the WALL-E outcome, summarized as humans outsourcing too much thinking. I just don't see that as an end- just another divide between people. I'm seeing some degradation for sure, but not really an "end".

pc864d ago

What climate change have to do with anything?

1 more reply

sevenzero4d ago

I agree with the skill drain argument but also think its a little too dramatic. Most people still can do the shit claude does for them, it just takes them 10x as long.

hn_acc14d ago

How can it end well, when it's mostly owned / controlled by narcistic billionaires who would love to eradicate anyone who so much as looks at them sideways? And who view "mass population reduction" and "I'll get to be a king in my castle, served by peons who depend on my favor to live" as the most desirable outcome of AGI?!?

If even one of these had pledge that all profit goes to end world hunger, cancer research, etc, I could possibly see it - but they haven't. They're all after finding a way to be the biggest, richest asshole possible with the ability to crush anyone in their way..

scottyah4d ago

Have you isolated yourself completely from reality? I don't even know where to begin on this. Let's start with the fact that China is pumping out some near-frontier models and open sourcing the weights- and they don't even follow capitalism and the owners aren't billionaires. Really there are like four models in the USA that are "owners/controllers", and only one is even slightly controllable by its CEO, though none of the frontier models can last a week without the support of entire teams.

Why on earth would you want to siphon off the proceeds of AI development to (ok my bias is strong here- mostly corrupt) "ideals" like world hunger and cancer research (that probably get more dollars annually than the sum of actual profit any of these companies will ever get). That would just instantly kill the ability to improve AI at all, and the world could possibly be better for a few months?

CobrastanJorji4d ago· 8 in thread

Someone should start a nonprofit company focused on developing Open AI. I bet we could even get some sensible billionaires to help the effort.

jauntywundrkind3d ago

Allen Institute for Artificial Intelligence (ai2) is doing really good open source work in the west. It's awful that the west has so few other pokers in the fire here for nonprofit AI. https://allenai.org/

jaredsohn4d ago

Maybe one of those trillionaires could help for a bit before leaving to make his own AI model, too.

codedokode4d ago

And we are definitely not going to put our users on a watch list DB and send their data to the government?

And how do we prevent Chinese companies from training on our open AI models and offering their models for free?

jrockway4d ago

How does Red Hat prevent Chinese companies from producing a Linux distribution for free? They don't. And yet they still exist.

rvnx4d ago

They can't prevent the innovation, competition and engineering, but their lobbying makes sure that the Chinese competition doesn't enter the market, and if it does, with severe obstacles on the way.

https://www.ibm.com/policy/contributions-and-expenditures

Their biggest customer is the US federal government, taken in aggregate across agencies, IBM is one of the largest federal IT contractors, and deep public-sector and financial-services contracts in the US make it IBM's single largest national market. No individual commercial company comes close to the government's aggregate spend.

Now, equivalent product, another company, they want to sell to the government twice cheaper, can they ? nope, it will be IBM winning.

Furthermore, according to the lobbyists, China = evil but they forget that a lot of software contains Chinese code.

biraj-rocks4d ago

i’d really love to be wrong, i don't think that the economics of it would let it happen.

the potential of wealth creation with AI is so high, and also the fact that research, pre-training and inference is so expensive that, that any open-AI would eventually become OpenAI.

bckr4d ago

We could all chip in

janalsncm4d ago

Based on recent SEC filings, you’ll soon be able to.

ai-x4d ago· 6 in thread

I'm happy to give my identity to Anthropic and crush my competition with irrational fear about privacy and personal data. This is a serious competitive advantage and a moat.

chinathrow4d ago

Is this satire? I really can't tell.

card_zero4d ago

Bragging about a strategy isn't very strategic. So the comment's purpose is something else.

ai-x3d ago

Warren Buffett brags about his strategy. Jeff Bezos brags about his strategy. The reason they can brag is, even if it's simple, competition doesn't have the culture to copy/follow it. (My post is literally downvoted)

Losing privacy has ZERO downsides for ordinary people. Nobody cares about your data. Literally, put all your life on a YouTube channel and see how many views that Video will get. ZERO.

Irrational fears (especially if it's conspiratorial) => Sub-optimal decision.

Just like Buffett, Bezos, my strategy is simple -- go against firms that are making irrational decision. It's the same framework to adopt cloud, AI and many frontier technologies and disrupt

sevenzero4d ago

>This is a serious competitive advantage

Given they have laughable uptime and I have yet to find a useful project mostly written by claude... I doubt it.

johndhi4d ago

Huh? Limited uptime means you can't write projects with it? I assume downtime means you can't host on it ...

sevenzero4d ago

I wont buy expensive hardware to self host a model thats outdated within 2 months. Also, yeah, uptime is important if you dont self host.

extr4d ago· 4 in thread

They are not going to let open weights models with zero restrictions exist dude. They will be regulated like guns, or probably closer to nerve gas or enriched uranium.

infamouscow4d ago

The government is not going to enforce this, the game theory does not work in their favor.

The SCOTUS has made it exceptionally clear mathematics and software are protected by the First Amendment. The Atomic Energy Act of 1954 tries to make a very narrow exception for nuclear weapons, but

1. The law has never been challenged in court for being unconstitutional, and

2. It doesn't apply to model weights

Any attempt by the government to suppress open models will meet legal challenges on the grounds of (1) or (2).

Congress could amend the act to include model weights, but that won't prevent legal challenges on the grounds of it being unconstitutional (which it is).

extr3d ago

I'm skeptical any of that matters at all if at some point AI is perceived by the government to be a true existential risk to public welfare.

pc864d ago

Only if you let them.

extr4d ago

I don't know that I want to stop such a thing. It's good that nerve gas is banned. I don't want random people having access to easy-to-follow instructions to make COVID-29.

baq4d ago

More signal this won’t happen without some serious social unrest, not garden variety Jan 6 events… and the window is closing rapidly - when this tech gets sufficiently advanced there won’t be a place to hide.

1 more reply

j / k navigate · click thread line to collapse

0 comments

69 comments · 7 top-level

herodoturtle4d ago· 16 in thread

I’m curious (and please forgive my ignorance if it’s obvious), are open weight models practically feasible?

I mean from a financial and sustainability standpoint, assuming they’re equally powerful as their proprietary counterparts.

I guess I’m trying to understand the economics of it.

SimianSciOP4d ago

However, I would highly suggest more people experiment with these smaller models. They are incredibly capable in many ways that many people dont realize.

herodoturtle3d ago

Thank you for taking the time to reply with a really insightful comment.

Anyhow thank you for the insights :-)

anigbrowl4d ago

sosodev4d ago

nijave3d ago

Which model? I see a suspiciously similar post on amd.com running 2 bit Kimi quant on a four node cluster over 5Gbps Ethernet

Assuming math works here although I think there's some caveats depending on the model architecture, 1T 4 bit is 465Gi just for the weights so you wouldn't be able to fit kv cache.

It's showing about 8-9 tk/sec which seems quite slow for something like a web search with result aggregate although maybe bareable for smaller context stuff

It seems (somewhat unsurprisingly) open weight models have older knowledge cutoffs.

1 more reply

anigbrowl3d ago

1 more reply

roadside_picnic4d ago

See my comment to parent. I've been using local LLMs for practical, personal tasks for a few months now very successfuly.

You can run fantastic local models if you have either:

- M-series Apple device with ideally >= 24GB of VRAM

- RTX [345]090 GPU

hn_acc14d ago

What about RTX 3080? Too little VRAM?

roadside_picnic4d ago

upboundspiral2d ago

KronisLV4d ago

> I mean from a financial and sustainability standpoint, assuming they’re equally powerful as their proprietary counterparts.

Presently they trail SOTA by about 6-12 months, not on par (average across everything they do).

Meanwhile GLM-5.2 that came out is also quote capable and is near Opus in many tasks, all while their coding plan is more cost effective than Anthropic's: https://z.ai/subscribe

For the consumer, that is doable and practical.

For the people actually running these models, who knows - at least DeepSeek and others are trying to make the models more efficient so the numbers are more feasible.

hatthew4d ago

waffletower4d ago

andrewstuart24d ago

sosodev4d ago

It depends entirely on what you want to do and think is feasible. Small models can almost certainly run on the computer that you already have. They can do good tool calling.

epolanski4d ago

Yes they are you can use Qwen, DS4 Pro and GLM 5.2 if you have the hardware to do so.

They are not SOTA in various ways but they have better economics.

roadside_picnic4d ago· 15 in thread

I have a home server that runs Qwen3.6-35B-A3B through llama.cpp with Open WebUI for the user facing interface.

Still haven't cancelled my personal Anthropic subscription, but considering it soon.

rvnx4d ago

From a privacy perspective, your objective is to stay away from people who have interest to snoop on your conversations.

So from the perspective of your teen, they would benefit from using z.ai or ChatGPT or Claude, etc, rather than the local server where you can see all the conversations.

What uncensored model do you recommend using ?

panny4d ago

>From a privacy perspective, your objective is to stay away from people who have interest to snoop on your conversations.

>So from the perspective of your teen, they would benefit from using z.ai or ChatGPT or Claude, etc, rather than the local server where you can see all the conversations.

rvnx4d ago

Private conversations of a teen have low value for FBI/NSA. They have infinite value to their parents.

The state isn't going to ground them, shame them at dinner, out them, or pull them out of a relationship, punish them.

Adolescents need a private inner world to form an identity, and heavy parental intrusion ("psychological control") is the real distrust. Trust them, they are people, not possessions.

You can guide them, but do not store their private messages locally under your control using the excuse of protecting them from NSA.

If they trust you, they will tend to tell you upfront the things they have questions about, there is really no need to spy on their thoughts.

Same with husband/wife btw.

jrochkind14d ago

What about local models do you find preferable?

I guess "starting to find them preferable" suggests to me you think they work better, but this is surprising to me so I think I may have misunderstood, so I ask!

Like you're saying they work better than the proprietary models (in what ways?), or you find them mostly good enough and prefer the privacy or cost, or what?

roadside_picnic4d ago

Oh, and let's not forget, especially running on a Mac, these local models are basically free to run.

jauntywundrkind3d ago

drusepth4d ago

What is an "ephemeral" model in this context?

roadside_picnic4d ago

Just running it through `llama-cli` so that there's absolutely no persistent state related to the chat (and least I believe this to be the case).

agumonkey4d ago

What kind of machine is it running on ?

bakies3d ago

I just started using this model on my Framework Desktop and it's very smart and fast.

agumonkey3d ago

that's still affordable, interesting

1 more reply

rib3ye4d ago

How many tokens /sec?

roadside_picnic4d ago

M3-Max laptop: ~55 token/sec

RTX 4090: ~190 token/sec

I don't have the number around but there is a notable latency for pre-fill on the M3, but once it's running the delay is negligible.

ai_fry_ur_brain4d ago

> I run an uncensored, ephemeral model for my own use and it's an entirely different experience than anything you can pay for.

Dont. Goon. To. LLMs

fyltr4d ago

1 more reply

mrits4d ago· 13 in thread

Either way I don't think this will end well for humanity.

scottyah4d ago

How could it not? I get the whole fear of AI making robots and going anti-human, but after using the tech for a few months now that seems too absurd.

thewebguyd4d ago

Because (collective) we don't own the tech. Frontier models are proprietary, their reasoning logic is hidden, and as seen with Fable the government giveth and taketh away on a whim.

For humanity to benefit, the tech needs to be open and equally available to all.

jrockway4d ago

1 more reply

axus4d ago

AI isn't the problem, concentration of power is the problem. I think we agree!

1 more reply

scottyah4d ago

1 more reply

petre4d ago

It's would probably just burn more gas and make the climate even worse. Some assholes will get richer in the process.

scottyah4d ago

1 more reply

munk-a4d ago

There are two rationale objections, I think...

scottyah4d ago

pc864d ago

What climate change have to do with anything?

1 more reply

sevenzero4d ago

I agree with the skill drain argument but also think its a little too dramatic. Most people still can do the shit claude does for them, it just takes them 10x as long.

hn_acc14d ago

scottyah4d ago

CobrastanJorji4d ago· 8 in thread

Someone should start a nonprofit company focused on developing Open AI. I bet we could even get some sensible billionaires to help the effort.

jauntywundrkind3d ago

jaredsohn4d ago

Maybe one of those trillionaires could help for a bit before leaving to make his own AI model, too.

codedokode4d ago

And we are definitely not going to put our users on a watch list DB and send their data to the government?

And how do we prevent Chinese companies from training on our open AI models and offering their models for free?

jrockway4d ago

How does Red Hat prevent Chinese companies from producing a Linux distribution for free? They don't. And yet they still exist.

rvnx4d ago

They can't prevent the innovation, competition and engineering, but their lobbying makes sure that the Chinese competition doesn't enter the market, and if it does, with severe obstacles on the way.

https://www.ibm.com/policy/contributions-and-expenditures

Now, equivalent product, another company, they want to sell to the government twice cheaper, can they ? nope, it will be IBM winning.

Furthermore, according to the lobbyists, China = evil but they forget that a lot of software contains Chinese code.

biraj-rocks4d ago

i’d really love to be wrong, i don't think that the economics of it would let it happen.

the potential of wealth creation with AI is so high, and also the fact that research, pre-training and inference is so expensive that, that any open-AI would eventually become OpenAI.

bckr4d ago

We could all chip in

janalsncm4d ago

Based on recent SEC filings, you’ll soon be able to.

ai-x4d ago· 6 in thread

I'm happy to give my identity to Anthropic and crush my competition with irrational fear about privacy and personal data. This is a serious competitive advantage and a moat.

chinathrow4d ago

Is this satire? I really can't tell.

card_zero4d ago

Bragging about a strategy isn't very strategic. So the comment's purpose is something else.

ai-x3d ago

Losing privacy has ZERO downsides for ordinary people. Nobody cares about your data. Literally, put all your life on a YouTube channel and see how many views that Video will get. ZERO.

Irrational fears (especially if it's conspiratorial) => Sub-optimal decision.

Just like Buffett, Bezos, my strategy is simple -- go against firms that are making irrational decision. It's the same framework to adopt cloud, AI and many frontier technologies and disrupt

sevenzero4d ago

>This is a serious competitive advantage

Given they have laughable uptime and I have yet to find a useful project mostly written by claude... I doubt it.

johndhi4d ago

Huh? Limited uptime means you can't write projects with it? I assume downtime means you can't host on it ...

sevenzero4d ago

I wont buy expensive hardware to self host a model thats outdated within 2 months. Also, yeah, uptime is important if you dont self host.

extr4d ago· 4 in thread

They are not going to let open weights models with zero restrictions exist dude. They will be regulated like guns, or probably closer to nerve gas or enriched uranium.

infamouscow4d ago

The government is not going to enforce this, the game theory does not work in their favor.

1. The law has never been challenged in court for being unconstitutional, and

2. It doesn't apply to model weights

Any attempt by the government to suppress open models will meet legal challenges on the grounds of (1) or (2).

Congress could amend the act to include model weights, but that won't prevent legal challenges on the grounds of it being unconstitutional (which it is).

extr3d ago

I'm skeptical any of that matters at all if at some point AI is perceived by the government to be a true existential risk to public welfare.

pc864d ago

Only if you let them.

extr4d ago

I don't know that I want to stop such a thing. It's good that nerve gas is banned. I don't want random people having access to easy-to-follow instructions to make COVID-29.

baq4d ago

1 more reply

j / k navigate · click thread line to collapse