A laptop with an iGPU and lots of system RAM has the advantage of being able to use system RAM in addition to VRAM to load models (assuming your GPU driver supports it, which most do as far as I know), so load up as much system RAM as you can. The downside is that system RAM is slower than dedicated GDDR memory. These iGPUs would be the Radeon 890M and Intel Arc (previous generations are still decently good, if that's more affordable for you).
A laptop with a discrete GPU will not be able to load models as large directly to GPU, but with layer offloading and a quantized MoE model, you can still get quite fast performance with modern low-to-medium-sized models.
Do not get less than 32GB of RAM for any machine, and max out the iGPU machine's RAM. Also try to get a bigass NVMe drive, as you will likely be downloading a lot of big models, and should be using a VM with Docker containers, so all of that adds up to quite a bit of drive space.
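To make the dGPU-plus-offloading case above concrete: a rough way to plan a purchase is to estimate how many transformer layers fit in VRAM, with the rest offloaded to system RAM. A minimal sketch, where the model size, layer count, and reserve headroom are all illustrative assumptions, not measured values:

```python
def layers_on_gpu(model_gb: float, n_layers: int, vram_gb: float,
                  reserve_gb: float = 1.5) -> int:
    """Rough count of layers that fit in VRAM, keeping some headroom
    for the KV cache and driver overhead (reserve_gb is a guess)."""
    per_layer_gb = model_gb / n_layers      # assume layers are roughly equal in size
    usable = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable / per_layer_gb))

# e.g. a ~17GB quantized 30B-class model with 48 layers on a 12GB card:
print(layers_on_gpu(17.0, 48, 12.0))  # → 29 of 48 layers on GPU
```

With a quantized MoE model, even a partial offload like this can stay usable, because only a fraction of the weights are active per token.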
Final thought: before you spend thousands on a machine, consider that there are at least a dozen companies that provide non-Anthropic/non-OpenAI models in the cloud, many of which are dirt cheap because of how fast and good open weights are now. Do the math before you purchase a machine; unless you are doing 24/7/365 inference, the cloud is vastly more cost effective.
Oh yeah, seems obvious now you said it, but this is a great point.
I'm constantly thinking "I need to get into local models but I dread spending all that time and money without having any idea if the end result would be useful".
But obviously the answer is to start playing with open models in the cloud!
Do you have some links?
Also I assume the privacy implications are vastly different compared to running locally?
My performance using an RTX 5070 (12GiB VRAM), a Ryzen 7 9700X (8-core CPU), and 32GiB DDR5-6000 (2 sticks):
- "qwen2.5:7b": ~128 tokens/second (this model fits 100% in the VRAM).
- "qwen2.5:32b": ~4.6 tokens/second.
- "qwen3:30b-a3b": ~42 tokens/second (this is a MoE model with multiple specialized "brains") (this uses all 12GiB VRAM + 9GiB system RAM, but the GPU usage during tests is only ~25%).
- "qwen3.5:35b-a3b": ~17 tokens/second, but it's highly unstable and crashes -> currently not usable for me.
So currently my sweet spot is "qwen3:30b-a3b" - even if the model doesn't completely fit on the GPU, it's still fast enough. "qwen3.5" was disappointing so far, but maybe things will change in the future (maybe Ollama needs some special optimizations for the 3.5-series?).

I would therefore deduce that the most important thing is the amount of VRAM, and that performance would be similar even on an older GPU (e.g. an RTX 3060, which also has 12GiB of VRAM)?
Performance without a GPU, tested on a Ryzen 9 5950X (16-core CPU) with 128GiB DDR4-3200:
- "qwen2.5:7b": ~9 tokens/second
- "qwen3:32b": ~2 tokens/second
- "qwen3:30b-a3b": ~16 tokens/secondPower consumption? Don't ask. A subscription is cheaper.
Cold boot times are around 5m but if your usage periods are predictable it can work out ok. Works out at $2 an hour.
Still far more expensive than a ChatGPT sub.
I'd come at this from another angle: what are my options if I want a decent coding agent, on the level of what Claude does, at any given price? Let's say a few tens of thousands of dollars? I've had a limited look at what's available to run locally, and nothing is on par.
But right now, a Mac is the easiest way because of their memory architecture.
That said, last time I tried local LLMs (around when gpt-oss came out) it still seemed super gimmicky (or at least niche, I could imagine privacy concerns would be a big deal for some). Very few use cases where you want an LLM but can't benefit immensely from using SOTA models like Claude Opus.
As much as I love owning my stack, you'd have to use so much of this to break even vs an inference provider/aggregator with open frontier-ish models. (and personally, I want to use as little as possible)
Also: Apple, in their infinite wisdom, gave you a fan but turn it on very lazily (I swear it has to hit 100°C before it comes on) and give you zero control over fan settings, so you may want to snag something like TG Pro for the Mac. I wound up buying a license for it; it lets you define the temperature at which your fans kick in and even gives you manual control.
On my 24GB RAM MacBook Pro I have about 16GB available for inference. I use Zed with LM Studio as the back-end. I primarily just use Claude Code, but as you note, I'm sure if I used a beefier Mac with more RAM I could probably handle way more.
There's a few models that are interesting on the Mac with LM Studio that let you call tooling, so it can read your local files and write and such:
- mistralai/mistralai-3-3b (4.49GB): so I can increase my context window for it. Not sure if it auto-compacts or not; I've only just started testing it.
- zai-org/glm-4.6v-flash (7.09GB): same thing, only just started testing it.
- mistralai/mistral-3-14b-reasoning (15.2GB): just shy of the max, so not a TON of wiggle room, but usable.
If you're Apple, or a company that builds things for Macs or other devices, please build something to help with airflow / cooling for the MBP / Mac Mini. It feels ridiculous that it becomes a 100°C device, and I'm not so sure that's great for device health if you want to run inference for longer than the norm.
I will probably buy a new Mac whenever the inference speeds increase at a dramatic enough rate. I sure hope Apple is considering serious options for increasing inference speed.
I have a base model M4 Mac Mini and it absolutely does have a fan inside it.
I've wanted to try some of the more recent 8B models for local tab completion or agentic, any experience with those kinds of smaller models?
So far I'm using it conversationally, and scripting with tools. I wrote a simple chat interface / REPL in the terminal. But it's not integrated with code editor, nor agentic/claw-like loops. Last time I tried an open-source Codex-like thing, a popular one but I forget its name, it was slow and not that useful for my coding style.
It took some practice but I've been able to get good use out of it, for learning languages (human and programming), translation, producing code examples and snippets, and sometimes bouncing ideas like a rubber-duck method.
untested:
I haven't tried pure text models, but 27B sounds painful for my system.
However, models respond very differently, and there are tricks you can do like limiting quantization of certain layers. Some models can generally behave fine down into sub-Q4 territory, while others don't do well below Q8 at all. And then you have the way it was quantized on top of that.
So either find some actual benchmarks, which can be rare, or you just have to try.
As an example, Unsloth recently released some benchmarks[1] which showed Qwen3.5 35B tolerating quantization very well, except for a few layers which were very sensitive.
edit: Unsloth has a page detailing their updated quantization method here[2], which was just submitted[3].
[1]: https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks
[2]: https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs
You can always run evals and see if a q6 or q4 performs better than your q8. For smaller models I go q8. For bigger ones, when I run out of memory, I go q6/q6/q4 and sometimes q3; I run DeepSeek/Kimi at q4, for example.
I suggest beginners start with q8 so they get the best quality and aren't disappointed. It's simple to use q8 if you have the memory; choice fatigue and confusion set in once you start trying to pick other quants...
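The "q8 if it fits, otherwise step down" heuristic above can be sketched as a simple lookup. The bits-per-weight figures are rough approximations for common GGUF k-quants (real file sizes vary by quant recipe and model), and the overhead reserve is a guess:

```python
# Approximate effective bits per weight for common GGUF quants
# (rough figures only; actual files vary by recipe and model).
QUANTS = [("Q8_0", 8.5), ("Q6_K", 6.6), ("Q4_K_M", 4.8), ("Q3_K_M", 3.9)]

def pick_quant(params_b: float, mem_gb: float, overhead_gb: float = 2.0):
    """Pick the highest-quality quant whose weights fit in mem_gb,
    leaving overhead_gb for KV cache and runtime (an assumption)."""
    for name, bits in QUANTS:
        size_gb = params_b * bits / 8   # billions of params ≈ GB at 8 bits/weight
        if size_gb + overhead_gb <= mem_gb:
            return name, round(size_gb, 1)
    return None  # nothing fits; try a smaller model

print(pick_quant(30, 24))  # → ('Q4_K_M', 18.0)
print(pick_quant(7, 12))   # → ('Q8_0', 7.4)
```

This only tells you what fits, not what performs; as the thread notes, some models degrade badly below Q8 regardless, so actual evals still win.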
I have always taken plenty of care to try and avoid becoming dependent on big tech for my lifestyle. Succeeded in some areas failed in others.
But now AI is a part of so many things I do and I'm concerned about it. I'm dependent on Android but I know with a bit of focus I have a clear route to escape it. Ditto with GMail. But I don't actually know what I'd do tomorrow if Gemini stopped serving my needs.
I think for those of us that _can_ afford the hardware it is probably a good investment to start learning and exploring.
One particular thing I'm concerned about is that right now I use AI exclusively through the clients Google picked for me, because it makes financial sense. (You don't seem to get free bubble money if you buy tokens via API billing, only consumer accounts.) This makes me a bit of a sheep and it feels bad. There's so much innovation happening, and basically I only benefit from it in the ways Google chooses.
(Admittedly I don't need local models to fix that particular issue, maybe I should just start paying the actual cost for tokens).
The cash burn comes from models ballooning in size - they spend (as an example, not actual numbers) 100M on training + inference for the lifetime of Sonnet 3.5, make 200M from subscriptions/api keys while it's SOTA, but then have to somehow come up with 1B to train Opus 4.0.
To run some other back of the envelope calcs: GLM 4.7 Air (previous "good" local LLM) can generate ~70 tok/s on a Mac Mini. This equates to 2,200 million tokens per year.
OpenRouter charges $0.40 per million tokens, so theoretically, if you were using that Mac mini at 100% utilisation, you'd be generating $880 per annum "worth" of API usage.
Assuming a power draw of something like 50W, you're only looking at 440kWh per annum. At 20c per kWh that's $90 on power, plus $499 to get the hardware itself. Depreciate that $499 hardware cost over 3 years and you're looking at ~$260 to generate ~$880 in inference income.
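The arithmetic above checks out; as a sanity check, reproducing it with the comment's own figures (70 tok/s, $0.40/M tokens, 50W, $0.20/kWh, $499 over 3 years):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

tok_s = 70                                      # GLM 4.7 Air on a Mac mini, per the comment
tokens_per_year = tok_s * SECONDS_PER_YEAR      # ≈ 2.2 billion tokens
api_value = tokens_per_year / 1e6 * 0.40        # at $0.40 per 1M tokens

power_cost = 0.050 * 24 * 365 * 0.20            # 50W continuous at $0.20/kWh
hardware_per_year = 499 / 3                     # straight-line over 3 years

print(round(tokens_per_year / 1e6))             # → 2208 (million tokens/year)
print(round(api_value))                         # → 883 ($/year of API "value")
print(round(power_cost + hardware_per_year))    # → 254 ($/year all-in cost)
```

Of course, the 100% utilisation assumption is doing all the work here; at typical interactive duty cycles the cloud comes out far ahead.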
ChatGPT: You're absolutely right, and you're right to call that out. Upon examination it does appear that there might have been a mistake with the coordinates of the bomb. Let's try again, this time we will double check before we launch any missiles! :missile emoji:
To complete the mission the war terminal needs to hit a target at XY:
1. yes
2. yes (and don't ask again for strike targets in this session)
3. no
Human in the loop is the term here I think.
(I am really glad they did not give in, but I do assume this is what it will come to anyway)
I actually cancelled my ChatGPT subscription in late 2024 and documented the process, kind of as a social media thing because it had gotten so bad and I realized nobody in my family was using it anymore. I asked my wife if she was getting any use out of it and she told me she had been using Gemini and Grok for months because "GPT is very lazy now".
After a while another charge came in for the subscription, but I had the receipts: we had cancelled before the next billing cycle. I decided to try to reach out to OpenAI to resolve this, but they only let you chat with GPT itself for this, which failed: it told me they weren't in the wrong, and none of the information matched what actually happened.
I took this and used it to submit a chargeback request with Privacy.com, which I use for all of my online purchases. Normally I don't have to worry about this because I set a limit or cancel the cards I issue manually, but I had an OpenAI API account using the same card and I had been a bit lazy in using the same card for technically two different services.
Well, Privacy.com won that dispute and I got that money back. It's worth mentioning this is actually different from what most banks will do nowadays. For the most part, when you try to get a bank to do a chargeback, they just roll it into their insurance and refund you, the customer, as a cost of doing business, but the actual scammer or shady merchant gets to keep their stolen money, whereas I can be certain OpenAI didn't keep my money.
Chase uses a "provisional credit" system, but for small amounts, this credit often becomes permanent almost instantly.
Wells Fargo utilizes an automated system called the Wells Fargo Dispute Manager which is also similar.
Technically, it is Self-Insurance. Banks set aside a portion of their interchange revenue (the fees they charge merchants for every swipe) into a "Provision for Credit Losses." They use this pool of money to "buy" customer satisfaction for small errors rather than paying an employee $30/hour to investigate a $12 dispute.
Or any merchant for that matter. Chargebacks (from bad actors) are one of the most annoying things when you sell online as an honest, legit business. Stripe even charges you a penalty fee on top of that.
I've dealt with multiple chargebacks over the years and have only ever lost once -- when the Manager at Lowes' showed a check they wrote me [after I opened the dispute].
They absolutely do not just do anything and "write it off". Please be human and don't just rattle off high-confidence, baseless claims, especially as a giant billboard for Privacy.com.
What, always? Like, literally 100% of the time if the merchant responds at all, they automatically win?
That's very hard to believe. I don't know Discover but I do know Visa and that's not how their system works at all.
Go read your bank's terms and you'll find the provision. Do you want me to read your bank's terms for you and point them out?
Well, it seems like ChatGPT’s automated litigation resolution with Privacy.com got lazy. I wonder how a company with an AI can lose in a dispute instead of smokescreening the opponent with legitimate arguments and legalese.
Also, a chargeback dispute is limited to three rounds of back and forth by both Visa and MasterCard. They don't get to come back endlessly.
That has changed, so I canceled my ChatGPT membership and signed up for Claude. I still have five bucks of credit I bought a year ago for the OpenAI API that I do not believe I can have refunded, so some of my apps are going to have to stick with OpenAI until those credits run out, since I'm not going to just donate five bucks to them.
Playing with it now, I honestly can't tell too much of a difference, which as far as I am concerned is a good thing.
In my case, I would rather keep it than lose it. It's just text, so it's a small amount of data. You can trivially get embeddings for it and search it in DuckDB later for things you've asked.
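The embed-and-search idea above is easy to prototype. A real setup would call an actual embedding API and store vectors in DuckDB; this toy sketch substitutes a bag-of-words "embedding" and an in-memory list just to show the retrieval shape (everything here is a stand-in, not any particular library's API):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' -- a placeholder for a real
    embedding model; good enough to demonstrate the search loop."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(chats: list[str], query: str, k: int = 3) -> list[str]:
    """Rank exported chat snippets by similarity to the query."""
    q = embed(query)
    ranked = sorted(chats, key=lambda c: cosine(embed(c), q), reverse=True)
    return ranked[:k]

chats = ["how do I cancel my subscription",
         "recipe for sourdough bread",
         "python duckdb vector search example"]
print(search(chats, "duckdb search", k=1))
```

Swapping in real embeddings and DuckDB's array columns would make this persistent and fast enough for years of chat history.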
ETA: I've started an export of all my data. After that's done, I'm going to delete it all from my account (Settings > Data controls) and walk away from the account. I will give this to OpenAI, they make the process of disentangling yourself straightforward and there's integrity in that.
I have Memory turned off so I'm afraid I won't be able to tell. I would guess so. The export hasn't completed yet either.
> Ok. So I'm cancelling the subscription to ChatGPT and moving over to Claude because of the news of OpenAI striking a deal with us department of war. (https://www.techradar.com/pro/openai-just-signed-a-huge-deal...) Please line out a good exit strategy where I can keep the information in my chats and projects on my own hard drive.
BTW, what's going to hurt their business more, deleting my account or using the free tier?
Sounds like it won't really be a pain for me though based off comments on HN indicating Claude is the better product and I doubt I personally would hit any sort of token limits with the amount I use agentic coding.
https://github.com/openai/codex/issues/26#issuecomment-28116...
Anthropic usage credits purchased.
Message those that work forces.
You would probably need at least ~1M subscribers to cancel to make this painful.
Probably needs more attention outside of tech circles for that to happen but I suspect this will get drowned out in the face of other stuff.
> "Two of our most important safety principles are prohibitions on domestic mass surveillance and human responsibility for the use of force, including for autonomous weapon systems," Altman said.
https://www.axios.com/2026/02/27/pentagon-openai-safety-red-...
I was one of the early paying adopters of ChatGPT, but when Claude came around I switched and never looked back. I've been on the Max plan for a while.
2. Click on your profile icon and select New Chat icon.
3. Formulate a polite prompt in the regard of subscription cancellation.
4. Wait for a reply from Mr. Altman.
count up to 1000
Perfect! Let’s continue the sequence from 601 all the way to 1000 in one go:
601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611,mv: 'OpenAI': No such file or directory
bash> ls
ClosedAI
He can fucking afford to have some fucking principles. He's not going to end up on the street for not being a fucking coward.
Because of some bullshit minor PTSD from a few years ago, I sort of swore an oath to myself that I wouldn't let being a coward stop me from doing the right thing, regardless of the consequences, and by doing things that I think are right it has cost me opportunities and money. I'm not homeless, but it made the job hunt harder when I was unemployed. I can actually feel consequences from standing up for what I believe in. Sam Altman being a coward is not equivalent, he's choosing to do the wrong thing for no reason.
They are both a lesson to me that no matter how much you have, you will not necessarily be satisfied.
Who is to say he doesn't? Just because they don't align with yours doesn't mean he doesn't have his own principles.
> he's choosing to do the wrong thing
To many millions he is doing the right thing. I am on the fence personally, but I know many people who think that increasing defense capabilities at any cost is something that the government should be doing. Any company that helps them do that is "doing the right thing".
> I wouldn't let being a coward stop me from doing the right thing
The 'right thing' is always subjective, and for you it is decided by you alone. Try to remember that and see things from both sides.
Whether or not he agrees with my principles isn’t the issue. He doesn’t even agree with his own stated principles. He posted his stipulations about AI models used by the department of defense to presumably get social credit, and then changed his mind over the course of a few hours.
He claims that the Department of Defense principles just happen to now align with these principles but as far as I can tell he seems to just be trusting their word. The word of a Fox News TV host and a convicted fraudster.
Until that line has been reached, we can safely assume there are no principles at play.
The original context was very different, about financial markets, but I've been thinking about it a lot the past 12 months. There's a lot of cowards in high places in tech, surprisingly cowardly people. Or they have sold out their principles to be friends with terrible people, which is also a form of cowardice. Hard to say which.
The whole Epstein thing is a really really great marker of this too. Though I'm not sure if the tide has gone out all the way (we mostly know what's going on), or if there's a lot more tide to fall.
LBJ was a real son of a bitch, who, when he finally was thrust into power as president, did something pretty surprising by going all-in on the civil rights movement. Power reveals who people are, and times of trials reveal who people really are.
No, he doesn't have everything. See, maybe he's worth $3 billion. Or maybe $30 billion. But he's not worth $300 billion. That's a lot more worth he could have! And even then, he could be worth $3 trillion instead!
But yes, $100 million is the maximum amount of assets one individual should ever be allowed to hold. Potentially less. Anything higher is enormously harmful to society. People would get used to it very quickly and would work just as hard to reach that $100 million as they do now to reach $100 billion.
After a billion dollars, I doubt another billion will make you happier. In fact, I don’t think another trillion will make you happier. In fact, I don’t think another quadrillion dollars will make you happier, etc.
After a certain point you have effectively infinite money. Enough money to live dozens of extremely comfortable lifetimes. And importantly enough money to afford to actually have some principles. Oh no, he wouldn’t be able to afford to have his house re-covered in 24 karat gold again if he doesn’t fellate our lolcow president.
How does a $100 billion company grow? By taking on massive government and military contracts; they are the only clients big enough left in the world.
If a company does not show continual growth then it is classed as failing. That is the society we have built, and you cannot blame one man for following those principles. Every CEO in existence does the same.
I completely support the sentiment of what you wrote. But it doesn't directly seem relevant to the parent question.
Very few of the comments on this thread are actually about the act of canceling the subscription.
You go to billing. Then don't click "change my subscription" - your only option there is to "upgrade" to an annual plan. Instead, you have to scroll down past your card details etc. to a red button that says cancel.
Who comes up with this crap?
At least OpenAI puts cancelling within the Manage Plan section.