If the M5 generation gets this GPU upgrade (and I don't see why it wouldn't), then the era of viable local LLM inference is upon us.
That's the most exciting thing from this Apple event, in my opinion.
PS. I also like the idea of the ultra-thin iPhone Air, the 2x better noise cancellation and live translation of the AirPods 3, the high blood pressure detection of the new Watch, and the bold, sexy orange color of the iPhone 17 Pro. Overall, this is the best set of incremental updates Apple's ecosystem has seen in a while.
Luckily they added the blood pressure check for when you get too excited about the color orange.
edit: It was only the Pros and up which had titanium bodies. The 17s are all aluminum.
Which is a very powerful feature for anyone who likes security or finding bugs in their code. Or other people's code. Even if you didn't really want to find them.
https://www.apple.com/watch/compare/?modelList=watch-series-...
In the past few weeks, the oximeter feature was enabled by a firmware update on the Series 10. Measurements are done on the watch; results are only reported on a phone.
As of September 9, 2025, hypertension notifications are currently under FDA review and expected to be cleared this month, with availability on Apple Watch Series 9 and later and Apple Watch Ultra 2 and later. The feature is not intended for use by people under 22 years old, those who have been previously diagnosed with hypertension, or pregnant persons.
Sounds a bit ironic but I guess it's for legal reasons.
The color line up reminds me of the au MEDIA SKIN phones (Japanese carrier) circa 2007. Maybe it's because I had one back in the day, but I can't help but think they took some influence.
Wow, thanks for sharing the name, these are really good! I don't know why I was surprised to realize that great designers have made fantastic products even in the past...
Some sites with images, for anyone curious: 1. https://www.dezeen.com/2007/01/17/tokujin-yoshioka-launches-... 2. https://spoon-tamago.com/best-of-2007-part-iv/
With the addition of NPUs to the GPU, this story gets even more confusing...
But there isn’t a trivial way to specifically target the neural engine.
If you use Metal / GPU compute shaders, it's going to run exclusively on the GPU. Some inference libraries, like TensorFlow/LiteRT with backend = .gpu, use this.
This change strictly adds matmul acceleration to each GPU core, which is exactly what LLM inference leans on.
Intrigued to explore the A19/M5 and test energy efficiency.
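For a rough sense of why per-core matmul throughput is the number that matters, the arithmetic is easy to sketch. The model size and token rate below are my own illustrative assumptions, not anything Apple has published:

```python
# Back-of-envelope: matmul dominates LLM decoding. Dense decoding costs
# roughly 2 FLOPs per parameter per token (one multiply + one add per
# weight), and almost all of it is matrix multiplication.

def decode_flops_per_token(n_params: float) -> float:
    """Approximate FLOPs to decode one token with a dense model."""
    return 2.0 * n_params

# Hypothetical 8B-parameter on-device model at 20 tokens/s:
n_params = 8e9
tok_per_s = 20
flops_needed = decode_flops_per_token(n_params) * tok_per_s
print(f"sustained compute needed: {flops_needed / 1e12:.2f} TFLOP/s")
```

A fraction of a TFLOP/s sustained is well within reach of a mobile GPU; the harder part is doing it at a power draw the battery can tolerate, which is where dedicated matmul units earn their keep.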
So is the high blood pressure detection. It's not exclusive to the new watch; it also works on the Series 10 and Series 9 watches.
I don't think local LLMs will ever be a thing except for very specific use cases.
Servers will always have way more compute power than edge nodes. As server power increases, people will expect more and more of the LLMs and edge node compute will stay irrelevant since their relative power will stay the same.
Mobile applications are also relevant. An LLM in your car could be used for local intelligence. I'm pretty sure self-driving cars use some amount of local AI already (although obviously not LLMs, and I don't really know how much of their processing is local vs. done on a server somewhere).
If models stop advancing at a fast clip, hardware will eventually become fast and cheap enough that running models locally isn't something we think about as being a nonsensical luxury, in the same way that we don't think of rendering graphics locally as a luxury even though remote rendering is possible.
Even over LTE you're looking at under 120ms coast to coast.
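To put that 120 ms in context, here's a toy comparison of end-to-end response time; the token rates are assumptions I picked for illustration:

```python
# Rough latency comparison: cloud round trip vs. local decode.
# The 120 ms figure is from the comment above; the rest are assumptions.

lte_rtt_s = 0.120            # coast-to-coast LTE round trip (claimed above)
local_tok_per_s = 18         # assumed local decode speed
cloud_tok_per_s = 80         # assumed server decode speed

def time_for_tokens(n, tok_per_s, extra_latency=0.0):
    """Seconds to receive n tokens, including any fixed network latency."""
    return extra_latency + n / tok_per_s

n = 200  # tokens in a typical short reply
print(f"cloud: {time_for_tokens(n, cloud_tok_per_s, lte_rtt_s):.2f} s")
print(f"local: {time_for_tokens(n, local_tok_per_s):.2f} s")
```

With numbers like these, the network round trip is noise compared to decode time, so latency alone isn't a strong argument for local inference; privacy and offline use are the better ones.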
This doesn't seem right to me.
Take all the memory and CPU cycles of all the clients connected to a typical online service and compare it to the memory and CPU in the datacenter serving them: the vast majority of the compute involved in delivering that experience is on the client. And there's probably a vast amount of untapped compute still available on those clients - most websites only peg the client CPU by accident, because they triggered an infinite loop in an ad bidding war; imagine what they could do if they actually used that compute power on purpose.
But even doing fairly trivial stuff, a typical browser tab is using hundreds of megs of memory and an appreciable percentage of the CPU of the machine it's loaded on, for the duration of the time it's being interacted with. Meanwhile, serving that content out to the browser took milliseconds, and was done at the same time as the server was handling thousands of other requests.
Edge compute scales with the amount of users who are using your service: each of them brings along their own hardware. Server compute has to scale at your expense.
Now, LLMs bring their special needs - large models that need to be loaded into vast fast memory... there are reasons to bring the compute to the model. But it's definitely not trivially the case that there's more compute in servers than clients.
A single datacenter machine with state-of-the-art GPUs serving LLM inference can be drawing tens of kilowatts, and you borrow a sizable portion of that for a moment when you run a prompt on one of the heavier models.
A phone that has to count individual watts, or a laptop that peaks at double-digit sustained draw, isn't remotely comparable, and the gap isn't one or two hardware features wide.
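The more interesting metric might be energy per token rather than raw draw, since the server amortizes its kilowatts over many batched users while the phone runs a much smaller model for one. A quick sketch, with every power and throughput figure being an assumption on my part:

```python
# Energy-per-token sketch: a batched server node vs. a phone SoC.
# All power and throughput numbers are illustrative assumptions.

server_watts, server_tok_per_s = 10_000, 5_000   # big node, heavy batching
phone_watts,  phone_tok_per_s  = 5, 15           # small on-device model

def joules_per_token(watts, tok_per_s):
    """Average energy spent per generated token."""
    return watts / tok_per_s

print(f"server: {joules_per_token(server_watts, server_tok_per_s):.2f} J/token")
print(f"phone:  {joules_per_token(phone_watts, phone_tok_per_s):.2f} J/token")
```

Under these made-up numbers the phone is competitive per token, but only because it's running a far smaller, far less capable model - which is really the parent's point restated.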
The problem is people's expectations: they want the model to be smart. People don't have a problem with whether it's local or not; they want the model to be useful.
> Deepseek-r1 was loaded and ran locally on the Mac Studio
> M3 Ultra chip [...] 32-core CPU, an 80-core GPU, and the 32-core Neural Engine. [...] 512GB of unified memory, [...] memory bandwidth of 819GB/s.
> Deepseek-r1 was loaded [...] 671-billion-parameter model requiring [...] a bit less than 450 gigabytes of [unified] RAM to function.
> the Mac Studio was able to churn through queries at approximately 17 to 18 tokens per second
> it was observed as requiring 160 to 180 Watts during use
Considering getting this model. Looking into the future, a Mac Studio M5 Ultra should be something special.
[0] https://appleinsider.com/articles/25/03/18/heavily-upgraded-...
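Those quoted numbers hang together nicely if you treat decoding as memory-bandwidth-bound. DeepSeek-R1 is a mixture-of-experts model with (per DeepSeek's published figures) about 37B of its 671B parameters active per token, so each token only has to stream a fraction of the resident weights:

```python
# Bandwidth-bound ceiling for the Mac Studio numbers quoted above.

bandwidth_gb_s = 819        # M3 Ultra unified-memory bandwidth (quoted)
total_params_b = 671        # total parameters, billions (quoted)
active_params_b = 37        # active params per token (DeepSeek's figure)
weights_gb = 450            # resident weight size (quoted)

# Each decoded token streams roughly the active fraction of the weights:
gb_per_token = weights_gb * active_params_b / total_params_b
ceiling_tok_s = bandwidth_gb_s / gb_per_token
print(f"{gb_per_token:.1f} GB/token -> ceiling ~{ceiling_tok_s:.0f} tok/s")
```

The observed 17-18 tok/s is roughly half that naive ceiling, which is plausible once you account for KV-cache traffic, attention compute, and scheduling overhead.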
Apple's privacy stance is to do as much as possible on the user's device and as little as possible in the cloud. They have iCloud for storage to make inter-device sync easy, but even that is painful for them. They hate cloud. This is the direction they've had for some years now. It always makes me smile that so many commentators just can't understand it and insist that they're "so far behind" on AI.
All the recent academic literature suggests that LLM capability is beginning to plateau, and we don't have ideas on what to do next (and no, we can't ask the LLMs).
As you get more capable SLMs or LLMs, and the hardware gets better and better (who _really_ wants to be long on Nvidia or Intel right now? Hmm?), people are going to find that they're "good enough" for a range of tasks, and Apple's customer demographic are going to be happy that's all happening on the device in their hand and not on a server [waves hands] "somewhere", in the cloud.
Large issues: tokenizers exist, reasoning models are still next-token-prediction instead of having "internal thoughts", RL post-training destroys model calibration
Small issues: they're all trained to write Python instead of a good language, most of the benchmarks are bad, pretraining doesn't use document metadata (i.e. they have to learn from each document without being told the URL or that the documents are written by different people)
The Android crowd has been able to run LLMs on-device since llama.cpp first came out. But the magic is in the integration with the OS. As usual there will be hype around Apple, idk, inventing the very concept of LLMs or something. But the truth is neither Apple nor Android did this; it was the wee team that wrote the "Attention Is All You Need" paper, plus the many open-source/hobbyist contributors inventing creative solutions like LoRA and creating natural ecosystems for them.
That's why I find this memo so cool (and will once again repost the link): https://semianalysis.com/2023/05/04/google-we-have-no-moat-a...
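Since LoRA came up: the trick is small enough to sketch in a few lines of plain Python. This is a toy illustration with made-up 2x2 matrices, not any real training code:

```python
# Minimal LoRA sketch: instead of updating a frozen weight matrix W
# (d_out x d_in), learn a low-rank update B @ A with rank r << d.

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    """y = W x + alpha * B (A x); only A and B are trained."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + alpha * d for b, d in zip(base, delta)]

# Frozen 2x2 identity "pretrained" weight, rank-1 adapter:
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]            # r x d_in  (r = 1)
B = [[0.5], [0.5]]          # d_out x r
print(lora_forward(W, A, B, [2.0, 4.0]))  # -> [5.0, 7.0]
```

The point is that W stays frozen and only the tiny A and B matrices (r * (d_in + d_out) extra parameters) get trained, which is exactly what made fine-tuning feasible on hobbyist hardware.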
Is it 'Local'? 'Large'? 'Language'?
I disagree.
There's a lot of interest in local LLMs in the LLM community. My internet was down for a few days, and how I wished I had a local LLM on my laptop!
There's a big push for privacy; people are using LLMs for personal medical issues for example and don't want that going into the cloud.
Is it necessary to talk to a server just to check out a letter I wrote?
Obviously with Apple's release of iOS 26, macOS 26, and the rest of their operating systems, tens of millions of devices are getting a local LLM, with third-party apps able to take advantage of it.
I'm running Qwen 30B code on my Framework laptop to ask questions about Ruby vs. Python syntax, because I can, and because the internet was flaky.
At some point, more doesn't mean I need it. LLMs will certainly get "good enough" and they'll be lower latency, no subscription, and no internet required.
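On "no internet required": the main cost of a local model is memory for the weights, and that's easy to estimate. Rough numbers for a 30B-parameter model at common quantization levels (and note that if the Qwen model above is the MoE 30B-A3B variant, only ~3B parameters are active per token, which is why it's quick on a laptop):

```python
# Approximate weight memory for a 30B-parameter model at common
# quantization levels. Bits-per-weight values are typical, not exact.

def weight_gb(n_params_b: float, bits_per_weight: float) -> float:
    """Weight footprint in GB for n_params_b billion parameters."""
    return n_params_b * 1e9 * bits_per_weight / 8 / 1e9

for name, bits in [("fp16", 16), ("q8", 8), ("q4-ish", 4.5)]:
    print(f"{name}: ~{weight_gb(30, bits):.0f} GB of weights")
```

So a 4-bit quant of a 30B model fits comfortably in 32 GB of RAM with room left for the KV cache and the OS, which is exactly the class of laptop people are running these on.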
Or do you have to copy paste into LM studio?
"iPhone4 vs HTC Evo"
But it's not general purpose. Broken by design.
I'll pass. Not going to support this. We need less of this crap not more.
The whole point of CoreML is that your solution uses whatever hardware is available to you, including enlisting a heterogeneous set of units to conquer a large problem. Software written years ago would use the GPU matmul if deployed to a capable machine.
Though I do wonder, given the logarithmic nature of sound perception, are these numbers deceptive in terms of what the user will perceive?
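For reference, decibels are logarithmic in power, so what "2x better" means depends entirely on what got doubled. A quick sketch (pure math, no Apple numbers):

```python
# Converting power ratios to decibels. Halving noise *power* is only
# ~3 dB; the common rule of thumb is that a ~10 dB drop is what
# listeners describe as "half as loud".
import math

def power_ratio_to_db(ratio: float) -> float:
    """dB corresponding to a given power ratio."""
    return 10 * math.log10(ratio)

print(f"halving noise power: {power_ratio_to_db(2):.1f} dB")
print("halving perceived loudness: ~10 dB (rule of thumb)")
```

So "2x the noise cancellation" could mean anything from a barely noticeable 3 dB to a dramatic 10 dB, depending on whether the marketing refers to power or perceived loudness.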
Based on that, it doesn't sound like it's that much worse. Of course, if you're trying to maximize battery longevity by not exceeding 80% charge, that might make it not very useful for many people.
- mobile mp3 player sales are low unless disk and battery life are greatly improved
- large display touch screen phone market is small unless someone solves the "app problem"
- smart watch market is tiny if exists at all unless someone makes one that is useful and has improved battery life
For those that are not chronically online, a mobile phone from a decade ago has everything they need. If you only have to phone the family, WhatsApp your neighbours, get the map out, use a search engine and do your online banking, then a flagship phone is a bit over the top. If anything, the old phone is preferable since its loss would not be the end of the world.
I have seen a few elderly neighbours rocking Samsung Galaxy S7s with no need to upgrade. Although the S7 isn't quite a decade old, the apps that are actually used (WhatsApp, online banking) will be working with the S7 for many years to come since there is this demographic of active users.
Now, what if we could get these people to upgrade every three years with a feature that the 'elderly neighbour' would want? Eyesight isn't what it used to be in old age, so how about a nice big screen?
You can't deliberately hobble the phone with poor battery life or programme it to go slow in an update because we know that isn't going to win the customer over, but a screen that gets tatty after three years? Sounds good to me.
Probably trying to find better screen materials, and addressing reliability issues.
I used Palm devices with resistive touch screens. It was good, but when you go glass, there's no turning back.
I would never buy a phone with a folding screen protected by plastic. I want a dependable slab, not a gimmicky gadget which can die any moment. I had my fill of dying flex cables with Cassiopeia PDAs. Never again.
Apple cancelled their mini line, which was 3% of sales.
It’s not a big enough slice for them to want to chase.
Typical strat for them is not to be first with an innovation, but to wait and work out the kinks enough that they can convince people that the tradeoffs are well worth making. Apple wouldn't be chasing that existing slice, they'd be trying to entice a larger share of their customers to upgrade faster.
There will never be a folding iPhone, simple as.
Also, flip phones aren't dorky and they have a 2000s vibe, but they don't fit Apple's "you can have any color as long as it's black" approach to design.
In some ways I can't even fault them - fragmenting your device shapes/experiences to chase a niche look is not good business. But this is exactly what's pushing me out of the Apple ecosystem - it's so locked down that if you don't want to fit into their narrow product lines you have no other options. There are no third-party watchmakers using Apple Watch hardware and software. No other phone makers with access to iPhone internals and iOS. Nobody can hack a PC OS onto an iPad or build a 2-in-1 macOS device.
I feel like this is the last generation of Apple tech I'm in on - I just find there are so many devices that are compelling to me personally but don't fit into the walled garden. Plus Google seems light-years ahead on delivering a smart assistant.
Oh and I remember everyone mocking the airpods pro when they came out. Now everyone is wearing them.
For phones what really matters for most people is... the screen size. And a folding phone is basically the best thing you can get right now for that.
The only problem is pricing at the moment.
My use-case is for travel, where I want to read books, and the very occasional time when I want to do some design work outside the office -- draw a diagram that sort of thing. A third rare use case is where a web site is buggy or limited in functionality for mobile browsers. In all these cases the unfolded screen allows me to do the thing I need to do without carrying a second device (tablet, eReader). Another marginal use-case is to show another person a photograph. The fold out screen is much easier to see and I think has better color rendition too.
For these use-cases I find the folding phone very worthwhile.
But...the benefit that trumps all that is that the phone itself is smaller (narrower) than the typical flagship phones these days. It fits in my pocket and my hand reaches across it. I'd never go back to a non-folding phone for this reason alone, even if I never unfolded it. In fact I almost never do unfold it, except when traveling.
fwiw it wasn't until the Fold6 that the "cover screen" typing experience was ok. I understand that the Fold7 is a bit wider and so probably better, but I can't justify the expense to upgrade so will sit out until the Fold8.
I guess if you're the sort that is not clumsy and you live in a mild climate, you might get your money's worth.
for reference these were Samsung Z Flip devices
The one I have used felt like using a real phone through a layer of vinyl, definitely not a pleasant experience.
They're buying another year of very-high margin phones I guess...