Apple Foundation Models

(platform.claude.com)

485 points · MehrdadKhnzd · 12d ago · 225 comments

118 comments · 35 top-level

harrouet12d ago· 21 in thread

This is Apple commoditizing LLMs while keeping control of the UX.

They are a hardware company and will keep selling the best machine for AI use. Well done.

Benedict Evans may be right after all; frontier models look more and more like telecom companies in the 90s. Billions and billions of investment in infrastructure while others further up the stack captured all the value.

CuriouslyC12d ago

There will be frontier models that are non-commoditized, but they'll be kept guarded and hidden away, and you'll only get the final result, so that they can't be distilled and their harness can't be reverse engineered. They'll be billed like employees, rather than like a tool.

alecco12d ago

In spite of their deeper pockets, massive datacenters, colossal amounts of user data, and hundreds of thousands of top developers, even Amazon, Meta, Microsoft, and Google are well behind.

I think Evans is completely wrong. There are only 2 truly frontier models. (at least for now). And Anthropic seems to be leaving OpenAI behind so there might be only 1 in the near future. (which is scary/dangerous)

zitterbewegung12d ago

It is much better. Imagine if the whole Manhattan project could have been outsourced and cost you nothing. I expect that open source models will be at or near parity by 2030 and running on consumer devices.

axus12d ago

Last I checked the telcos made plenty of money in the 90s. Should Verizon be getting a cut of my Claude Pro subscription, since I use FIOS to access it?

enos_feedler12d ago

He denies comparing them to telecom companies and even says so at various points in his writing. Instead he compares their usage to the usage of mobile data.

paulsutter12d ago

Try Mythos

post-it12d ago

> while keeping control of the UX.

Extremely tangential, but this is my favourite upshot of AI. For decades, companies have been walling off their services and forcing us into their fuckass UIs. Now over the course of the last twelve months, suddenly everything has an MCP and I can use it through my command line chat interface.

Any company that doesn't adapt gets so hammered by people's AI-DIY web scrapers that they have no choice but to cave.

swingboy12d ago

Does “the best machine for AI use” apply here considering these models are still server-side?

embedding-shape12d ago

The play here seems pretty evident, if I may assume. Apple creates an interface that is generalized enough so you can easily swap models, and while Claude is preferred by Apple today, it may be any provider or even local models in the future, and the APIs the developers use remain the same, so "migration" becomes easier.

ABS12d ago

For the on-device model, yes: it runs on the Neural Engine (at the moment), so a newer chip means faster, cheaper local inference. For the server-side path, which is what this Claude package is about, your machine is irrelevant since it's a network call. The same API covers both, so "best machine for AI" only bites when the session is actually local.

But we can imagine that the balance of what's on-device vs what's remote will move continuously towards the former as time, improved HW and improved local models keep progressing

brookst12d ago

I would think so, as “use” doesn’t specify implementation. If you use a word processor it may be running locally or remotely.

From a user’s perspective, it doesn’t matter.

WorldMaker12d ago

Apple's been trying to make the marketing appeal that "Private Cloud Compute" is also a hardware project. Given it seems to rely on low level details of device Hardware Security Modules, it's maybe even at least a little bit more than just "marketing spin".

halJordan12d ago

It's been clear for years now that eventually ai will be embedded at the os level. Apple even recognized it way back when they first introduced Apple Intelligence. Yes they're commoditizing llms or whatever. But this has been a user facing feature they've been iterating on for years now

dlev_pika12d ago

Apple’s play was a masterclass - unsure how deliberate it was, or how much of a choice they actually had, but it’s turning out pretty well IMO.

Now if they can further reinforce their angle on Privacy, they might continue to be what they are (or more)

amelius12d ago

Now we only need to commoditize the hardware.

hedora12d ago

Check out AMD’s offerings.

They’re typically a bit better on high TDP stuff, and a bit worse on low TDP. They mostly match in the middle. I have a $500 AMD NUC and a slightly older $2000 MBP. Inference throughput is within 2x.

The comparison is a little messy: AMD currently maxes out at 128GB of RAM vs Apple’s discontinued 512. Apple has nothing to rival the Steam Deck.

jimbokun12d ago

This is what originally made Microsoft the most lucrative tech company of its day.

Android succeeded at this to an extent with phones, but Apple has been able to keep its products differentiated enough in the minds of consumers to maintain their premium pricing. So far.

klausa12d ago

How is this Apple keeping control of the UX?

matwood12d ago

The betas of the next OS's include a Siri AI chatbot, and the AI features are built into various parts of the OS. A user has no idea what model is powering any of it - Apple controls the UX.

wuliwong12d ago

I think there is an opportunity for a new hardware company to enter the market. I know this is just hypothetical but I believe that AI is revolutionary enough where a new approach to hardware and UI/UX will enable far more value to be derived from AI. I think the incumbents like Apple will stick to their familiar platforms and could get beaten out by a new competitor that is AI native to the core. Maybe? ¯\_(ツ)_/¯

rock_artist12d ago· 13 in thread

While I'm happy with Apple introducing this abstraction, my main concern is with local models.

Take Gemma4 as an example, and think of a user: if 10 apps each use the same model and each downloads its own copy, the phone will be bloated.

I still don't understand whether Apple provides a way for multiple apps to use the same on-device model (without tricky namespaces and permissions).

I didn't see anything suggesting that's the case.

scosman12d ago

I think that's what they are trying to avoid. If you need on-device intelligence, their pitch was "The model the device already has is best", and if you need something more specific an adapter (aka, a fine-tune/lora) is best.

They were wrong when their on-device model was way behind. They still might be right in the long term.

While multiple apps I use might need Gemma 4 E4B, I use dozens of apps and app devs can choose from hundreds of models. A shared cache might reduce size a little when there's overlap, but the core problem still exists. If each app chooses its own model, disk usage and memory swapping explode.

It's probably better for device manufacturers to bake in a default. I'm not proposing they limit you from using others, but one shared default might be the best developer/user experience for 99% of apps.

- Being warm in memory is the single biggest perf speedup you can get, and a default is much more likely to be warm.

- "Best model" is usually "best model for this device" given both RAM and compute. A developer can't test every device but Apple can/will.

- Each model needs to be optimized for the hardware (what's running on ANE, what's running on Metal, what's running on CPU). The default gets optimized.

- If you need a custom model, a LoRA is probably best (30MB, benefits from all of the above)

You could say the default should be swappable, but that's more a linux ideal than an Apple one so I doubt we ever see that. Plus there are real downsides: intentional or not, prompts end up optimized to the model they are developed for, so swapping the default system model would degrade every app.

scotty7912d ago

But models aren't universally best, especially small ones. For text Gemma is great. For vision qwen3.6 is amazing.

jtfrench12d ago

That's a great opportunity for Apple to provide a universal unique model ID protocol and some shared storage space to allow devs to register models.

alwillis12d ago

Check out “Bring an LLM provider to the Foundation Models framework” - https://developer.apple.com/videos/play/wwdc2026/339

rock_artist12d ago

I see an ID-based ability suggesting `modelId`, but in the current docs I cannot find any context for it. The other limit is that it suggests Swift Packages, but I'm not seeing any model management hints similar to Docker/Ollama/etc where:

- Application can ask for specific model, if available use it. if not, ask to download it (or try some fallback / alternative)

- User can manage models. So as a user I can clean unused models (and for non-techie have something similar to offloading apps when unused for some period of time).

klausa12d ago

The apps can use the system-provided on-device model using the same framework and APIs, but there are no affordances to deduplicate custom models between apps.

satvikpendem12d ago

That is exactly what foundation models are, yes. Same in Android with AICore which uses Gemma underneath, apps can query the LLM and receive responses back rather than bundling in their own model.

trvz12d ago

Do you guys not have phones (with at least 1TB of storage)?

rock_artist12d ago

Who’s “you guys”? A developer from the Bay Area? A student with a MacBook Neo? Or John Appleseed who bought a basic iPhone 17e?

mft_12d ago

I have a Mac with 4TB of storage but it’s still annoying when every new AI app I try installs its own virtual environment with a fresh copy of Python, PyTorch, other duplicate libraries, and then models on top of that.

fragmede12d ago

No? iPhones don't come standard with that much storage.

taneq12d ago

Sounds ripe for block-level deduplication. :D Or an API that lets you request a model and handles caching.

mohamedkoubaa12d ago

Ok but don't expect Anthropic to help with local models, that'll be something apple rolls out themselves if at all

daniel_iversen12d ago· 13 in thread

Is this Apple encouraging developers to go through their api abstraction layer to use LLMs so that when they launch their own (which I think we’ve heard they’ve been spending lots of money on training and might be somehow involved with Siri or current Apple AI?) that they can easily help devs make a seamless transition? Or is it just a developer nicety or something else?

tarcon12d ago

Apple has some clever mechanics to protect user data. I had to work with App tracking stuff lately and their approach to keeping user details private with anonymized cohorts (SKAN, Differential Privacy) before reporting tracking events to third party platforms was surprisingly well thought out. There is value in having them in your loop if you care about privacy.

HDThoreaun12d ago

My read of the ATT stuff is basically that it forced all the apps to use meta ad tracking because they’re the only ones who figured out how to serve relevant ads despite it.

willis93612d ago

It would be cool if they offered some kind of prompt sanitation option.

klausa12d ago

This is support for a new framework that ships with reality/mac/iPad/watch/tv/iOS 27 (and that they've promised to open-source later in the year, so presumably you'll also be able to lean on this if you ship Swift on your backend).

The framework's whole deal is that it lets you use the same API to target either the device built-in models, the Apple-hosted online models (Private Cloud Compute), or write your own shims to call out to arbitrarily hosted online models.

You can then dynamically route your calls to a different kind of model/provider, using system APIs, without having to write your own abstraction layer over "I want to use local model for this, but I want to use Claude for that", or having to write your own integration with the Anthropic/OpenAI APIs.

It abstracts things like tool calling in one place; and has a bunch of other niceties/oddities (it keeps the same "transcript" going, even if you dynamically switch providers/models during a session) and some other things.

pprotas12d ago

The cynic (or realist?) in me thinks this abstraction layer is Apple's way of making sure that users give their own Apple Intelligence credit for the underlying LLM functionality, even if another company is actually providing the LLM.

_the_inflator12d ago

Assembled in Cupertino once more. ;)

Gareth32112d ago

This is clearly because they plan to monetise AI in the future, and they don't want competition.

NorwegianDude12d ago

A dark, but not totally unfair take: It makes it easier for Apple to take payment for the models others provide, and even allows Apple, if they want to, to use the data to build a dataset for training their own models based on how users use third party models. It's only on Apple devices this API is used, so they split up the market by not letting developers use the same system if they want things to work on iOS, locking users even more in.

aesthesia12d ago

From the linked docs page:

> Requests go directly from your app to the Claude API; Apple is not in the request path and does not see prompts or responses. Usage is billed to your Anthropic account at standard API pricing. Your app decides when to use Claude and when to use Apple's on-device model: pass whichever model you want to each session.

oefrha12d ago

Call it Intelligence Store and charge… wait for it… 30%.

thombles12d ago

There are already on-device models that you can use through this framework as a developer. Claude would just be an additional one.

FinnKuhn12d ago

Maybe they plan to have the providers pay for being the default model? So basically, what Google is doing right now for search engines. The difference however is that Google is making money with additional search requests while AIs are (as of now) losing money with additional requests. I don't see the business case for them yet though.

mathisfun12312d ago

> which I think we’ve heard they’ve been spending lots of money on training and might be somehow involved with Siri or current Apple AI

Lol bro this is literally it this is the model they've been training (was Apple Foundation model not a big enough hint?)

post-it12d ago· 8 in thread

> a Swift package that makes Claude available as a server-side language model in Apple's Foundation Models framework

Ahh I was hoping for the opposite: all of the existing features of Claude Code but somehow running locally on my laptop's neural engine. A pipe dream on an M2 with 8 GB of RAM, but I had a flicker of hope there.

inickt12d ago

Check out this WWDC session. Obviously not going to compete with the frontier models (and I think 8GB is too small anyways), but Apple did demo MLX + OpenCode.

https://developer.apple.com/videos/play/wwdc2026/232/ https://www.youtube.com/watch?v=wykPErJ8M-8

satvikpendem12d ago

You can use OpenCode or Pi with SSD streaming so it technically will have all the features, just unbearably slow.

FuriouslyAdrift12d ago

I've found most of the frontier coding models require somewhere between 300GB and 1TB to run with full capabilities.

godzillabrennus12d ago

If only we could buy 1TB of unified memory in a Mac for $1k-$2k in total hardware costs. Apple would basically be able to extinguish the entirety of the market cap for Nvidia, OpenAI, Anthropic, and others all at once.

In 10 years, I hope my MacBook Pro can run today's frontier models and has 1TB of unified Memory.

pstuart12d ago

The work on LLM in a Flash will probably help, and Apple's NVMe architecture, which is well suited to maximizing throughput, could allow their devices to work better on larger models than other vendors' devices.

jubilanti12d ago

> all of the existing features of Claude Code but somehow running locally on my laptop's neural engine

You can use environment variables to have claude code query literally any endpoint you choose as long as it has a compatible API.

570165240012d ago

I would not mind if the cloud was actually the user's private iCloud. Users pay for it, and it runs on Apple servers next to where users already store their iPhotos. That would be a really elegant solution.

..but instead we get Claude, hosted who-knows-where. maybe in X-AI datacenters? maybe in Amazon somewhere? who knows..

willy_k12d ago

https://security.apple.com/blog/private-cloud-compute/

VadimPR12d ago· 6 in thread

How can you practically use this in software if you're to deploy this to users? Asking a user to create and enter their own API key is a bar too high for good UX.

hajile12d ago

The even bigger hurdle is selling token based pricing to normal (non-dev) users.

"You pay an indeterminant amount of money to ask a question and you might not even get the response you want without spending even more money" doesn't appeal to most people who aren't gamblers and explaining how "thank you" at the end of a long exchange can be expensive due to context is an even harder thing for an average person to swallow.

Token cost going up/down like a yo-yo also doesn't help. Normal users NEED fixed costs and don't want to expend energy constantly keeping up with the AI meta. "My subscription lasted much longer last month" isn't a winning problem either.

I think Apple is correct that Local LLM for most things is the future.
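
To put rough numbers on the "thank you" point, here is a minimal Swift sketch. It assumes the whole transcript is re-sent as input on every turn and that no prompt caching applies; the function name and the token counts are made up for illustration, not taken from any pricing page.

```swift
// Input tokens billed for one turn, assuming the full prior transcript
// is re-sent each time and nothing is cached (both are assumptions).
func inputTokens(priorTurns: [Int], newMessage: Int) -> Int {
    return priorTurns.reduce(0, +) + newMessage
}

// Hypothetical conversation: 20 earlier turns of about 2,000 tokens each.
let history = Array(repeating: 2_000, count: 20)

// A 3-token "thank you" sent as the first message vs. at the end.
let firstTurn = inputTokens(priorTurns: [], newMessage: 3)
let thankYou = inputTokens(priorTurns: history, newMessage: 3)

print("first turn: \(firstTurn) input tokens, after 20 turns: \(thankYou)")
```

Under those assumptions the same three-token message costs orders of magnitude more input at the end of the exchange than at the start, which is the part that is hard to explain to a non-dev user.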

nate12d ago

Ugh. It really is. I have allihat.com which is the only safari extension (i think still) that talks to claude. And it's well sought for. But you as a user have to enter a friggin claude api key. :( And I still don't grok their TOS around this. Like you can still type: ```setup-token Set up a long-lived authentication token (requires Claude subscription)``` but this seems like a trap? :) Who's using this? Doesn't this like insta break their TOS if you use that anywhere?

Right now for allihat.com I just let people use the Apple model locally if you don't feel like using the claude key. And my conversions to paying user shot up like 3x! But it really isn't a replacement obviously to claude. I was hoping Apple would make proxying to Claude some kind of thing they do for me so I also don't have to proxy to my own server just to try and manage API to Claude usage.

daralthus12d ago

ppl pay for this?

Maxious12d ago

> For production, route requests through your own back end with .proxied

Apple is offering developers with less than 2 million downloads free AI models via their servers https://techcrunch.com/2026/06/08/apple-bets-cheaper-ai-will...

klausa12d ago

The same way you did it before — by proxying the requests to your backend.

cush12d ago

Users don’t give an API key. The docs show how to set up your backend proxy.

mcintyre199412d ago· 5 in thread

I think this is just Apple planning for their on-device models getting better, which makes sense given they have access to Gemini now. If developers use this for all their code calling an external LLM, then as Apple's model becomes more capable and covers more use cases it'll be easy to switch to it at individual call sites. That'll give apps better UX and save developers money on a bill that Apple doesn't get a cut of.

embedding-shape12d ago

> That'll give apps better UX and save developers money on a bill that Apple doesn't get a cut of.

In other words, it's unlikely to happen as there is no money in it. Better for Apple to create some new subscription "AI" and "AI-lite" plans people can subscribe to, and since Apple is a company and we all know what those care about, it's unlikely to become a utopia of local models running on your phone.

criddell12d ago

How does using Gemini lead to better on-device models?

halJordan12d ago

Apple is distilling models from gemini

Danox12d ago

Gemini is just a stopgap like using Intel processors or Qualcomm modems.

Danox12d ago

UX is just another word for ecosystem building, which is what Apple does best in comparison to their competition and also doesn’t hurt to do hardware to go along with it. Microsoft and Nvidia aren’t teaming up for nothing.

me551ah12d ago· 3 in thread

So where does the api key reside? You can’t ship it on the iOS client since anyone can read and abuse it

laxmansharma12d ago

https://platform.claude.com/docs/en/cli-sdks-libraries/libra...

yilugurlu12d ago

it says put into your API layer and proxy it.

hedora12d ago

Isn’t that a privacy and compliance nightmare?

GeekyBear12d ago· 2 in thread

This isn't Claude specific. Developers can also write apps that call Google's server based Gemini models.

> At WWDC, Apple announced that it's opening its Foundation Models framework to third-party cloud model providers. Starting with iOS 27, macOS 27, iPadOS 27, visionOS 27 and watchOS 27, model providers can implement the new public LanguageModel protocol to provide a common interface for model inference. We've made Gemini models available to the Foundation Models framework through the Firebase Apple SDK.

This provides a fully native development experience — cloud-hosted Gemini models can plug directly into the Foundation Models framework using the same API. That means the on-device Apple model and cloud-hosted Gemini models sit behind a shared API surface, so you can easily swap between local and cloud inference to fit your use case.

https://blog.google/innovation-and-ai/technology/developers-...

jdgoesmarching12d ago

The important part is Apple rebranding “OpenAI-compatible API” to “language model protocol” and I think we should all rally around this immediately before we’re cursed with that awful tongue twister.

klausa11d ago

That's not what that means.

Protocol in this context means a Swift language feature, like interface in some other languages: https://docs.swift.org/swift-book/documentation/the-swift-pr...

zkmon12d ago· 2 in thread

A coding agent is itself an imposed layer. Now they are adding one more layer? Many times I think of a coding agent as the vendor supervisor from the body shops of the 90's who promises the customer everything under the sky and thrashes the poor contractor to deliver. Coding agents consume 10x more tokens, just like how body shops charged their customers vs how they paid the contractors. For a simple test, the same task that makes the model run out of context length when used via a coding agent runs fine when prompted directly.

Layers are luxury and remove control and transparency.

klausa12d ago

You wouldn't use this when building a coding agent.

hedora12d ago

How else will I run my coding agent on your Mac without having you download a second LLM and double your memory usage?

mlpicker12d ago· 2 in thread

What I'm curious about is whether this is actually on-device. Apple's framework caps local models around 3B params last I looked, and Claude is way bigger than that. So either there's some hybrid setup I haven't seen documented, or this is mostly a Claude SDK in FM clothing. Anyone tried it on a plane?

brookst12d ago

Read the linked article? It is absolutely a cloud service. Neither Apple nor Anthropic is suggesting otherwise

ABS12d ago

it's cloud, the doc is explicit that requests go straight to api.anthropic.com with Apple not in the way.

so Claude via FM dies offline while Apple's on-device SystemLanguageModel (the ~3B one) keeps working. It isn't a hybrid really: the framework just has both implement the same LanguageModel protocol, so for a LanguageModelSession "local 3B" and "remote frontier model" become a one-argument swap.

IMHO what's worth internalising is that the two share an API but nothing else: the on-device path runs on Apple's Neural Engine and costs battery (you can watch ANE power ramp while it works) while the cloud path costs API credits/tokens and does zero local compute. Same code, opposite cost model.
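
A minimal Swift sketch of that "shared API, nothing else shared" shape. Everything here (`TextModel`, `LocalModel`, `RemoteModel`, `Session`, `makeSession`) is a hypothetical stand-in written for illustration, not the real Foundation Models types or the Claude package's API.

```swift
// Hypothetical stand-ins, NOT the real Foundation Models types.
protocol TextModel {
    var name: String { get }
    var needsNetwork: Bool { get }
    func respond(to prompt: String) -> String
}

// Stands in for the on-device path: no network, costs local compute.
struct LocalModel: TextModel {
    let name = "on-device"
    let needsNetwork = false
    func respond(to prompt: String) -> String { return "[\(name)] \(prompt)" }
}

// Stands in for the cloud path: needs network, costs API tokens.
struct RemoteModel: TextModel {
    let name = "cloud"
    let needsNetwork = true
    func respond(to prompt: String) -> String { return "[\(name)] \(prompt)" }
}

// The session only knows the protocol, so the model is a one-argument swap.
struct Session {
    let model: any TextModel
    func ask(_ prompt: String) -> String { return model.respond(to: prompt) }
}

// Fall back to the local model when offline; callers do not change.
func makeSession(online: Bool) -> Session {
    if online { return Session(model: RemoteModel()) }
    return Session(model: LocalModel())
}

print(makeSession(online: false).ask("summarize this note"))
```

The call site (`ask`) is identical either way; only the value passed as `model` decides whether the cost shows up as battery or as API spend.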

adithyassekhar12d ago· 1 in thread

> Requests go directly from your app to the Claude API; Apple is not in the request path and does not see prompts or responses.

I know this is from a developer perspective. But as a consumer this is just funny.

saretup12d ago

Why?

_pdp_12d ago· 1 in thread

From an app developer's standpoint, why would anyone ship claude keys like that ... or am I missing something? From a consumer's standpoint, I guess they can use their own keys, but it is not something that is very user friendly as you can imagine.

nl12d ago

it says:

Proxy (production)

For production, route requests through your own back end with .proxied. The relay at baseURL adds the Claude API credential server-side, so the app ships no key. The headers you provide are sent on every request so your proxy can authorize the caller.

https://platform.claude.com/docs/en/cli-sdks-libraries/libra...
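
For anyone wondering what the relay side of that amounts to, here is a rough Swift sketch of just the header handling. The function, the `x-app-token` header, and the token set are hypothetical; only `x-api-key` and `anthropic-version` are the Claude API's usual request headers. A real relay would sit behind an HTTP server and forward the body too.

```swift
// Hypothetical relay logic: authorize the app, then add the credential.
enum RelayError: Error { case unauthorized }

func upstreamHeaders(
    from appHeaders: [String: String],
    validAppTokens: Set<String>,
    apiKey: String
) throws -> [String: String] {
    // "x-app-token" is an assumed name for whatever header your app
    // sends on every request so the proxy can authorize the caller.
    guard let token = appHeaders["x-app-token"],
          validAppTokens.contains(token) else {
        throw RelayError.unauthorized
    }
    // The credential is attached server-side, so the app ships no key.
    return [
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    ]
}
```

The point of the design is that the only secret on the device is something you can revoke per user or per install, while the billable key never leaves your back end.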

pgt12d ago· 1 in thread

I’m surprised to see the model names hardcoded as an enum (e.g. `.sonnet4_6`), instead of a string with model discovery so that the user can select their preferred model without having to get a new app version through the App Store to support newer models.

klausa12d ago

>Model identifiers are values of ClaudeModel. Use a compiled-in constant, or construct one with explicit capabilities for an ID that isn't compiled in yet (see Capabilities):

Special emphasis on the "isn't compiled in yet" and "or construct one" bit.

21-DOT-DEV12d ago· 1 in thread

> Usage is billed to your Anthropic account at standard API pricing.

While expected, it’s still a bummer.

isoprophlex12d ago

The pricing squeezes will continue until token spend improves!

cush12d ago· 1 in thread

Since Claude is technically a subscription, Apple will slowly weasel their way into skimming 30% of the token spend

hmokiguess12d ago

How does it work now though? There is a Claude app on iOS

bentt12d ago· 1 in thread

I didn’t understand what they were doing with Apple Foundation Models until this. It made it sound like they were training their own. Good strat tho!

klausa12d ago

> It made it sound like they were training their own.

They are.

simianwords12d ago· 1 in thread

Serious question: this looks like a thin library on an API. Why is it a big deal?

hedora12d ago

Shared daemon (as others pointed out), and, later shared revenue, probably with Apple receiving payments to ship ad-laden, “editorialized” models. Hopefully, it’ll go the other way, and Apple will subsidize high quality model training.

stackedinserter12d ago· 1 in thread

I'm not sure if I want to touch anything Anthropic anymore.

hedora12d ago

OpenAI is worse from a public policy standpoint, and apparently Fable was yanked at Amazon’s request.

Enough is enough. I’m seriously evaluating open models this week.

otter012d ago

First Microsoft has broken kayfabe by putting "Copilot is for entertainment purposes only" in the Copilot terms of use and putting warnings in copilot for excel "avoid using COPILOT for ... any task requiring accuracy or reproducibility ... Tasks with legal, regulatory or compliance implications".

Then Apple quietly refuses to participate by not investing tens or hundreds of billions in creating a competing LLM. Sure, they resell Claude for the marks or utilize Gemini to placate the gullible fools but they know what's up.

https://www.microsoft.com/en-us/microsoft-copilot/for-indivi...

https://support.microsoft.com/en-US/Excel/copilot-function

_josh_meyer_12d ago

the github repo: https://github.com/anthropics/ClaudeForFoundationModels

Traster12d ago

This seems smart. Apple, despite not really leading in AI themselves, are right on the hot path of where developers are going to yolo slop into the ecosystem. Makes a tonne of sense to define a nice clean API that places like Anthropic can build on top of and expose to developers.

It's also smart for them to make sure the billing is going direct from Anthropic to the developer. The initial thought is "That means Apple's not taking a cut", but from the other side of it, developers who use this API are going to have to expose that cost to customers somehow, and that translates to subscription/InAppPurchase etc. on top of which Apple will get its 30%.

jedisct112d ago

Misleading title. This is about Claude for Apple Foundation Models, not about Apple Foundation Models

mark_l_watson12d ago

I think Apple has a fairly good plan for supplying a common API and default on device models.

What confuses me about this article is: the code examples (Python, Ruby, etc.) look to me like the original Anthropic APIs, not Apple’s abstraction. Did I miss something?

gregman112d ago

So actually the most successful AI was OpenRouter Intelligence? Pronounced as OÏ.

theopsimist12d ago

Is this included in the free AI tier for small developers? Big news if so

ChrisArchitect12d ago

Associated blog post: https://claude.com/blog/claude-for-foundation-models

HelloUsername12d ago

Does "Apple Intelligence" need to be Turned On for this as well?

ryanshrott12d ago

Shared daemon is the only way this makes sense on-device. A 3B model at 4-bit is roughly 2GB - three apps loading their own copies would eat an 8GB phone.
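
The arithmetic behind that estimate, as a small Swift sketch. It assumes weight size is simply parameters × bits per weight / 8 and ignores KV cache, activations, and quantization metadata, all of which push the real footprint above the raw number; the function name is made up.

```swift
// Back-of-envelope size of quantized weights, in decimal gigabytes.
// Ignores KV cache, activations, and quantization metadata.
func weightGigabytes(parameters: Double, bitsPerWeight: Double) -> Double {
    let bytes = parameters * bitsPerWeight / 8.0
    return bytes / 1_000_000_000.0
}

// One copy of a 3B-parameter model at 4 bits per weight.
let perCopy = weightGigabytes(parameters: 3_000_000_000, bitsPerWeight: 4)

// Three apps each loading a private copy.
let threeCopies = perCopy * 3

print("one copy: \(perCopy) GB, three copies: \(threeCopies) GB")
```

Raw weights come to 1.5 GB per copy, so "roughly 2GB" once runtime overhead is included is plausible, and three private copies take at least 4.5 GB before the OS and the apps themselves get any memory.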

r0fl11d ago

So many people gave Apple a hard time for not focusing enough on AI.

Seems that the UX will be enough to win over users and investors

londons_explore12d ago

> A key bundled into an app is extractable from the shipping binary, and anyone who extracts it can make requests billed to your account. Use .apiKey for development only, and switch to a proxy before release.

I don't like this model. Then all the user data is visible to the proxy.

Far better would be some kind of micro payment architecture where a wallet is on the users device and coins are attached to each request.

We just need to live in the alternate universe where micro payments succeeded.

neuropacabra12d ago

Can someone explain to me what this means in the context of Apple and ChatGPT/Claude/Mistral...?

570165240012d ago

so it is not "Private Cloud Compute"?

tonyoconnell12d ago

What it is

Apple's Foundation Models framework (shipping in iOS 27 / macOS 27 this fall) is the standard Swift API for on-device AI — the same API Apple uses for their own small model. This package makes Claude plug into that same API as a drop-in swap.

  // Apple's on-device model
  let localSession = LanguageModelSession(model: SystemLanguageModel.default)

  // Claude: same API, just a different model constructor
  // (`auth` is assumed to be configured elsewhere)
  let claudeSession = LanguageModelSession(model: ClaudeLanguageModel(name: .sonnet4_6, auth: auth))

One API, two tiers. You write your app once against the Foundation Models protocol. On-device model handles fast/free/private tasks; Claude handles heavy reasoning, long context, or capability gaps — you swap the model, not your code.

You don't call the Anthropic API directly. Apple's framework handles streaming, tool calling, and structured output (@Generable) — you just get Claude's capability through it.

hit8run12d ago

Why would I want a nerfed model?

insumanth12d ago

This was expected. Apple will carefully choose what & how people can use AI in their ecosystem and will make sure of it. I hope "Apple Foundation Models" Eco-system grows with support from major model providers.

j / k navigate · click thread line to collapse

225 comments

118 comments · 35 top-level

harrouet12d ago· 21 in thread

This is Apple commoditizing LLMs while keeping control of the UX.

They are a hardware company and will keep selling the best machine for AI use. Well done.

tedggh12d ago

CuriouslyC12d ago

7 more replies

alecco12d ago

In spite of their deeper pockets, massive datacenters, colosal amounts of user data, and hundreds of thousands of top developers, even Amazon, Meta, Microsoft, and Google are well behind.

10 more replies

zitterbewegung12d ago

1 more reply

axus12d ago

Last I checked the telcos made plenty of money in the 90s. Should Verizon be getting a cut of my Claude Pro subscription, since I use FIOS to access it?

2 more replies

enos_feedler12d ago

He denies comparing them to telecom companies and even says so at various points in his writing. Instead he compares their usage to the usage of mobile data.

paulsutter12d ago

Try Mythos

post-it12d ago

> while keeping control of the UX.

Any company that doesn't adapt gets so hammered by people's AI-DIY web scrapers that they have no choice but to cave.

swingboy12d ago

Does “the best machine for AI use” apply here considering these models are still server-side?

embedding-shape12d ago

ABS12d ago

But we can imagine that the balance of what's on-device vs what's remote will move continuously towards the former as time, improved HW and improved local models keep progressing

brookst12d ago

I would think so, as “use” doesn’t specify implementation. If you use a word processor it may be running locally or remotely.

From a user’s perspective, it doesn’t matter.

1 more reply

WorldMaker12d ago

1 more reply

halJordan12d ago

dlev_pika12d ago

Apple’s play was a masterclass - unsure how deliberate it was, or how much of a choice they actually had, but it’s turning out pretty well IMO.

Now if they can further reinforce their angle on Privacy, they might continue to be what they are (or more)

amelius12d ago

Now we only need to commoditize the hardware.

hedora12d ago

Check out AMD’s offerings.

The comparison is a little messy: AMD currently maxes out at 128GB of RAM vs Apple’s discontinued 512. Apple has nothing to rival the Steam Deck.

jimbokun12d ago

This is what originally made Microsoft the most lucrative tech company of its day.

Android succeeded at this to an extent with phones, but Apple has been able to keep its products differentiated enough in the minds of consumers to maintain their premium pricing. So far.

1 more reply

klausa12d ago

How is this Apple keeping control of the UX?

matwood12d ago

The betas of the next OS's include a Siri AI chatbot, and the AI features are built into various parts of the OS. A user has no idea what model is powering any of it - Apple controls the UX.

2 more replies

wuliwong12d ago

rock_artist12d ago· 13 in thread

While I'm happy with Apple introducing this abstraction, my main concern is with local models.

Take Gemma4 as an example. From a user's point of view, if 10 apps each use the same model and each downloads its own copy, the phone will be bloated.

I still don't understand whether Apple provides a way for multiple apps to use the same on-device model (without tricky namespaces and permissions).

I didn't see anything suggesting that's the case.

scosman12d ago

They were wrong when their on-device model was way behind. They still might be right in the long term.

- Being warm in memory is the single biggest perf speedup you can get, and a default is much more likely to be warm.

- "Best model" is usually "best model for this device" given both RAM and compute. A developer can't test every device but Apple can/will.

- Each model needs to be optimized for the hardware (what's running on ANE, what's running on Metal, what's running on CPU). The default gets optimized.

- If you need a custom model, a LoRA is probably best (30MB, benefits from all of the above)

scotty7912d ago

But models aren't universally best, especially small ones. For text Gemma is great. For vision qwen3.6 is amazing.

1 more reply

jtfrench12d ago

That's a great opportunity for Apple to provide a universal unique model ID protocol and some shared storage space to allow devs to register models.

alwillis12d ago

Check out “Bring an LLM provider to the Foundation Models framework” - https://developer.apple.com/videos/play/wwdc2026/339

rock_artist12d ago

- Application can ask for specific model, if available use it. if not, ask to download it (or try some fallback / alternative)

- User can manage models. So as a user I can clean unused models (and for non-techie have something similar to offloading apps when unused for some period of time).
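The two bullets above can be sketched as a small shared registry. This is purely hypothetical: the thread does not confirm that any such system API exists, and `ModelID`, `ModelLookup`, and `SharedModelStore` are invented names. It shows the request, fallback, and offload-when-idle behavior being proposed.

```swift
// Hypothetical shared model registry; no such system API is confirmed in the thread.
struct ModelID: Hashable {
    let rawName: String
}

enum ModelLookup: Equatable {
    case available(ModelID)      // the requested model is already installed
    case fallback(ModelID)       // requested model missing, an alternative is installed
    case needsDownload(ModelID)  // nothing usable installed; caller may ask to download
}

struct SharedModelStore {
    // Maps each installed model to the day number it was last used.
    var installed: [ModelID: Int] = [:]

    mutating func install(_ id: ModelID, today: Int) {
        installed[id] = today
    }

    // An app asks for a specific model and lists acceptable alternatives.
    mutating func request(_ id: ModelID, fallbacks: [ModelID], today: Int) -> ModelLookup {
        if installed[id] != nil {
            installed[id] = today
            return .available(id)
        }
        if let alternative = fallbacks.first(where: { installed[$0] != nil }) {
            installed[alternative] = today
            return .fallback(alternative)
        }
        return .needsDownload(id)
    }

    // Removes models idle for more than `maxIdleDays`, similar to offloading unused apps.
    mutating func offloadUnused(today: Int, maxIdleDays: Int) -> [ModelID] {
        let stale = installed.filter { today - $0.value > maxIdleDays }.map { $0.key }
        for id in stale { installed[id] = nil }
        return stale
    }
}

var store = SharedModelStore()
store.install(ModelID(rawName: "small-text"), today: 0)
let lookup = store.request(
    ModelID(rawName: "big-vision"),
    fallbacks: [ModelID(rawName: "small-text")],
    today: 1
)
```

Because every app would go through one store, a model is kept on disk once no matter how many apps request it.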

klausa12d ago

The apps can use the system-provided on-device model using the same framework and APIs, but there are no affordances to deduplicate custom models between apps.

satvikpendem12d ago

That is exactly what foundation models are, yes. Same in Android with AICore which uses Gemma underneath, apps can query the LLM and receive responses back rather than bundling in their own model.

trvz12d ago

Do you guys not have phones (with at least 1TB of storage)?

rock_artist12d ago

Who’s “you guys”? A developer from the Bay Area? A student with a MacBook Neo? Or John Appleseed, who bought a basic iPhone 17e?

2 more replies

mft_12d ago

4 more replies

fragmede12d ago

No? iPhones don't come standard with that much storage.

1 more reply

taneq12d ago

Sounds ripe for block-level deduplication. :D Or an API that lets you request a model and handles caching.

mohamedkoubaa12d ago

Ok but don't expect Anthropic to help with local models, that'll be something apple rolls out themselves if at all

daniel_iversen12d ago· 13 in thread

tarcon12d ago

HDThoreaun12d ago

My read of the ATT stuff is basically that it forced all the apps to use meta ad tracking because they’re the only ones who figured out how to serve relevant ads despite it.

1 more reply

willis93612d ago

It would be cool if they offered some kind of prompt sanitation option.

klausa12d ago

1 more reply

pprotas12d ago

_the_inflator12d ago

Assembled in Cupertino once more. ;)

1 more reply

Gareth32112d ago

This is clearly because they plan to monetise AI in the future, and they don't want competition.

1 more reply

NorwegianDude12d ago

aesthesia12d ago

From the linked docs page:

oefrha12d ago

Call it Intelligence Store and charge… wait for it… 30%.

1 more reply

thombles12d ago

There are already on-device models that you can use through this framework as a developer. Claude would just be an additional one.

FinnKuhn12d ago

mathisfun12312d ago

> which I think we’ve heard they’ve been spending lots of money on training and might be somehow involved with Siri or current Apple AI

Lol bro this is literally it this is the model they've been training (was Apple Foundation model not a big enough hint?)

post-it12d ago· 8 in thread

> a Swift package that makes Claude available as a server-side language model in Apple's Foundation Models framework

inickt12d ago

Check out this WWDC session. Obviously not going to compete with the frontier models (and I think 8GB is too small anyways), but Apple did demo MLX + OpenCode.

https://developer.apple.com/videos/play/wwdc2026/232/ https://www.youtube.com/watch?v=wykPErJ8M-8

satvikpendem12d ago

You can use OpenCode or Pi with SSD streaming so it technically will have all the features, just unbearably slow.

FuriouslyAdrift12d ago

I've found most of the frontier coding models require somewhere between 300GB to 1TB to run with full capabilities.

godzillabrennus12d ago

In 10 years, I hope my MacBook Pro can run today's frontier models and has 1TB of unified Memory.

6 more replies

pstuart12d ago

The work on LLM in a Flash will probably help, and Apple's NVMe architecture is well suited to maximizing throughput, which could allow their devices to handle larger models better than other vendors' hardware.

1 more reply

jubilanti12d ago

> all of the existing features of Claude Code but somehow running locally on my laptop's neural engine

You can use environment variables to have claude code query literally any endpoint you choose as long as it has a compatible API.

570165240012d ago

I would not mind if the cloud were actually the user's private iCloud. Users pay for it, and it runs on Apple servers next to where users already store their iPhotos. That would be a really elegant solution.

..but instead we get Claude, hosted who-knows-where. maybe in X-AI datacenters? maybe in Amazon somewhere? who knows..

willy_k12d ago

https://security.apple.com/blog/private-cloud-compute/

VadimPR12d ago· 6 in thread

How can you practically use this in software if you're to deploy this to users? Asking a user to create and enter their own API key is a bar too high for good UX.

hajile12d ago

The even bigger hurdle is selling token based pricing to normal (non-dev) users.

I think Apple is correct that Local LLM for most things is the future.

nate12d ago

daralthus12d ago

ppl pay for this?

Maxious12d ago

> For production, route requests through your own back end with .proxied

Apple is offering developers with less than 2 million downloads free AI models via their servers https://techcrunch.com/2026/06/08/apple-bets-cheaper-ai-will...

klausa12d ago

The same way you did it before — by proxying the requests to your backend.

cush12d ago

Users don’t give an API key. The docs show how to set up your backend proxy.
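A rough sketch of what "proxy through your backend" looks like from the client side, using only Foundation. The URL, the `v1/chat` path, the `ProxyChatRequest` body, and the bearer session token are all assumptions made up for illustration; this is a generic pattern, not the package's own proxy configuration. The point is that the app never holds the Anthropic API key.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

// Hypothetical request body; the real contract is whatever your backend defines.
struct ProxyChatRequest: Codable {
    let prompt: String
}

// Builds a request to the app's own backend. No Anthropic API key is included:
// the backend holds the key and forwards the call on the user's behalf.
func makeProxyRequest(prompt: String, sessionToken: String, baseURL: URL) throws -> URLRequest {
    var request = URLRequest(url: baseURL.appendingPathComponent("v1/chat"))
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    // Per-user session token issued by your backend (assumed auth scheme).
    request.setValue("Bearer \(sessionToken)", forHTTPHeaderField: "Authorization")
    request.httpBody = try JSONEncoder().encode(ProxyChatRequest(prompt: prompt))
    return request
}

let request = try makeProxyRequest(
    prompt: "Hello",
    sessionToken: "user-session-token",
    baseURL: URL(string: "https://api.example.com")!
)
```

The backend can then apply per-user rate limits and billing before attaching the real key, which is also where the privacy concern raised elsewhere in the thread comes from: the proxy sees every prompt.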

mcintyre199412d ago· 5 in thread

embedding-shape12d ago

> That'll give apps better UX and save developers money on a bill that Apple doesn't get a cut of.

criddell12d ago

How does using Gemini lead to better on-device models?

halJordan12d ago

Apple is distilling models from gemini

Danox12d ago

Gemini is just a stopgap like using Intel processors or Qualcomm modems.

Danox12d ago

me551ah12d ago· 3 in thread

So where does the api key reside? You can’t ship it on the iOS client since anyone can read and abuse it

laxmansharma12d ago

https://platform.claude.com/docs/en/cli-sdks-libraries/libra...

yilugurlu12d ago

it says put into your API layer and proxy it.

hedora12d ago

Isn’t that a privacy and compliance nightmare?

GeekyBear12d ago· 2 in thread

This isn't Claude specific. Developers can also write apps that call Google's server based Gemini models.

https://blog.google/innovation-and-ai/technology/developers-...

jdgoesmarching12d ago

klausa11d ago

That's not what that means.

Protocol in this context means a Swift language feature, like interface in some other languages: https://docs.swift.org/swift-book/documentation/the-swift-pr...

zkmon12d ago· 2 in thread

Layers are luxury and remove control and transparency.

klausa12d ago

You wouldn't use this when building a coding agent.

hedora12d ago

How else will I run my coding agent on your Mac without having you download a second LLM and double your memory usage?

mlpicker12d ago· 2 in thread

brookst12d ago

Read the linked article? It is absolutely a cloud service. Neither Apple nor Anthropic is suggesting otherwise

ABS12d ago

it's cloud, the doc is explicit that requests go straight to api.anthropic.com with Apple not in the way.

adithyassekhar12d ago· 1 in thread

> Requests go directly from your app to the Claude API; Apple is not in the request path and does not see prompts or responses.

I know this is from a developer perspective. But as a consumer this is just funny.

saretup12d ago

Why?

_pdp_12d ago· 1 in thread

nl12d ago

it says:

Proxy (production)

https://platform.claude.com/docs/en/cli-sdks-libraries/libra...

pgt12d ago· 1 in thread

klausa12d ago

>Model identifiers are values of ClaudeModel. Use a compiled-in constant, or construct one with explicit capabilities for an ID that isn't compiled in yet (see Capabilities):

Special emphasis on the "isn't compiled in yet" and "or construct one" bit.

21-DOT-DEV12d ago· 1 in thread

> Usage is billed to your Anthropic account at standard API pricing.

While expected, it’s still a bummer.

isoprophlex12d ago

The pricing squeezes will continue until token spend improves!

cush12d ago· 1 in thread

Since Claude is technically a subscription, Apple will slowly weasel their way into skimming 30% of the token spend

hmokiguess12d ago

How does it work now though? There is a Claude app on iOS

bentt12d ago· 1 in thread

I didn’t understand what they were doing with Apple Foundation Models until this. It made it sound like they were training their own. Good strat tho!

klausa12d ago

> It made it sound like they were training their own.

They are.

simianwords12d ago· 1 in thread

Serious question: this looks like a thin library on an API. Why is it a big deal?

hedora12d ago

stackedinserter12d ago· 1 in thread

I'm not sure if I want to touch anything Anthropic anymore.

hedora12d ago

OpenAI is worse from a public policy standpoint, and apparently Fable was yanked at Amazon’s request.

Enough is enough. I’m seriously evaluating open models this week.

otter012d ago

https://www.microsoft.com/en-us/microsoft-copilot/for-indivi...

https://support.microsoft.com/en-US/Excel/copilot-function

_josh_meyer_12d ago

the github repo: https://github.com/anthropics/ClaudeForFoundationModels

Traster12d ago

jedisct112d ago

Misleading title. This is about Claude for Apple Foundation Models, not about Apple Foundation Models

mark_l_watson12d ago

I think Apple has a fairly good plan for supplying a common API and default on device models.

What confuses me about this article is: the code examples (Python, Ruby, etc.) look to me like the original Anthropic APIs, not Apple’s abstraction. Did I miss something?

gregman112d ago

So actually the most successful AI was OpenRouter Intelligence? Pronounced as OÏ.

theopsimist12d ago

Is this included in the free AI tier for small developers? Big news if so

ChrisArchitect12d ago

Associated blog post: https://claude.com/blog/claude-for-foundation-models

HelloUsername12d ago

Does "Apple Intelligence" need to be Turned On for this as well?

1 more reply

ryanshrott12d ago

Shared daemon is the only way this makes sense on-device. A 3B model at 4-bit is roughly 2GB - three apps loading their own copies would eat an 8GB phone.
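The arithmetic behind that estimate, as a quick sketch. The weights alone come to 1.5 GB; reading the comment's "roughly 2GB" as weights plus runtime overhead is an assumption on my part, not something the comment states.

```swift
// Weight memory for a quantized model: parameters * bits per weight / 8 bits per byte.
func weightBytes(parameters: Double, bitsPerWeight: Double) -> Double {
    parameters * bitsPerWeight / 8.0
}

let gigabyte = 1_000_000_000.0

// 3B parameters at 4 bits each: weights only.
let weightsGB = weightBytes(parameters: 3e9, bitsPerWeight: 4) / gigabyte

// The comment's rounded per-copy figure (assumed to include runtime overhead).
let perCopyGB = 2.0

// Three apps each loading a private copy, on an 8 GB phone.
let threeCopiesGB = 3 * perCopyGB
let phoneRAMGB = 8.0
let shareOfRAM = threeCopiesGB / phoneRAMGB

print("weights: \(weightsGB) GB, three copies: \(threeCopiesGB) GB, share of RAM: \(shareOfRAM)")
```

With a single shared copy the cost stays at one model's footprint regardless of how many apps use it.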

r0fl11d ago

So many people gave Apple a hard time for not focusing enough on AI.

Seems that the UX will be enough to win over users and investors

londons_explore12d ago

I don't like this model. Then all the user data is visible to the proxy.

Far better would be some kind of micro payment architecture where a wallet is on the users device and coins are attached to each request.

We just need to live in the alternate universe where micro payments succeeded.
