IMO they should do a better job of referencing existing papers and techniques. The way they wrote about "adapters" can make it seem like something novel, but it's actually just reiterating vanilla LoRA. It was enough to convince one of the top-voted HackerNews comments that this was a "huge development".
Benchmarks are nice though.
Was anyone expecting anything new?
Apple has never been big on living at the cutting edge of technology exploring spaces that no one has explored before—from laptops to the iPhone to iPads to watches, every success they've had has come from taking tech that was already prototyped by many other companies and smoothing out the usability kinks to get it ready for the mainstream. Why would deep learning be different?
I think he is pointing that out for people interested in the research.
OTOH, it is interesting to see how a company applies AI for end customers. It will bring up new challenges that will be interesting from at least an engineering point of view.
There was such a time. Same as with Google. Interestingly, around 2015-2016 both companies significantly shifted to iterative products from big innovations. It's more visible with Google than Apple, but here's both.
Apple:
- Final Cut Pro
- 1998: iMac
- 1999: iBook G3 (father of all MacBooks)
- 2000: Power Mac G4 Cube (the early grandparent of the Mac Mini form factor), Mac OS X
- 2001: iPod, iTunes
- 2002: Xserve (rackable servers)
- 2003: Iterative products only
- 2004: iWork Suite, Garage Band
- 2005: iPod Nano, Mac mini
- 2006: Intel Macs, Boot Camp
- 2007: iPhone and Apple TV
- 2008: MacBook Air, iPhone 3G
- 2009: iPhone 3GS, all-in-one iMac
- 2010: iPad, iPhone 4
- 2011: Final Cut Pro X
- 2012: Retina displays, iBooks Author
- 2013: iWork for iCloud
- 2014: Swift
- 2015: Apple Watch, Apple Music
- 2016: Iterative products only
- 2017: Iterative products mainly, plus ARKit
- 2018: Iterative products only
- 2019: Apple TV+, Apple Arcade
- 2020: M1
- 2021: Iterative products only
- 2022: Iterative products only
- 2023: Apple Vision Pro
Google:
- 1998: Google Search
- 2000: AdWords (this is where it all started going wrong, lol)
- 2001: Google Images Search
- 2002: Google News
- 2003: Google AdSense
- 2004: Gmail, Google Books, Google Scholar
- 2005: Google Maps, Google Earth, Google Talk, Google Reader
- 2006: Google Calendar, Google Docs, Google Sheets, YouTube bought this year
- 2007: Street View, Google Apps (later rebranded G Suite)
- 2008: Google Chrome, Android 1.0
- 2009: Google Voice, Google Wave (early Docs if I recall correctly)
- 2010: Google Nexus One, Google TV
- 2012: Google Drive
- 2013: Chromecast
- 2014: Android Wear, Android Auto, Google Cardboard, Nexus 6, Google Fit
- 2015: Google Photos
- 2016: Google Assistant, Google Home
- 2017: Mainly iterative products, plus Google Lens (announced, but it never really rolled out)
- 2018: Iterative products only
- 2019: Iterative products only
- 2020: Iterative products only, and some rebrands (Talk->Chat, etc)
- 2021: Iterative products only, and Tensor Chip
- 2022: Iterative products only
- 2023: Iterative products only, and Bard (half-baked).
Perhaps there is still hope of a relaunch of the Xserve; with the widespread use of Apple computers among developers, Apple has a real chance of challenging NVIDIA's CUDA moat.
Rather than just pre-baking static LoRAs to ship with the base model (e.g. one global "rewrite this in a friendly style" LoRA, etc), Apple seem to have chosen a bounded set of behaviors they want to implement as LoRAs — one for each "mode" they want their base model to operate in — and then set up a pipeline where each LoRA gets fine-tuned per user, and re-fine-tuned any time the data dependencies that go into the training dataset for the given LoRA (e.g. mail, contacts, browsing history, photos, etc) would change.
In other words, Apple are using their LoRAs as the state-keepers for what will end up feeling to the user like semi-online Direct Preference Optimization. (Compare/contrast: what Character.AI does with their chatbot response ratings.)
---
I'm not as sure, from what they've said here, whether they're also implying that these models are being trained in the background on-device.
It could very well be possible: training something that's only LoRA-sized, on a vertically-integrated platform optimized for low-energy ML, that sits around awake but doing nothing for 8 hours a day, might be practical. (Normally it'd require a non-quantized copy of the model, though. Maybe they'll waste even more of your iPhone's disk space by having both quantized and non-quantized copies of the model, one for fast inference and the other for dog-slow training?)
But I'm guessing they've chosen not to do this — as, even if it were practical, it would mean that any cloud-offloaded queries wouldn't have access to these models.
Instead, I'm guessing the LoRA training is triggered by the iCloud servers noticing you've pushed new data to them, and throwing a lifecycle notification into a message queue of which the LoRA training system is a consumer. The training system reduces over changes to bake out a new version of any affected training datasets; bakes out new LoRAs; and then basically dumps the resulting tensor files out into your iCloud Drive, where they end up synced to all your devices.
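That speculated flow is easy to picture as an event consumer. A minimal sketch, with every name and dependency mapping invented for illustration (nothing here is Apple's actual pipeline):

```python
# Hypothetical sketch of the speculated event-driven retraining flow.
# Adapter names and data-source dependencies are invented for illustration.

# Which (hypothetical) adapters depend on which user data sources.
ADAPTER_DEPS = {
    "mail_summarize": {"mail", "contacts"},
    "photo_search": {"photos"},
    "writing_style": {"mail", "notes"},
}

def adapters_affected_by(changed_sources):
    """Return the adapters whose training data includes any changed source."""
    return {name for name, deps in ADAPTER_DEPS.items()
            if deps & set(changed_sources)}

def handle_sync_event(event):
    """Consume a (hypothetical) sync notification and report stale adapters."""
    stale = adapters_affected_by(event["changed"])
    # In the speculated pipeline: enqueue a fine-tuning job per stale adapter,
    # then push the resulting tensor files back out via iCloud Drive.
    return sorted(stale)
```

Under this sketch, a push of new mail would invalidate both the mail-summarization and writing-style adapters while leaving the photo one untouched.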
> ...each LoRA gets fine-tuned per user...
Apple would not implement such sophisticated user-specific LoRA training techniques without mentioning them anywhere. No big player has done anything like this, and Apple would want the credit for the innovation.
> The adapter models can be dynamically loaded, temporarily cached in memory, and swapped — giving our foundation model the ability to specialize itself on the fly for the task at hand
This, along with other statements in the article about keeping the base model weights unchanged, says to me that they are simply swapping out adapters on a per-app or per-task basis. I highly doubt they will fine-tune adapters on user data, since they have taken a position against this. I wonder how successful this approach will be vs. merging the adapters into the base model. I can see the benefits, but there are also downsides.
(So there is room left if you're limited by memory or budget.)
This is the important part.
My advisor said "new" means an old method applied to new data, or a new method applied to old data.
Commercially, that means price points, i.e., discrete points where something becomes viable.
Maybe that's iterative, but maybe not. Either way, once the opportunity presents, time is of the essence.
Or Intel calling USB4 devices and cables which meet quality and feature requirements 'Thunderbolt 5'
Compared to, say, manufacturers who aren't willing to meet any certification requirements or to properly implement the standards at play saying they have "USB-A 3.2 2x2 ports" on their motherboards.
Retina doesn't carry the same weight as an industry certification effort like thunderbolt, but it still informs people that a screen actually meets some sort of bar without them having to evaluate pages of tech specs, and reviews saying whether the tech specs are accurate or have undocumented caveats.
Finally, establishing such certifications are difficult - look at the number of failed attempts at creating industry quality/feature marks in the television market.
And if Karpathy thinks so then I assume it's good enough for HN:
Maybe it's my fault as a reader, but I think the writing could be clearer. Usually in a research paper you would link to the LoRA paper there too.
Quick straw poll survey around the office, many think their data will be sent off to OpenAI by default for these new features which is not the case.
Just want to point out that I called this launch huge; I didn't say “huge development” as quoted, and didn't imply that what was interesting was the ML research. No one in this thread used the quoted words, at least that I can see.
My comment was about dev experience, memory swapping, potential for tuning base models to each HW release, fine tune deployment, and app size. Those things do have the potential to be huge for developers, as mentioned. They are the things that will make a local+private ML developer ecosystem work.
I think the article and comment make sense in their context: a developer conference for Mac and iOS devs.
Apple also explicitly says it’s LoRA.
If they try to market it with a seemingly unique or yet-unheard of name, then yeah. It is nice knowing what the "real world" name of an Apple-ized technology is.
Just ignoring it and marketing the technology under some new name is adjacent to lying to your audience through omission.
That's a classic Apple strategy though.
Besides, I could do "named person on a beach in August" and get the correct results in Google Photos on Android, so I don't get it.
It's amazing for Apple users if they didn't have it before. But from a tech standpoint, people could have had it for a while.
* Clearly outlining their intent/policies for training/data use. Committing to not using user data or interactions for training their base models is IMO actually a pretty big deal and a differentiator from everyone else.
* There's a never-ending stream of new RL variants ofc, but that's how technology advances, and I'm pretty interested to see how these compare with the rest: "We have developed two novel algorithms in post-training: (1) a rejection sampling fine-tuning algorithm with teacher committee, and (2) a reinforcement learning from human feedback (RLHF) algorithm with mirror descent policy optimization and a leave-one-out advantage estimator. We find that these two algorithms lead to significant improvement in the model’s instruction-following quality."
* I'm interested to see how their custom quantization compares with the current SoTA (probably AQLM atm)
* It looks like they've done some interesting optimizations to lower TTFT, including the use of some sort of self-speculation. It looks like they also have a new KV-cache update mechanism, and I'm looking forward to reading about that as well. 0.6 ms/token means that for your average, I dunno, 20-token query you might only wait 12 ms for TTFT (I have my doubts; maybe they're getting their numbers from much larger prompts. Again, I'm interested to see for myself)
* Yes, it looks like they're using pretty standard LoRAs, the more interesting part is their (automated) training/re-training infrastructure but I doubt that's something that will be shared. The actual training pipeline (feedback collection, refinement, automated deployment) is where the real meat and potatoes of being able to deploy AI for prod/at scale lies. Still, what they shared about their tuning procedures is still pretty interesting, as well as seeing which models they're comparing against.
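The TTFT arithmetic in that list is easy to sanity-check (the 20-token prompt is the commenter's assumed figure, not a published number):

```python
# Back-of-envelope time-to-first-token from the quoted prompt-processing rate.
ms_per_prompt_token = 0.6   # the quoted ~0.6 ms/token rate
prompt_tokens = 20          # assumed length of a short query
ttft_ms = ms_per_prompt_token * prompt_tokens
# 20 tokens * 0.6 ms/token = 12 ms before the first output token
```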
As this article doesn't claim to be a technical report or a paper, while citations would be nice, I can also understand why they were elided. OpenAI has done the same (and sometimes gotten heat for it, like with Matryoshka embeddings). For all we know, maybe the original author had references; or maybe, since PEFT isn't new to those in the field, describing it is just a service to the reader. At the end of the day, it's up to the reader to make their own judgements on what's new or not, or a huge development or not. From my reading of the article, your conclusion, which funnily enough is now the new top-rated comment on this thread, isn't actually much more accurate than the old one you're criticizing.
They seem to have a good model for adding value to their products without the hold my beer, conquer the world bullshit that you get from OpenAI, et al.
They include data about the ratio of which outputs human graders preferred (for server side it’s better than 3.5, worse than 4).
BUT, the interesting chart to me is "Human Evaluation of Output Harmfulness", which is much, much "better" than the other models. Both on-device and server-side.
I wonder if that's part of wanting to have GPT as the "level 3": making their own models much more cautious, and using OpenAI's models in a way that makes it clear "it was ChatGPT that said this, not us".
Instruction following accuracy seems to be really good as well.
No sex because apparently it's harmful yet never explained why.
No homophobia/transphobia if you're Christian but if you're Muslim it's fine.
In the USA, you won't be able to ask about sex, but you can probably ask about tank man.
> And with Compose in Writing Tools, you can create and illustrate original content from scratch.
It will still be a lot better than 8GB though.
Can't forget about that cozy 256GB SSD either. An AI computer will need more than that, right?
Same way Apple and Samsung ship 128GB of storage when the production-cost difference between 128GB and 1TB is something like $10 (on a $1000 device). Samsung even got rid of the microSD slot. It's so blatant it's actually depressing.
If you could do that, you could easily get hundreds of GB/s read speed out of simple TLC flash.
Obviously this is the future, but I think it's a promising one.
Also, when I compare with my co-workers, memory pressure is a lot lower running the same software on macOS than on Windows. This might be due to the UI frameworks at play.
But that said, I totally agree that Apple is doing daylight robbery with their additional RAM pricing, and the minimum on offer is laughable.
It certainly does, close to irrational even. IIRC memory compression is enabled by default on Windows as well.
Edit: I see they're committing to publishing the OS images running on their inference servers (https://security.apple.com/blog/private-cloud-compute/). Would be cool if that allowed people to run their own.
Oh my god that would be absolutely amazing!
Most likely integrated with an Apple TV or a similar thing. Enough local LLM processing power to handle a family's data all in-house.
I think they saw the response to all the AI shoveling and Microsoft Recall and executed a fantastic strategy to reposition themselves in industry discussions. I still have tons of reservations about privacy and what this will all look like in a few years, but you really have to take your hat off to them. WWDC has been awesome and it makes me excited to develop for their platform in a way I haven't felt in a very, very, long time.
Just the usual marketing angle, IMO. It's not TV, it's HBO.
No one is reluctant to use the word smartphone to include iPhones. I don't think anyone is going to use the Apple Intelligence moniker except in the same cases where they'd say iCloud instead of cloud services.
It's also a little clunky. Maybe they could have gone with... xI? Too close to the Chinese Xi. iAI? Sounds like the Spanish "ay ay ay." Not an easy one I think. The number of person-hours spent on this must have been something.
AI will ultimately do all the 'development' and will replace all apps. The integrations are going to be a temporary measure. The only apps that will survive are the ones that control things that Apple cannot control (i.e. how Uber controls its fleet).
Apple (unwisely, I think) is allowing UIs to just generate responses.
The wow-neat! experience will wear off quickly. Then even at a miss rate of 0.1%, there will be thousands, even millions, of cringe-worthy examples that sully the Apple brand for quality.
It will be impossible to create a quality filter good enough, and there will be no way to back these features out of the OS.
For targeted use-cases (like coding and editing), this will be useful. But these features may be what finally makes contempt for Apple go mainstream, and that would be a shame.
Internally at Apple, they likely discussed how much to limit the rollout and control usage. I think they decided to bake it into APIs more to maintain developer mindshare than to keep users happy.
The one feature that could flip that script is interacting with Siri/AI in order to get things done. The frustration with knowing what you want but not how or whether it can be done drives a lot of tech angst. If this only meant ordinary people could use their existing phones to their full extent, it would be a huge win.
OK. No one remembers Apple Maps, the CSAM scanning, the crush ad, etc? Companies do embarrassing stuff all the time. At least they're trying.
I think it's been a while since consumers have trusted or relied on consumer tech. Browsing the web from a phone can only be described as adversarial. Scrolling down a top Google-result recipe site is almost impossible. Texts don't always send, and there are so many cloud backup offerings that it's hard to tell whether your photos are actually being saved.
The current political and media scene is often described as post-truth, where accuracy isn't the biggest driving factor. It seems that computation is headed that way as well.
Interesting that they’re using TPUs for training, in addition to GPUs. Is it both a technical decision (JAX and XLA) and a hedge against Nvidia?
Did they go over the entire text with a thesaurus? I've never seen "palettization" used as a viable synonym for "quantization" before, and I've read quite a few papers on LLM quantization.
Though I'm not sure how warranted the distinction really is; in both cases it's pretty much the same idea of reducing precision, just with different implementations.
Edit: they even refer to it as LUT quantization on another page: https://apple.github.io/coremltools/docs-guides/source/quant...
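For what it's worth, the palettization/LUT idea fits in a few lines. A toy sketch, assuming evenly spaced palette entries where a real implementation would fit the palette with k-means:

```python
# Sketch of palettization (LUT quantization): weights are clustered into a
# small palette (look-up table) and each weight stores only a palette index.
# Plain Python; a real palettizer would fit the centroids with k-means.

def palettize(weights, n_colors=4):
    """Map each weight to the nearest entry of a coarse palette."""
    lo, hi = min(weights), max(weights)
    # Naive palette: evenly spaced centroids over the weight range.
    palette = [lo + (hi - lo) * i / (n_colors - 1) for i in range(n_colors)]
    indices = [min(range(n_colors), key=lambda i: abs(w - palette[i]))
               for w in weights]
    return palette, indices  # store the palette once + one tiny index per weight

def depalettize(palette, indices):
    """Reconstruct approximate weights from palette + indices."""
    return [palette[i] for i in indices]
```

With 4 palette entries each weight needs only 2 bits of index, which is the sense in which it's "still pretty much the same idea of reducing the precision".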
This is huuuuge. I don't see an announcement of 3rd-party training support yet, but I imagine/hope it's planned.
One of the hard things about local+private ML is I don’t want every app I download to need GBs of weights, and don’t want a delay when I open a new app and all the memory swap happens. As an app developer I want the best model that runs on each HW model, not one lowest common denominator model for slowest HW I support. Apple has the chance to make this smooth: great models tuned to each chip, adapters for each use case, new use cases only have a few MB of weights (for a set of current base models), and base models can get better over time (new HW and improved models). Basically app thinning for models.
Even if the base models aren’t SOTA to start, the developer experience is great and they can iterate.
Server side is so much easier, but look forward to local+private taking over for a lot of use cases.
It is kind of ironic that languages that pride themselves so much on going back to early linking models have to resort to much heavier OS IPC for similar capabilities.
My comment above is about dev experience, memory swapping, tuning base models to each HW release, and app size.
But kinda as expected: only works on 2 Android phones (Pixel 8 Pro, S24).
Pretty typical: Apple isn’t first, but also typically will scale faster with HW+platform integration.
I wonder if they didn't stretch the truth using the phrase "without loss in accuracy".
>We represent the values of the adapter parameters using 16 bits, and for the ~3 billion parameter on-device model, the parameters for a rank 16 adapter typically require 10s of megabytes. The adapter models can be dynamically loaded, temporarily cached in memory, and swapped — giving our foundation model the ability to specialize itself on the fly for the task at hand while efficiently managing memory and guaranteeing the operating system's responsiveness.
This kind of sounds like LoRAs...
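The quoted "10s of megabytes" checks out on a back-of-envelope. A sketch with assumed shapes (the layer count, width, and number of adapted projections are guesses for a 3B-class model, not Apple's disclosed architecture):

```python
# Rough size check on a rank-16, 16-bit adapter for a ~3B-parameter model.
# All shape numbers below are assumptions for illustration.
rank, bytes_per_param = 16, 2          # rank-16 LoRA, 16-bit parameters
d_model, n_layers = 2048, 32           # assumed base-model shape
adapted_mats_per_layer = 4             # e.g. q/k/v/o projections, assumed

# Each adapted d x d matrix adds A (r x d) and B (d x r): 2*r*d parameters.
params = n_layers * adapted_mats_per_layer * 2 * rank * d_model
size_mb = params * bytes_per_param / 1e6
# ~8.4M adapter params -> ~17 MB, consistent with "10s of megabytes"
```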
So we cannot expect a similar answer from each LLM, as they're different models; you cannot get consistency across ecosystems.
How do they represent users around the globe authentically while being located in Cupertino, CA? (more of a rhetorical question really)
It does baffle me how California-centric they are with many of their announcements, and even some features.
They built out a system that's ready to scale to deliver features that may not work on available hardware, but they're also incentivized to minimize actual reliance on that cloud stuff as it incurs per-use costs that local runs don't.
From an ML noob's (mine) understanding of this, does this mean that the final matrix is regularly fine-tuned instead of fine-tuning the main model? Is this similar to how ChatGPT now remembers memory[1]?
The advantage of the adapter matrices is you can have different sets of adapter matrices for different tasks, all based off the base model.
Low Rank Adaptors (LoRA) are a way of changing the function of a model by only having to load a delta for a tiny percentage of the weights rather than all the weights for an entirely new model.
No fine-tuning is going to happen on Apple computers or phones at any point. They are just swapping out Apple's pre-made LoRAs so that they can store one LLM and dozens of LoRAs in a fraction of the space it would take to store dozens of LLMs.
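A minimal sketch of that swap, with toy dimensions (plain Python, nothing Apple-specific):

```python
# Minimal sketch of what a LoRA swap does: the frozen base weight W is stored
# once; each "mode" ships only a low-rank delta (A, B) applied at load time.
def matmul(X, Y):
    """Naive matrix multiply for tiny toy matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def apply_lora(W, A, B, scale=1.0):
    """Effective weight = W + scale * (B @ A); only A and B are per-task."""
    delta = matmul(B, A)  # (d x r) @ (r x d) -> full-size but low-rank delta
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]
```

Swapping tasks means swapping only the small (A, B) pair, which is why dozens of "modes" fit in a fraction of the space of dozens of full models.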
As for the stuff that's local to your device, how is your privacy being invaded? It's your device's OS looking at data on the device it's running on, as it's always done.
So far all attempts seem to be building a universal Clippy. In my experience, all kinds of forced autocomplete and other suggestions have been worse than useless.
Other than that, AI for me is meme/image generation and a semi-useful chatbot.
"If, on the Meta Llama 3 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights."
IANAL but my read of this is that Apple's not allowed to use Llama 3 at all, for any purposes, including comparisons.
BTW, not an Apple fan but an Apple user.
Most people expected this update 6 months ago.
Is that moving fast? Maybe, compared to what, Oracle?
I’ve been trying to make smaller, more efficient models in my own work. I hope Apple publishes some actual papers.
This seems impressive. Is it, really? I don’t know enough about the subject to judge.
Of course, Apple will never give adequate details about security mechanisms or privacy guarantees. They are in the business of selling you security as something that must be handled by them and them alone, and that knowing how they do it would somehow be less secure (This is the opposite of how it actually works, but also Apple loves doublespeak, and 1984 allusions have been their brand since at least 1984). I view that, like any claim by a tech company that they are keeping your data secure in any context, as security theater. Vague promises are no promises at all. Put up or shut up.
We may have some insight into the second point when the code is published.
If you ask it for knowledge, like a comparison of vacuum cleaner models, then yes, it's a hallucination fest. They just don't have the parameters for this level of detail. This is where ChatGPT is really king.
But if you give them the data they need with RAG, they're not bad. Acting on commands, looking stuff up in provided context, summarising all perform pretty well. Which seems to be also what Apple is targeting to do with them.
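The RAG pattern being described can be sketched minimally; word overlap stands in for a real embedding-based retriever, and all data below is invented:

```python
# Toy sketch of the RAG pattern: retrieve the most relevant snippet from the
# user's own data and put it in the prompt, rather than relying on a small
# model's parametric knowledge. Word-overlap scoring stands in for embeddings.

def retrieve(query, docs):
    """Return the document sharing the most words with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query, docs):
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = retrieve(query, docs)
    return f"Context: {context}\nQuestion: {query}\nAnswer using only the context."
```

The model then only has to read and act on the provided context, which is exactly the regime (summarising, looking things up, acting on commands) where small on-device models hold up.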
And, of course, nobody knew to opt out by blocking Applebot-Extended until after the announcement, by which point they've already pirated shittons of data.
In completely unrelated news, I just trained a new OS development AI on every OS Apple has ever written. Don't worry. There's an opt-out, Apple just needed to know to put these magic words in their installer image years ago. I'm sure Apple legal will be OK with this.
Public content on the internet is public content on the internet - I thought we had all agreed years ago that if you didn’t want your content copied, don’t make it freely available and unlicensed on the internet.
What I don't like is the hypocrisy that basically every AI company has engaged in, where copying my shit is OK but copying theirs is not. The Internet is not public domain, as much as Eric Bauman and every AI research team would say otherwise. Even if you don't like copyright[0], you should care about copyleft, because denying valuable creative work to the proprietary world is how you get them to concede. If you can shove that work into an AI and get the benefits of that knowledge without the licensing requirement, then copyleft is useless as a tactic to get the proprietary world to bend the knee.
[0] And I don't.
My opinion is that individual copyright ownership is a bad deal for most artists and we need collective negotiation instead. Even the most copyright-respecting, 'ethical' AI boils down to Adobe dropping a EULA roofie in the Adobe Stock Contributor Agreement that lets them pay you pennies.
Until LLMs came along, most large-scale internet scraping was for search engines. Websites benefited from this arrangement because search engines directed users to those websites.
LLMs abused this arrangement to scrape content into a local database, compress that into a language model, and then serve the content directly to the user without directing the user to the website.
It might've been legal, but that doesn't mean it was ethical.
…is there publicly visible source code for every OS Apple has ever written?
It’s not as bad as that, I think. https://support.apple.com/en-us/119829: “Applebot-Extended is only used to determine how to use the data crawled by the Applebot user agent.“
⇒ if you use robots.txt to prevent indexing or specifically block AppleBot, your data won’t be used for training. AppleBot is almost a decade old (https://searchengineland.com/apple-confirms-their-web-crawle...)
Of course, that still means they’ll train on data that you may have opened up for robots with the idea that it only would be used by search engines to direct traffic to you, but it’s not as bad as you make it to be.
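For anyone auditing their own rules, Python's stdlib parser can check a robots.txt offline; a sketch with invented sample rules (not any real site's policy):

```python
# Quick local check of how robots.txt rules like those discussed above are
# interpreted, using Python's stdlib parser. No network; rules fed in directly.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: Applebot
Disallow: /private/

User-agent: *
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)
# Applebot is blocked from /private/ but allowed on the rest of the site;
# other crawlers fall through to the wildcard entry.
```

Per Apple's documentation, blocking the `Applebot` agent this way is what keeps your data out of both the index and the training set.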
Data implies factual information. You cannot copyright factual information.
The fact that I use the word "appalling" to describe this practice results in some vector relationship between the words. That's the data, the fact, not the writing itself.
There are going to be a bunch of interesting court cases where courts will have to backtrack on copyrighting facts. Or we're going to get some really odd legal interpretations of how LLMs work (and buy into them). Or we're going to have to change the law (giving everyone else a first-mover advantage).
Based on how things have been working, I am betting that it's the last one, because it pulls up the ladder.
This is wrong. AppleBot identifier hasn't changed: https://support.apple.com/en-us/119829
There is no AppleBot-Extended. And if you blocked it in the past it remains blocked.
"If Apple integrates OpenAI at the OS level, then Apple devices will be banned at my companies. That is an unacceptable security violation."
Replying to Tim Cook: "Don’t want it. Either stop this creepy spyware or all Apple devices will be banned from the premises of my companies."
"It’s patently absurd that Apple isn’t smart enough to make their own AI, yet is somehow capable of ensuring that OpenAI will protect your security & privacy!
Apple has no clue what’s actually going on once they hand your data over to OpenAI. They’re selling you down the river."
https://x.com/elonmusk/status/1800269249912381773 https://x.com/elonmusk/status/1800266437677768765 https://x.com/elonmusk/status/1800265431078551973