IMO they should do a better job of referencing existing papers and techniques. The way they wrote about "adapters" can make it seem like something novel, but it's actually just reiterating vanilla LoRA. It was enough to convince one of the top-voted HackerNews comments that this was a "huge development".
Benchmarks are nice though.
Was anyone expecting anything new?
Apple has never been big on living at the cutting edge of technology exploring spaces that no one has explored before—from laptops to the iPhone to iPads to watches, every success they've had has come from taking tech that was already prototyped by many other companies and smoothing out the usability kinks to get it ready for the mainstream. Why would deep learning be different?
I think he is pointing that out for people interested in the research.
OTOH, it is interesting to see how a company applies AI for end customers. It will bring up new challenges that will be interesting from at least an engineering point of view.
There was such a time. Same as with Google. Interestingly, around 2015-2016 both companies significantly shifted to iterative products from big innovations. It's more visible with Google than Apple, but here's both.
Apple:
- Final Cut Pro
- 1998: iMac
- 1999: iBook G3 (father of all MacBooks)
- 2000: Power Mac G4 Cube (the early grandparent of the Mac Mini form factor), Mac OS X
- 2001: iPod, iTunes
- 2002: Xserve (rackable servers)
- 2003: Iterative products only
- 2004: iWork Suite, Garage Band
- 2005: iPod Nano, Mac mini
- 2006: Intel Macs, Boot Camp
- 2007: iPhone and Apple TV
- 2008: MacBook Air, iPhone 3G
- 2009: iPhone 3GS, all-in-one iMac
- 2010: iPad, iPhone 4
- 2011: Final Cut Pro X
- 2012: Retina displays, iBooks Author
- 2013: iWork for iCloud
- 2014: Swift
- 2015: Apple Watch, Apple Music
- 2016: Iterative products only
- 2017: Iterative products mainly, plus ARKit
- 2018: Iterative products only
- 2019: Apple TV+, Apple Arcade
- 2020: M1
- 2021: Iterative products only
- 2022: Iterative products only
- 2023: Apple Vision Pro
Google:
- 1998: Google Search
- 2000: AdWords (this is where it all started going wrong, lol)
- 2001: Google Images Search
- 2002: Google News
- 2003: Google AdSense
- 2004: Gmail, Google Books, Google Scholar
- 2005: Google Maps, Google Earth, Google Talk, Google Reader
- 2006: Google Calendar, Google Docs, Google Sheets, YouTube bought this year
- 2007: Street View, Google Apps (later rebranded G Suite)
- 2008: Google Chrome, Android 1.0
- 2009: Google Voice, Google Wave (early Docs if I recall correctly)
- 2010: Google Nexus One, Google TV
- 2012: Google Drive
- 2013: Chromecast
- 2014: Android Wear, Android Auto, Google Cardboard, Nexus 6, Google Fit
- 2015: Google Photos
- 2016: Google Assistant, Google Home
- 2017: Mainly iterative products, plus Google Lens (announced, but it never really rolled out)
- 2018: Iterative products only
- 2019: Iterative products only
- 2020: Iterative products only, and some rebrands (Talk->Chat, etc)
- 2021: Iterative products only, and Tensor Chip
- 2022: Iterative products only
- 2023: Iterative products only, and Bard (half-baked).
Perhaps there is still hope of a relaunch of the Xserve; with the widespread use of Apple computers among developers, Apple has a real chance of challenging NVIDIA's CUDA moat.
Rather than just pre-baking static LoRAs to ship with the base model (e.g. one global "rewrite this in a friendly style" LoRA, etc), Apple seem to have chosen a bounded set of behaviors they want to implement as LoRAs — one for each "mode" they want their base model to operate in — and then set up a pipeline where each LoRA gets fine-tuned per user, and re-fine-tuned any time the data dependencies that go into the training dataset for the given LoRA (e.g. mail, contacts, browsing history, photos, etc) would change.
In other words, Apple are using their LoRAs as the state-keepers for what will end up feeling to the user like semi-online Direct Preference Optimization. (Compare/contrast: what Character.AI does with their chatbot response ratings.)
---
I'm not as sure, from what they've said here, whether they're also implying that these models are being trained in the background on-device.
It could very well be possible: training something that's only LoRA-sized, on a vertically-integrated platform optimized for low-energy ML, that sits around awake but doing nothing for 8 hours a day, might be practical. (Normally it'd require a non-quantized copy of the model, though. Maybe they'll waste even more of your iPhone's disk space by having both quantized and non-quantized copies of the model, one for fast inference and the other for dog-slow training?)
But I'm guessing they've chosen not to do this — as, even if it were practical, it would mean that any cloud-offloaded queries wouldn't have access to these models.
Instead, I'm guessing the LoRA training is triggered by the iCloud servers noticing you've pushed new data to them, and throwing a lifecycle notification into a message queue of which the LoRA training system is a consumer. The training system reduces over changes to bake out a new version of any affected training datasets; bakes out new LoRAs; and then basically dumps the resulting tensor files out into your iCloud Drive, where they end up synced to all your devices.
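That speculated flow is easy to picture as an event consumer. A minimal sketch, with every name and dependency mapping invented for illustration (nothing here is Apple's actual pipeline):

```python
# Hypothetical sketch of the speculated event-driven retraining flow.
# Adapter names and data-source dependencies are invented for illustration.

# Which (hypothetical) adapters depend on which user data sources.
ADAPTER_DEPS = {
    "mail_summarize": {"mail", "contacts"},
    "photo_search": {"photos"},
    "writing_style": {"mail", "notes"},
}

def adapters_affected_by(changed_sources):
    """Return the adapters whose training data includes any changed source."""
    return {name for name, deps in ADAPTER_DEPS.items()
            if deps & set(changed_sources)}

def handle_sync_event(event):
    """Consume a (hypothetical) sync notification and report stale adapters."""
    stale = adapters_affected_by(event["changed"])
    # In the speculated pipeline: enqueue a fine-tuning job per stale adapter,
    # then push the resulting tensor files back out via iCloud Drive.
    return sorted(stale)
```

Under this sketch, a push of new mail would invalidate both the mail-summarization and writing-style adapters while leaving the photo one untouched.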
> ...each LoRA gets fine-tuned per user...
Apple would not implement such sophisticated user-specific LoRA training techniques without mentioning them anywhere. No big player has done anything like this, and Apple would want the credit for the innovation.
> The adapter models can be dynamically loaded, temporarily cached in memory, and swapped — giving our foundation model the ability to specialize itself on the fly for the task at hand
This, along with other statements in the article about keeping the base model weights unchanged, says to me that they are simply swapping out adapters on a per-app or per-task basis. I highly doubt they will fine-tune adapters on user data, since they have taken a position against this. I wonder how successful this approach will be vs. merging the adapters into the base model. I can see the benefits, but there are also downsides.
(So there is room left if you're limited by memory or budget.)
This is the important part.
My advisor said "new" means an old method applied to new data, or a new method applied to old data.
Commercially, that means price points, i.e., discrete points where something becomes viable.
Maybe that's iterative, but maybe not. Either way, once the opportunity presents, time is of the essence.
Or Intel calling USB4 devices and cables which meet quality and feature requirements 'Thunderbolt 5'
Compared to, say, manufacturers who aren't willing to meet any certification requirements or to properly implement the standards at play saying they have "USB-A 3.2 2x2 ports" on their motherboards.
Retina doesn't carry the same weight as an industry certification effort like thunderbolt, but it still informs people that a screen actually meets some sort of bar without them having to evaluate pages of tech specs, and reviews saying whether the tech specs are accurate or have undocumented caveats.
Finally, establishing such certifications are difficult - look at the number of failed attempts at creating industry quality/feature marks in the television market.
And if Karpathy thinks so then I assume it's good enough for HN:
Maybe it's my fault as a reader, but I think the writing could be clearer. Usually in a research paper you would link to the LoRA paper there too.
Quick straw poll survey around the office, many think their data will be sent off to OpenAI by default for these new features which is not the case.
Just want to point out that I called this launch huge; I didn't say “huge development” as quoted, and didn't imply that what was interesting was the ML research. No one in this thread used the quoted words, at least that I can see.
My comment was about dev experience, memory swapping, potential for tuning base models to each HW release, fine tune deployment, and app size. Those things do have the potential to be huge for developers, as mentioned. They are the things that will make a local+private ML developer ecosystem work.
I think the article and comment make sense in their context: a developer conference for Mac and iOS devs.
Apple also explicitly says it’s LoRA.
If they try to market it with a seemingly unique or yet-unheard of name, then yeah. It is nice knowing what the "real world" name of an Apple-ized technology is.
Just ignoring it and marketing the technology under some new name is adjacent to lying to your audience through omission.
That's a classic Apple strategy though.
Besides, I could do "named person on a beach in August" and get the correct results in Google Photos on Android, so I don't get it.
It's amazing for Apple users if they didn't have it before. But from a tech standpoint, people could have had it for a while.
* Clearly outlining their intent/policies for training/data use. Committing to not using user data or interactions for training their base models is IMO actually a pretty big deal and a differentiator from everyone else.
* There's a never-ending stream of new RL variants ofc, but that's how technology advances, and I'm pretty interested to see how these compare with the rest: "We have developed two novel algorithms in post-training: (1) a rejection sampling fine-tuning algorithm with teacher committee, and (2) a reinforcement learning from human feedback (RLHF) algorithm with mirror descent policy optimization and a leave-one-out advantage estimator. We find that these two algorithms lead to significant improvement in the model’s instruction-following quality."
* I'm interested to see how their custom quantization compares with the current SoTA (probably AQLM atm)
* It looks like they've done some interesting optimizations to lower TTFT, including the use of some sort of self-speculation. It looks like they also have a new KV-cache update mechanism, and I'm looking forward to reading about that as well. 0.6 ms/token means that for your average, I dunno, 20-token query you might only wait 12 ms for TTFT (I have my doubts; maybe they're getting their numbers from much larger prompts. Again, I'm interested to see for myself)
* Yes, it looks like they're using pretty standard LoRAs, the more interesting part is their (automated) training/re-training infrastructure but I doubt that's something that will be shared. The actual training pipeline (feedback collection, refinement, automated deployment) is where the real meat and potatoes of being able to deploy AI for prod/at scale lies. Still, what they shared about their tuning procedures is still pretty interesting, as well as seeing which models they're comparing against.
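The TTFT arithmetic in that list is easy to sanity-check (the 20-token prompt is the commenter's assumed figure, not a published number):

```python
# Back-of-envelope time-to-first-token from the quoted prompt-processing rate.
ms_per_prompt_token = 0.6   # the quoted ~0.6 ms/token rate
prompt_tokens = 20          # assumed length of a short query
ttft_ms = ms_per_prompt_token * prompt_tokens
# 20 tokens * 0.6 ms/token = 12 ms before the first output token
```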
As this article doesn't claim to be a technical report or a paper, while citations would be nice, I can also understand why they were elided. OpenAI has done the same (and sometimes gotten heat for it, like with Matryoshka embeddings). For all we know, maybe the original author had references; or maybe, since PEFT isn't new to those in the field, describing it is just a service to the reader. At the end of the day, it's up to the reader to make their own judgements on what's new or not, or a huge development or not. From my reading of the article, your conclusion, which funnily enough is now the new top-rated comment on this thread, isn't actually much more accurate than the old one you're criticizing.
They seem to have a good model for adding value to their products without the hold my beer, conquer the world bullshit that you get from OpenAI, et al.
They include data about the ratio of which outputs human graders preferred (for server side it’s better than 3.5, worse than 4).
BUT, the interesting chart to me is "Human Evaluation of Output Harmfulness", which is much, much "better" than the other models. Both on-device and server-side.
I wonder if that's part of wanting to have GPT as the "level 3": making their own models much more cautious, and using OpenAI's models in a way that makes it clear "it was ChatGPT that said this, not us".
Instruction following accuracy seems to be really good as well.
No sex because apparently it's harmful yet never explained why.
No homophobia/transphobia if you're Christian but if you're Muslim it's fine.
In the USA, you won't be able to ask about sex, but you can probably ask about tank man.
> And with Compose in Writing Tools, you can create and illustrate original content from scratch.
It will still be a lot better than 8GB though.
Can't forget about that cozy 256GB SSD either. An AI computer will need more than that, right?
Same way Apple and Samsung ship 128GB of storage when the production-cost difference between 128GB and 1TB is something like $10 (on a $1000 device). Samsung even got rid of the microSD slot. It's so blatant it's actually depressing.
If you could do that, you could easily get hundreds of GB/s read speed out of simple TLC flash.
Obviously this is the future, but I think it's a promising one.
Also, when I compare with my co-workers, memory pressure is a lot lower running the same software on macOS than on Windows. This might be due to the UI frameworks at play.
But that said, I totally agree that Apple is doing daylight robbery with their additional RAM pricing, and the minimum on offer is laughable.
It certainly does, close to irrational even. IIRC memory compression is enabled by default on Windows as well.
Edit: I see they're committing to publishing the OS images running on their inference servers (https://security.apple.com/blog/private-cloud-compute/). Would be cool if that allowed people to run their own.
Oh my god that would be absolutely amazing!
Most likely integrated with an Apple TV or a similar thing. Enough local LLM processing power to handle a family's data all in-house.
I think they saw the response to all the AI shoveling and Microsoft Recall and executed a fantastic strategy to reposition themselves in industry discussions. I still have tons of reservations about privacy and what this will all look like in a few years, but you really have to take your hat off to them. WWDC has been awesome and it makes me excited to develop for their platform in a way I haven't felt in a very, very, long time.
Just the usual marketing angle, IMO. It's not TV, it's HBO.
No one is reluctant to use the word smartphone to include iPhones. I don't think anyone is going to use the Apple Intelligence moniker except in the same cases where they'd say iCloud instead of cloud services.
It's also a little clunky. Maybe they could have gone with... xI? Too close to the Chinese Xi. iAI? Sounds like the Spanish "ay ay ay." Not an easy one I think. The number of person-hours spent on this must have been something.
AI will ultimately do all the 'development' and will replace all apps. The integrations are going to be a temporary measure. The only apps that will survive are the ones that control things that Apple cannot control (i.e. how Uber controls its fleet).
Apple (unwisely, I think) is allowing UIs to just generate responses.
The wow-neat! experience will wear off quickly. Then even at a miss rate of 0.1%, there will be thousands, even millions, of cringe-worthy examples that sully the Apple brand for quality.
It will be impossible to create a quality filter good enough, and there will be no way to back these features out of the OS.
For targeted use-cases (like coding and editing), this will be useful. But these features may be what finally makes contempt for Apple go mainstream, and that would be a shame.
Internally at Apple, they likely discussed how much to limit the rollout and control usage. I think they decided to bake it into APIs more to maintain developer mindshare than to keep users happy.
The one feature that could flip that script is interacting with Siri/AI in order to get things done. The frustration with knowing what you want but not how or whether it can be done drives a lot of tech angst. If this only meant ordinary people could use their existing phones to their full extent, it would be a huge win.
OK. No one remembers Apple Maps, the CSAM scanning, the crush ad, etc? Companies do embarrassing stuff all the time. At least they're trying.
I think it's been a while since consumers have trusted or relied on consumer tech. Browsing the web from a phone can only be described as adversarial. Scrolling down a top Google-result recipe site is almost impossible. Texts don't always send, and there are so many cloud backup offerings that it's hard to tell whether your photos are actually being saved.
The current political and media scene is often described as post-truth, where accuracy isn't the biggest driving factor. It seems that computation is headed that way as well.
Interesting that they’re using TPUs for training, in addition to GPUs. Is it both a technical decision (JAX and XLA) and a hedge against Nvidia?
Did they go over the entire text with a thesaurus? I've never seen "palettization" used as a viable synonym for "quantization" before, and I've read quite a few papers on LLM quantization.
Though I'm not sure how warranted the distinction really is; in both cases it's pretty much the same idea of reducing precision, just with different implementations.
Edit: they even refer to it as LUT quantization on another page: https://apple.github.io/coremltools/docs-guides/source/quant...
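For what it's worth, the palettization/LUT idea fits in a few lines. A toy sketch, assuming evenly spaced palette entries where a real implementation would fit the palette with k-means:

```python
# Sketch of palettization (LUT quantization): weights are clustered into a
# small palette (look-up table) and each weight stores only a palette index.
# Plain Python; a real palettizer would fit the centroids with k-means.

def palettize(weights, n_colors=4):
    """Map each weight to the nearest entry of a coarse palette."""
    lo, hi = min(weights), max(weights)
    # Naive palette: evenly spaced centroids over the weight range.
    palette = [lo + (hi - lo) * i / (n_colors - 1) for i in range(n_colors)]
    indices = [min(range(n_colors), key=lambda i: abs(w - palette[i]))
               for w in weights]
    return palette, indices  # store the palette once + one tiny index per weight

def depalettize(palette, indices):
    """Reconstruct approximate weights from palette + indices."""
    return [palette[i] for i in indices]
```

With 4 palette entries each weight needs only 2 bits of index, which is the sense in which it's "still pretty much the same idea of reducing the precision".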
This is huuuuge. I don't see an announcement of 3rd-party training support yet, but I imagine/hope it's planned.
One of the hard things about local+private ML is I don’t want every app I download to need GBs of weights, and don’t want a delay when I open a new app and all the memory swap happens. As an app developer I want the best model that runs on each HW model, not one lowest common denominator model for slowest HW I support. Apple has the chance to make this smooth: great models tuned to each chip, adapters for each use case, new use cases only have a few MB of weights (for a set of current base models), and base models can get better over time (new HW and improved models). Basically app thinning for models.
Even if the base models aren’t SOTA to start, the developer experience is great and they can iterate.
Server side is so much easier, but look forward to local+private taking over for a lot of use cases.
It is kind of ironic that languages that pride themselves so much on going back to early linking models have to resort to much heavier OS IPC for similar capabilities.
My comment above is about dev experience, memory swapping, tuning base models to each HW release, and app size.
But kinda as expected: only works on 2 Android phones (Pixel 8 Pro, S24).
Pretty typical: Apple isn’t first, but also typically will scale faster with HW+platform integration.
I wonder if they didn't stretch the truth using the phrase "without loss in accuracy".
>We represent the values of the adapter parameters using 16 bits, and for the ~3 billion parameter on-device model, the parameters for a rank 16 adapter typically require 10s of megabytes. The adapter models can be dynamically loaded, temporarily cached in memory, and swapped — giving our foundation model the ability to specialize itself on the fly for the task at hand while efficiently managing memory and guaranteeing the operating system's responsiveness.
This kind of sounds like LoRAs...
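The quoted "10s of megabytes" checks out on a back-of-envelope. A sketch with assumed shapes (the layer count, width, and number of adapted projections are guesses for a 3B-class model, not Apple's disclosed architecture):

```python
# Rough size check on a rank-16, 16-bit adapter for a ~3B-parameter model.
# All shape numbers below are assumptions for illustration.
rank, bytes_per_param = 16, 2          # rank-16 LoRA, 16-bit parameters
d_model, n_layers = 2048, 32           # assumed base-model shape
adapted_mats_per_layer = 4             # e.g. q/k/v/o projections, assumed

# Each adapted d x d matrix adds A (r x d) and B (d x r): 2*r*d parameters.
params = n_layers * adapted_mats_per_layer * 2 * rank * d_model
size_mb = params * bytes_per_param / 1e6
# ~8.4M adapter params -> ~17 MB, consistent with "10s of megabytes"
```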
So we cannot expect a similar answer from each LLM, as they're different models; you cannot get consistency across ecosystems.
How do they represent users around the globe authentically while being located in Cupertino, CA? (more of a rhetorical question really)
It does baffle me how California-centric they are with many of their announcements, and even some features.
They built out a system that's ready to scale to deliver features that may not work on available hardware, but they're also incentivized to minimize actual reliance on that cloud stuff as it incurs per-use costs that local runs don't.
From an ML noob's (mine) understanding of this, does this mean that the final matrix is regularly fine-tuned instead of fine-tuning the main model? Is this similar to how ChatGPT now remembers memory[1]?
The advantage of the adapter matrices is you can have different sets of adapter matrices for different tasks, all based off the base model.
Low Rank Adaptors (LoRA) are a way of changing the function of a model by only having to load a delta for a tiny percentage of the weights rather than all the weights for an entirely new model.
No fine-tuning is going to happen on Apple computers or phones at any point. They are just swapping out Apple's pre-made LoRAs so that they can store one LLM and dozens of LoRAs in a fraction of the space it would take to store dozens of LLMs.
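A minimal sketch of that swap, with toy dimensions (plain Python, nothing Apple-specific):

```python
# Minimal sketch of what a LoRA swap does: the frozen base weight W is stored
# once; each "mode" ships only a low-rank delta (A, B) applied at load time.
def matmul(X, Y):
    """Naive matrix multiply for tiny toy matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def apply_lora(W, A, B, scale=1.0):
    """Effective weight = W + scale * (B @ A); only A and B are per-task."""
    delta = matmul(B, A)  # (d x r) @ (r x d) -> full-size but low-rank delta
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]
```

Swapping tasks means swapping only the small (A, B) pair, which is why dozens of "modes" fit in a fraction of the space of dozens of full models.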
As for the stuff that's local to your device, how is your privacy being invaded? It's your device's OS looking at data on the device it's running on, as it's always done.
So far all attempts seem to be building a universal Clippy. In my experience, all kinds of forced autocomplete and other suggestions have been worse than useless.
Other than that, AI for me is meme/image generation and a semi-useful chatbot.
"If, on the Meta Llama 3 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights."
IANAL but my read of this is that Apple's not allowed to use Llama 3 at all, for any purposes, including comparisons.
BTW, not an Apple fan but an Apple user.
Most people expected this update 6 months ago.
Is that moving fast? Maybe, compared to what, Oracle?
I’ve been trying to make smaller, more efficient models in my own work. I hope Apple publishes some actual papers.
This seems impressive. Is it, really? I don’t know enough about the subject to judge.
Of course, Apple will never give adequate details about security mechanisms or privacy guarantees. They are in the business of selling you security as something that must be handled by them and them alone, and that knowing how they do it would somehow be less secure (This is the opposite of how it actually works, but also Apple loves doublespeak, and 1984 allusions have been their brand since at least 1984). I view that, like any claim by a tech company that they are keeping your data secure in any context, as security theater. Vague promises are no promises at all. Put up or shut up.
We may have some insight into the second point when the code is published.
If you ask it for knowledge, like a comparison of vacuum cleaner models, then yes, it's a hallucination fest. They just don't have the parameters for this level of detail. This is where ChatGPT is really king.
But if you give them the data they need with RAG, they're not bad. Acting on commands, looking stuff up in provided context, summarising all perform pretty well. Which seems to be also what Apple is targeting to do with them.
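The RAG pattern being described can be sketched minimally; word overlap stands in for a real embedding-based retriever, and all data below is invented:

```python
# Toy sketch of the RAG pattern: retrieve the most relevant snippet from the
# user's own data and put it in the prompt, rather than relying on a small
# model's parametric knowledge. Word-overlap scoring stands in for embeddings.

def retrieve(query, docs):
    """Return the document sharing the most words with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query, docs):
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = retrieve(query, docs)
    return f"Context: {context}\nQuestion: {query}\nAnswer using only the context."
```

The model then only has to read and act on the provided context, which is exactly the regime (summarising, looking things up, acting on commands) where small on-device models hold up.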
And, of course, nobody knew to opt out by blocking Applebot-Extended until after the announcement, by which point they've already pirated shittons of data.
In completely unrelated news, I just trained a new OS development AI on every OS Apple has ever written. Don't worry. There's an opt-out, Apple just needed to know to put these magic words in their installer image years ago. I'm sure Apple legal will be OK with this.
Public content on the internet is public content on the internet - I thought we had all agreed years ago that if you didn’t want your content copied, don’t make it freely available and unlicensed on the internet.
What I don't like is the hypocrisy that basically every AI company has engaged in, where copying my shit is OK but copying theirs is not. The Internet is not public domain, as much as Eric Bauman and every AI research team would say otherwise. Even if you don't like copyright[0], you should care about copyleft, because denying valuable creative work to the proprietary world is how you get them to concede. If you can shove that work into an AI and get the benefits of that knowledge without the licensing requirement, then copyleft is useless as a tactic to get the proprietary world to bend the knee.
[0] And I don't.
My opinion is that individual copyright ownership is a bad deal for most artists and we need collective negotiation instead. Even the most copyright-respecting, 'ethical' AI boils down to Adobe dropping a EULA roofie in the Adobe Stock Contributor Agreement that lets them pay you pennies.
Until LLMs came along, most large-scale internet scraping was for search engines. Websites benefited from this arrangement because search engines directed users to those websites.
LLMs abused this arrangement to scrape content into a local database, compress that into a language model, and then serve the content directly to the user without directing the user to the website.
It might've been legal, but that doesn't mean it was ethical.
…is there publicly visible source code for every OS Apple has ever written?
It’s not as bad as that, I think. https://support.apple.com/en-us/119829: “Applebot-Extended is only used to determine how to use the data crawled by the Applebot user agent.“
⇒ if you use robots.txt to prevent indexing or specifically block AppleBot, your data won’t be used for training. AppleBot is almost a decade old (https://searchengineland.com/apple-confirms-their-web-crawle...)
Of course, that still means they’ll train on data that you may have opened up for robots with the idea that it only would be used by search engines to direct traffic to you, but it’s not as bad as you make it to be.
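For anyone auditing their own rules, Python's stdlib parser can check a robots.txt offline; a sketch with invented sample rules (not any real site's policy):

```python
# Quick local check of how robots.txt rules like those discussed above are
# interpreted, using Python's stdlib parser. No network; rules fed in directly.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: Applebot
Disallow: /private/

User-agent: *
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)
# Applebot is blocked from /private/ but allowed on the rest of the site;
# other crawlers fall through to the wildcard entry.
```

Per Apple's documentation, blocking the `Applebot` agent this way is what keeps your data out of both the index and the training set.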
Data implies factual information. You cannot copyright factual information.
The fact that I use the word "appalling" to describe this practice results in some vector relationship between the words. That's the data, the fact, not the writing itself.
There are going to be a bunch of interesting court cases where courts will have to backtrack on copyrighting facts. Or we're going to get some really odd legal interpretations of how LLMs work (and buy into them). Or we're going to have to change the law (giving everyone else a first-mover advantage).
Based on how things have been working, I am betting that it's the last one, because it pulls up the ladder.
This is wrong. AppleBot identifier hasn't changed: https://support.apple.com/en-us/119829
There is no AppleBot-Extended. And if you blocked it in the past it remains blocked.
"If Apple integrates OpenAI at the OS level, then Apple devices will be banned at my companies. That is an unacceptable security violation."
Replying to Tim Cook: "Don’t want it. Either stop this creepy spyware or all Apple devices will be banned from the premises of my companies."
"It’s patently absurd that Apple isn’t smart enough to make their own AI, yet is somehow capable of ensuring that OpenAI will protect your security & privacy!
Apple has no clue what’s actually going on once they hand your data over to OpenAI. They’re selling you down the river."
https://x.com/elonmusk/status/1800269249912381773 https://x.com/elonmusk/status/1800266437677768765 https://x.com/elonmusk/status/1800265431078551973