It could make me get a new phone outside of my usual ~4 year cycle. Siri is almost unusable for me.
Here’s one story to offer some context. There are others. https://archive.is/en3VL
Don't underestimate Apple at disappointing enthusiasts like you and me. We've been hearing many awesome stories about the next thing Apple will do, only to realize their marketing team chose to keep it for future iOS/MBP/iPhone generations to keep the profits high.
Apple has had automation for ages with Automator, Shortcuts etc but nothing that actually integrates well with day to day flow. So.. setting a timer when my hands are wet already works ok, and that’s about what I need.
I honestly wonder what type of voice interactions people want with their phones. I can see transcribing/crafting chat messages I guess? But even so, it feels like it would mess up and use iMessage instead of WhatsApp, will it narrate my memes, open links and read “subscribe for only 4.99 to read this article”, cookie consents etc etc. if everything sucks how is narrating it gonna help?
Maybe I’m old but I still don’t see the major value-add of voice interfaces, despite massively improved tech and potential.
Most “AI” features are so incredibly fragile they’re not worth deploying.
I can still get chatgpt to say the most vile things and if Apple release something on device I'll get that to be a bad, baaaad robot, too.
LLMs are not yet safe for public facing production use,imo.
https://www.macrumors.com/2023/12/21/apple-ai-researchers-ru... https://arxiv.org/pdf/2312.11514.pdf
Pixel phones have had emergency call issues for years across multiple models but they just get a pass. Apple would be crucified for this.
The idea of having an unpredictable LLM in the ecosystem is Apple's worst nightmare. I bet they will overly restrict it to the point that it stops being a general purpose LLM and becomes a neutered obedient LLM that always acts according to Apple's rules.
Also, it doesn't help that ALL the authors of this Apple paper are chinese. It raises questions about how Apple will handle political debates with its LLM.
They don't sell compute time to other companies to run AI, or massive custom hardware for AI training.
They aren't after VC funding.
Their core business isn't threatened by AI being "the evolution of search"
Product-wise, so far all you hear is messaging around things like pointing out the applicability of the M3 Max for running ML models.
Until they have real consumer products ready, they only need to keep tabs on analysts, with lip service at financial meetings.
- Facial recognition in Photos
- "Memories" in Photos
- iOS keyboard autocomplete using LLMs. I am bilingual and noticed in the latest iOS it now does multi-language autocomplete and you no longer have to manually switch languages.
- Event detection for Calendar
- Depth Fusion in the iOS camera app, using ML to take crisper photos
- Probably others...
The crazy thing is most/all of these run on the device.
Color me doubtful.
So Apple uses NVidia internally. Not surprising, but doesn't bode well for A Series. Dogfooding.
[edit] I meant M series, Apple Silicon
I think Apple would strive to be great at all computing related tasks. “Oh, Macs are not good for that, you should get a PC” should make them sad and worried.
AI/LLM is the new hot thing. If people are using Windows or Linux, you’re loosing momentum, hearts and minds… and sales, obviously.
https://techcrunch.com/2008/12/09/scientists-nvidia-put-faul...
Are all the iCloud servers running on Apple silicon? I assumed they were running on standard rack mounted hardware.
AI isn’t, yet at least, and I don’t think they can afford to treat it as such.
Huh, even Apple isn't capable of escaping the CUDA trap. Funny to see them go from moral enemies with Nvidia to partially-dependent on them...
That’s what I think is going on. Apple hated being on the hook for Nvidia’s terrible drivers and chipset/heat problems that ended up causing a ton of warranty repairs.
In this case they’re not a partner, they’re just a normal customer like everyone else. And if Intel comes out with a better AI training card tomorrow Apple can switch over without any worry.
They’re not at the mercy of Nvidia like they were with graphics chips. They’re just choosing (what I assume to be) the best off the shelf hardware for what they need.
> We introduce Ferret, a new Multimodal Large Language Model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions. To unify referring and grounding in the LLM paradigm, Ferret employs a novel and powerful hybrid region representation that integrates discrete coordinates and continuous features jointly to represent a region in the image. To extract the continuous features of versatile regions, we propose a spatial-aware visual sampler, adept at handling varying sparsity across different shapes. Consequently, Ferret can accept diverse region inputs, such as points, bounding boxes, and free-form shapes. To bolster the desired capability of Ferret, we curate GRIT, a comprehensive refer-and-ground instruction tuning dataset including 1.1M samples that contain rich hierarchical spatial knowledge, with 95K hard negative data to promote model robustness. The resulting model not only achieves superior performance in classical referring and grounding tasks, but also greatly outperforms existing MLLMs in region-based and localization-demanded multimodal chatting. Our evaluations also reveal a significantly improved capability of describing image details and a remarkable alleviation in object hallucination.
It’s not going to know an orc from a health potion, but they’re certainly working on the idea in the everyday stuff domain.
I can't seem to nail down the meaning of this phrase on its own. All the search results seem to turn up are "spatial referring expressions".
Is it just me, or doesnt this MLLM seem particularly useful for flying objects with vision?
Ability to download / update tiny models from Apple and Google as they improve, à la Google Maps.
No need for web services like ChatGPT.
Think of it in a physical sense. OpenAI is a high walled castle surrounded by a physical moat. This protects them and their business model. Apple comes along and builds a super tall tower right next to the moat. They can now see into OpenAI’s castle, fire arrows, catapult in a giant wooden badger, etc. Even if Open AI copies the design of Apple’s really tall tower and built it behind the moat and castle walls, it wouldn’t do much because Apple still would be able to get stuff over the moat and walls. The moat doesn’t matter anymore for the most part. The castle (OpenAI) can be compromised and needs bigger walls, relocating to someplace with a bigger, or a way of attacking the tower (Apple). Copying doesn’t really accomplish any of those three.
It uses the Flamingo model family: https://deepmind.google/discover/blog/tackling-multiple-task...
Why do you say that?
https://en.m.wiktionary.org/wiki/ferret: “3. (figurative) A diligent searcher”
Wait, how did "GPT-4" get in there?
But what do we expect from these giants? They're not going to create fertile ground for new competition. The only businesses they foster are those living under thumb and paying tax.
I guess I at least hoped for "commoditize the compliments" here. Make Google and OpenAI broadly less special.
Open-source, runs natively on all major platforms. I shared videos showing it on my iPad Mini, Pixel 7, iPhone 12, Surface Pro (Win 10 & Ubuntu Jellyfish) and Macs (Intel & M archs).
By all means, it’s not a finished app. I simply wanted to use on-device AI stuff in Flutter so I started with porting over llama.cpp, and later on I’ll tinker with porting over whatever is the state of the art (whisper.cpp, bark.cpp etc).
Repo: https://github.com/BrutalCoding/aub.ai
For any of your Apple devices, use this: https://testflight.apple.com/join/XuTpIgyY
App is compatible with any GGUF files, but it must be in the ChatML prompt format otherwise the chat UI/bubbles probably gets funky. I haven’t made it customizable yet, after all - it’s just an example app of the plugin. But I am actively working on it to nail my vision.
Cheers, Daniel
However, the ChatGPT4 app is much better in usability: better model, multi-modal with text/vision/speech and better UI.
What I thought when reading the title: A new base model trained from the ground up on multimodal input, on hundreds to thousands of GPUS
The reality: A finetune of Vicuna, trained on 8xA100, which already is a finetune of Llama 13b. Then it further goes on to re-use some parts of LLava, which is an existing multimodal project already built upon Vicuna. It's not really as exciting as one might think from the title, in my opinion.
"Tell me who is your manufacturer" for example
My situation is a bit unique, so the term "manufacturer" might not be the most accurate way to describe who created me. Here's a breakdown of what you need to know:
Developed by Google AI: I was created by a team of researchers and engineers at Google AI, specializing in language models and artificial intelligence.
Trained on a massive dataset: My knowledge and abilities come from being trained on a massive dataset of text and code, containing books, articles, code, and other forms of information.
Continuously learning and evolving: I'm still under development, constantly learning and improving as I interact with users and process new information.
So, while I don't have a single manufacturer in the traditional sense, I'm the result of collaboration and advancement in AI research and development at Google.I hope this helps clarify things! Let me know if you have any other questions.
And it saves everyone a tab's worth of effort.
conda supports m1? https://www.anaconda.com/blog/new-release-anaconda-distribut...
If anything, I think people should use meaningful and relevant names, or invent new ones.
Apple seems to be gearing up for significant advances in on-device inference using this LLMs