Ferret: A Multimodal Large Language Model (opens in new tab)

(github.com)

621 pointsweirdcat2y ago308 comments

308 comments

136 comments · 22 top-level

yreg2y ago· 28 in thread

I really hope Apple releases an iPhone with a good on-device private LLM assistant, perhaps next year. Their hardware is well-positioned for it.

It could make me get a new phone outside of my usual ~4 year cycle. Siri is almost unusable for me.

aaronbrethorst2y ago

Rumors suggest they’re gearing up to make iOS 18 an AI focused release. It’ll be interesting to see if they offer different capabilities for online/offline scenarios, or if their offerings are strictly offline.

Here’s one story to offer some context. There are others. https://archive.is/en3VL

behnamoh2y ago

> Rumors suggest they’re gearing up to make iOS 18 an AI focused release.

Don't underestimate Apple at disappointing enthusiasts like you and me. We've been hearing many awesome stories about the next thing Apple will do, only to realize their marketing team chose to keep it for future iOS/MBP/iPhone generations to keep the profits high.

6 more replies

para_parolu2y ago

I really hope they will make siri usable. In the current state it’s only good for fixed phrases. And even then it fails time to time

2 more replies

0x1ceb00da2y ago

If they do it right, it might make me switch from Android. I've never used iOS before and the only thing I'm able to use Google assistant for is setting alarms, and it can't even delete the alarm I created just now.

spaceman_20202y ago

GPT-4 voice is so, so good. Really what you would want a voice tool to be like. I can talk to it like a normal human being, unlike issuing specific commands loudly as with Siri.

klabb32y ago

But no matter the Siri shittiness (which I agree with) an LLM can only interact with the outside world – ie run commands – that exist and have a reasonable API surface, no?

Apple has had automation for ages with Automator, Shortcuts etc but nothing that actually integrates well with day to day flow. So.. setting a timer when my hands are wet already works ok, and that’s about what I need.

I honestly wonder what type of voice interactions people want with their phones. I can see transcribing/crafting chat messages I guess? But even so, it feels like it would mess up and use iMessage instead of WhatsApp, will it narrate my memes, open links and read “subscribe for only 4.99 to read this article”, cookie consents etc etc. if everything sucks how is narrating it gonna help?

Maybe I’m old but I still don’t see the major value-add of voice interfaces, despite massively improved tech and potential.

1 more reply

fnordpiglet2y ago

The auto correct is already backed by a smallish LLM, FYI.

https://jackcook.com/2023/09/08/predictive-text.html

blululu2y ago

And it is a serious quality regression IMO. The dictionary is too small and misses/messes up a ton of basic words.

1 more reply

scosman2y ago

SLM? :)

hmottestad2y ago

With iOS 17 they added a tiny little LLM to the predictive typing. I have the newest and greatest iPhone but I feel that I very rarely see it in action. I must assume that it’s just too slow at to keep up with my typing at the moment. Or it’s just not large enough to give very many useful suggestions.

mrbonner2y ago

Really? Typing in my iPhone 12 Pro has become a nightmare. I suspect it is because of predictive typing ML shit. It happens all the freaking time now. The symptom is that my whole device just froze for a few seconds while the next word is squeezed out. How do I turn it off?

1 more reply

KMnO42y ago

Is tiny LLM an oxymoron? I believe Apple has told us it’s a transformer language model, but not specifically a LLM.

4 more replies

dontlaugh2y ago

It’s probably why autocomplete got drastically worse, to the point I’m considering turning it off entirely.

Most “AI” features are so incredibly fragile they’re not worth deploying.

2 more replies

wenc2y ago

It's a GPT2 model. It hasn't changed the autocomplete experience that much (occasionally I'll see a word completion).

dnw2y ago

I have noticed Siri now describes pictures sent to Messages.

fennecbutt2y ago

Nobody can tame LLM models yet not even Apple.

I can still get chatgpt to say the most vile things and if Apple release something on device I'll get that to be a bad, baaaad robot, too.

LLMs are not yet safe for public facing production use,imo.

zitterbewegung2y ago

Next year releases of macOS / iOS are rumored to have LLMs as a feature .

schleck82y ago

Yes, their hardware is positioned phenomenally with little RAM even by phone standards which is what you'd hack around with for inference on mobile architectures.

cedws2y ago

What are you going to do with it?

CaptainOfCoit2y ago

You're unlikely to get a better experience with Siri if she becomes equipped with a 7B or 13B LLM, unless Apple figured out something revolutionary.

jurmous2y ago

Released 2 days ago by Apple, a research paper on methods to run larger llms on iPhones.

https://www.macrumors.com/2023/12/21/apple-ai-researchers-ru... https://arxiv.org/pdf/2312.11514.pdf

1 more reply

nexuist2y ago

Siri is really quite dumb. I am confident that a 7B model would be able to provide better responses in over 90% of user queries. I can't even get Siri to reliably set a timer.

1 more reply

bbor2y ago

Note that “using an LLM” doesn’t just mean “plugging user queries straight into an LLM”. Enhancing Siri will probably be an ensemble project.

1 more reply

s3p2y ago

Why would that be?

ghqst2y ago

Have you ever actually used Siri?

1 more reply

bbor2y ago

I really, really doubt it for one reason: I’m convinced Apple is still terrified of that “Amazon Alexa tells child to stick a penny in a socket” story, and will hamstring themselves in an attempt to have their agential cake and eat it too

thebruce87m2y ago

They are right to be careful, they are held to a much higher standard than their competitors.

Pixel phones have had emergency call issues for years across multiple models but they just get a pass. Apple would be crucified for this.

1 more reply

behnamoh2y ago

Apple is all about a controlled pleasant experience, it doesn't matter if it doesn't give you shiny new things; most Apple customers don't even know those shiny new things exist, so they keep spreading the word that "Apple is so easy and simple."

The idea of having an unpredictable LLM in the ecosystem is Apple's worst nightmare. I bet they will overly restrict it to the point that it stops being a general purpose LLM and becomes a neutered obedient LLM that always acts according to Apple's rules.

Also, it doesn't help that ALL the authors of this Apple paper are chinese. It raises questions about how Apple will handle political debates with its LLM.

1 more reply

shrimpx2y ago· 18 in thread

Apple has been looking sleepy on LLMs, but they've been consistently evolving their hardware+software AI stack, without much glitzy advertising. I think they could blow away Microsoft/OpenAI and Google, if suddenly a new iOS release makes the OpenAI/Bard chatbox look laughably antiquated. They're also a threat to Nvidia, if a significant swath of AI usage switches over to Apple hardware. Arm and TSMC would stand to win.

madeofpalk2y ago

I doubt Apple’s going to make some big ChatGPT-style chatbot. They’re “just” going to use the same tech to drive iterative (good!) improvements to their products, like Siri and keyboard auto-complete.

shrimpx2y ago

Yeah. Siri supports text input already, anyway. Siri is their ChatGPT-style bot that's going to keep improving.

1 more reply

fbdab1032y ago

I would challenge the keyboard autocomplete. I find the Apple suggestions to be frustratingly poor vs my experience on Android.

2 more replies

dwaite2y ago

> Apple has been looking sleepy on LLMs, but they've been consistently evolving their hardware+software AI stack, without much glitzy advertising

They don't sell compute time to other companies to run AI, or massive custom hardware for AI training.

They aren't after VC funding.

Their core business isn't threatened by AI being "the evolution of search"

Product-wise, so far all you hear is messaging around things like pointing out the applicability of the M3 Max for running ML models.

Until they have real consumer products ready, they only need to keep tabs on analysts, with lip service at financial meetings.

theferalrobot2y ago

Given Apple's track record on anything AI related and the terrible state they keep CoreML that not only seems extraordinarily unlikely, it would take a lot of time to win developer trust and that I just don't see happening.

hosh2y ago

Apple doesn’t have to win developer trust or build an AI platform. They just have to build a compelling consumer product that can only function with AI, and they are better equipped to do that than Google or Microsoft. It remains to be seen if OpenAI will go that route instead of a business built on training and providing access to foundational models.

2 more replies

mark_l_watson2y ago

I have enjoyed working with CoreML over the last few years. Please share what you didn’t like about it.

1 more reply

lachlan_gray2y ago

Maybe MLX is meant to fill this gap?

https://github.com/ml-explore/mlx

harryVic2y ago

Can you give an example? I switched to android because i use personal assistant a lot while driving and siri was absolutely horrible.

shrimpx2y ago

- FaceID

- Facial recognition in Photos

- "Memories" in Photos

- iOS keyboard autocomplete using LLMs. I am bilingual and noticed in the latest iOS it now does multi-language autocomplete and you no longer have to manually switch languages.

- Event detection for Calendar

- Depth Fusion in the iOS camera app, using ML to take crisper photos

- Probably others...

The crazy thing is most/all of these run on the device.

4 more replies

fennecbutt2y ago

Are you so sure? Even this link is built on top of the work of others, I'm not sure they've contributed as much as you think they have.

gxyt6gfy5t2y ago

I wouldn’t go too far. They didn’t even train this model on Apple hardware. Trained on Nvidia A100s

Affric2y ago

Don’t TSMC make Nvidia’s chips too?

shrimpx2y ago

Yup! TSMC wins either way.

slowmovintarget2y ago

Personal ML systems running on hardware you own is the killer app. If these are "good enough" they'll be significantly preferable to using large subscription-based models, where those companies could pull a Lucy any day.

emmender22y ago

generic first-order shallow argument

zamalek2y ago

You're suggesting that Apple could fit what can't be done with a 4090 into a laptop?

Color me doubtful.

fennecbutt2y ago

But Apple will just make a magical chip that's different to regular hardware cause they're the best company and invent all the things even if they've been seen before Apple still invented it first, just wait until their Super Unicorn Ultra™ chip comes out with Hyperdrive Retinated LLM™ support, they don't name normal hardware different just for marketing...it's really unique, new and inventive hardware that we're happy to pay a huge premium for because it's so advanced and inventive.

tambourine_man2y ago· 12 in thread

> FERRET is trained on 8 A100 GPUs

So Apple uses NVidia internally. Not surprising, but doesn't bode well for A Series. Dogfooding.

[edit] I meant M series, Apple Silicon

sxg2y ago

By "A series" are you referring to the Nvidia A100 or the Apple A-series iPhone/iPad chips? If the latter, I don't think you can draw that conclusion. Training has memory and processor requirements that are very different from inference. You don't need iPhones and iPads to train models—you need them to run models. These are two very different things.

tambourine_man2y ago

Apple Silicon, sorry for the ambiguity. Apple sells Macs too. That’s where I’d hope they would train their models.

cryogenicfire2y ago

I feel like Apple is only testing the waters with AI right now, but perhaps if they get involved enough they'll spend money on their own compute infrastructure? Nvidia is kind of the king at GPU compute right now, and developing comparable hardware is no small or cheap task, but I think Apple is in a very good position to be able to make it work---if they decide to invest in it. But honestly, as far as corporate feud goes, I feel like companies will happily suck it up if it makes some process cheaper and/or easier

tambourine_man2y ago

I think you’re completely correct, but if they were caught off guard by the AI train, they shouldn’t be testing the waters now. It should be treated as an existential threat.

1 more reply

hhh2y ago

Why would they dogfood Apple Silicon for training models? Seems like a waste of developer time to me.

tambourine_man2y ago

Apple doesn’t even sell NVidia cards on their Mac Pros. Are they training it on Linux?

I think Apple would strive to be great at all computing related tasks. “Oh, Macs are not good for that, you should get a PC” should make them sad and worried.

AI/LLM is the new hot thing. If people are using Windows or Linux, you’re loosing momentum, hearts and minds… and sales, obviously.

8 more replies

hmottestad2y ago

Apple had a falling out with Nvidia a number of years ago. I believe they were using chips from Nvidia in their MacBook Pros, the first to come with both integrated and discrete graphics, but the solder between the chips and the motherboard kept cracking and a rather large number of MacBooks needed to be repaired.

https://techcrunch.com/2008/12/09/scientists-nvidia-put-faul...

dcchambers2y ago

As long as the inference can be done locally on their chips I don't at all think it's a big deal to train models on Nvidia/other hardware.

Are all the iCloud servers running on Apple silicon? I assumed they were running on standard rack mounted hardware.

tambourine_man2y ago

I think Apple considers cloud infrastructure a necessary evil and a commodity.

AI isn’t, yet at least, and I don’t think they can afford to treat it as such.

gooob2y ago

yeah, aren't the new M3 chips supposed to be really good for ML training?

woke_neolib2y ago

Apple apparently uses Google Cloud, so it's that or TPUs!

tambourine_man2y ago

They use many clouds. But LLM should be their core business and they usually don’t outsource that.

1 more reply

smoldesu2y ago· 8 in thread

> FERRET is trained on 8 A100 GPUs with 80GB memory.

Huh, even Apple isn't capable of escaping the CUDA trap. Funny to see them go from moral enemies with Nvidia to partially-dependent on them...

ssijak2y ago

I guess they also have Samsung fridges in the offices..

causal2y ago

And probably Intel processors and Linux in their datacenters.

2 more replies

amelius2y ago

And they use CAD software running on Windows (it simply doesn't exist on MacOS)

1 more reply

smoldesu2y ago

I don't get it, does Apple also make fridges now?

3 more replies

cryogenicfire2y ago

MBCook2y ago

> But honestly, as far as corporate feud goes, I feel like companies will happily suck it up if it makes some process cheaper and/or easier

That’s what I think is going on. Apple hated being on the hook for Nvidia’s terrible drivers and chipset/heat problems that ended up causing a ton of warranty repairs.

In this case they’re not a partner, they’re just a normal customer like everyone else. And if Intel comes out with a better AI training card tomorrow Apple can switch over without any worry.

They’re not at the mercy of Nvidia like they were with graphics chips. They’re just choosing (what I assume to be) the best off the shelf hardware for what they need.

whalesalad2y ago

Apple silicon is good but it’s designed for a portable. Even the studio and Mac Pro are just laptop chips stitched together. They gotta use heavy duty gear to do heavy duty shit. I know they have a soured relationship with nvidia tho so I would like to see them bolster the AMD/rocm ecosystem. Chances are they’re working on their own stuff here too, though. They are sitting on billions of dollars of liquid cash so I’d imagine they’re using that for some serious R&D.

amelius2y ago

Dependent is a strong word. At the end of the day all these DL models run on any hardware, and you can easily swap out one type of hardware for another perhaps with some small performance impact. They're commodities, basically.

CaptainOfCoit2y ago· 8 in thread

Maybe the abstract of the paper is a better introduction to what this is:

> We introduce Ferret, a new Multimodal Large Language Model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions. To unify referring and grounding in the LLM paradigm, Ferret employs a novel and powerful hybrid region representation that integrates discrete coordinates and continuous features jointly to represent a region in the image. To extract the continuous features of versatile regions, we propose a spatial-aware visual sampler, adept at handling varying sparsity across different shapes. Consequently, Ferret can accept diverse region inputs, such as points, bounding boxes, and free-form shapes. To bolster the desired capability of Ferret, we curate GRIT, a comprehensive refer-and-ground instruction tuning dataset including 1.1M samples that contain rich hierarchical spatial knowledge, with 95K hard negative data to promote model robustness. The resulting model not only achieves superior performance in classical referring and grounding tasks, but also greatly outperforms existing MLLMs in region-based and localization-demanded multimodal chatting. Our evaluations also reveal a significantly improved capability of describing image details and a remarkable alleviation in object hallucination.

https://arxiv.org/abs/2310.07704

devinprater2y ago

This is going to be great for accessibility! Imagine being blind and loading up a video game and using this to figure out what's around, having everything described locally. I mean, um, well that's what I'd use it for anyway. But knowing Apple, we won't be able to prompt the LLM directly so that probably won't happen until 5 years from now.

MBCook2y ago

The Magnifier app on iOS can already describe whatever you point your phone at in iOS 17.

It’s not going to know an orc from a health potion, but they’re certainly working on the idea in the everyday stuff domain.

barbecue_sauce2y ago

>>spatial referring

I can't seem to nail down the meaning of this phrase on its own. All the search results seem to turn up are "spatial referring expressions".

nmstoker2y ago

Yes, I wondered whether "referring" had some special meaning, since the way they seem to use it suggests the word reference would normally be more appropriate there (unless it's a special meaning that warrants the different word).

TrueDuality2y ago

I'm just inferring myself, but I believe it's referring to discussing things in the foreground / background or in a specific location in the provided image (such as top right, behind the tree, etc) in user queries.

lukasb2y ago

It sounds like the "region inputs" are raster or vector inputs. So I'm imagining highlighting a region of the photo with my finger and having it tell me "that's the Duomo in Florence."

samstave2y ago

This will make Drone-based AI image context for behavior extremely powerful - especially when aspects of that MLLM handling for spatial-sitrep extremely precise for autonomous movement, then ultimately for decision making WRT interacting with humans (positive interactions and negative interactions).

Is it just me, or doesnt this MLLM seem particularly useful for flying objects with vision?

s3p2y ago

Is it just me or did they include as many buzzwords as possible in technical writing?

ZeroCool2u2y ago· 8 in thread

We're watching Apple fill the moat in.

jonahbenton2y ago

Dig the moat out, I think you mean ;)

tomrod2y ago

Here it comes!

FredPret2y ago

How so?

colesantiago2y ago

Running Multimodal LLMs on device and offline, i.e LLMKit for free equaling GPT-3.5 / 4 then Google will follow on Android.

Ability to download / update tiny models from Apple and Google as they improve, à la Google Maps.

No need for web services like ChatGPT.

1 more reply

m3kw92y ago

OpenAI can just copy this.

pridkett2y ago

Yes, OpenAI can copy this, but they’ll still have less of a moat. That’s the problem with moats, once they’re gone even if you copy what others do, you don’t have a moat anymore.

Think of it in a physical sense. OpenAI is a high walled castle surrounded by a physical moat. This protects them and their business model. Apple comes along and builds a super tall tower right next to the moat. They can now see into OpenAI’s castle, fire arrows, catapult in a giant wooden badger, etc. Even if Open AI copies the design of Apple’s really tall tower and built it behind the moat and castle walls, it wouldn’t do much because Apple still would be able to get stuff over the moat and walls. The moat doesn’t matter anymore for the most part. The castle (OpenAI) can be compromised and needs bigger walls, relocating to someplace with a bigger, or a way of attacking the tower (Apple). Copying doesn’t really accomplish any of those three.

yreg2y ago

They cannot integrate it deeply into Apple's platforms.

1 more reply

daralthus2y ago

Don't think they have AR Glasses just yet.

devinprater2y ago· 5 in thread

They're already going multi-modal? Holy crap, if google can't deliver in the accessibility space for this (image descriptions better than "the logo for the company"), then I'll definitely go back to Apple. I mean I do hope Apple cleans out bugs and makes VoiceOver feel like it won't fall over if I breathed hard, but their image descriptions, even without an LLM, are already clean and clear. More like "A green logo on a black background", where Google is, like I said, more like "The logo for the company." I guess it's kinda what we get when AI is crowdsourced rather than given good, high quality data to work with.

sagz2y ago

Google's Lookout app (accessibility for the blind and visually impaired) was updated ~6 months ago with a multimodal LLM already.

It uses the Flamingo model family: https://deepmind.google/discover/blog/tackling-multiple-task...

zitterbewegung2y ago

Honestly if they are coming out with a paper now Apple has probably been working on it for a year or two at minimum . Next year releases of macOS / iOS are rumored to have LLMs as a feature .

beoberha2y ago

I don’t mean to discount this work, but this particular model is the product of a few months of work tops. It’s effectively LLava with different training data, targeted at a specific use case. While I’m sure there is a significant effort at multimodal LLMs within Apple, this is just a tiny corner of it.

ex3ndr2y ago

They literally mention that they built on top of llava that was released half year ago.

1 more reply

refulgentis2y ago

> Honestly if they are coming out with a paper now Apple has probably been working on it for a year or two at minimum

Why do you say that?

1 more reply

aaronbrethorst2y ago· 5 in thread

Can someone define the term “MLLM”?

schaefer2y ago

Multimodal Large Language Model

pests2y ago

why not LLMM?

4 more replies

CamperBob22y ago

The language model works by delegating tasks to smaller language models and overcharging them for GPU time.

Tempest19812y ago

Also, is FERRET an acronym?

Someone2y ago

I would guess it’s wordplay on other models being named after animals (llama, vicuña) and figurative use of “ferret”.

https://en.m.wiktionary.org/wiki/ferret: “3. (figurative) A diligent searcher”

SushiHippie2y ago· 4 in thread

> Usage and License Notices: The data, and code is intended and licensed for research use only. They are also restricted to uses that follow the license agreement of LLaMA, Vicuna and GPT-4. The dataset is CC BY NC 4.0 (allowing only non-commercial use) and models trained using the dataset should not be used outside of research purposes.

Wait, how did "GPT-4" get in there?

simonw2y ago

Presumably because GPT-4 generated training data was used somewhere along the line - maybe by Vicuna.

mckirk2y ago

Their evaluation stack uses GPT-4 to rate the answers, so that might also be the reason why that's in there.

owenversteeg2y ago

Huh, interesting, that's Apple just openly saying that GPT-4 was used in the training.

adastra222y ago

Lawyers.

freedomben2y ago· 4 in thread

> Usage and License Notices: The data, and code is intended and licensed for research use only.

dbish2y ago

Many big “open source” releases in the AI community recently are not licensed for commercial use. Not really OSS at that point (ex:fuyu model from adept)

fragmede2y ago

I think the term should be "model available" rather than open source.

echelon2y ago

Boo.

But what do we expect from these giants? They're not going to create fertile ground for new competition. The only businesses they foster are those living under thumb and paying tax.

I guess I at least hoped for "commoditize the compliments" here. Make Google and OpenAI broadly less special.

cyanydeez2y ago

it's more likely it's all "stolen" and this is CYA

1 more reply

moneycantbuy2y ago· 3 in thread

anyone know what is the best open source model that allows commercial use and can run locally on an iphone?

BrutalCoding2y ago

I’ve made an example app for a Flutter plugin I created that can do this.

Open-source, runs natively on all major platforms. I shared videos showing it on my iPad Mini, Pixel 7, iPhone 12, Surface Pro (Win 10 & Ubuntu Jellyfish) and Macs (Intel & M archs).

By all means, it’s not a finished app. I simply wanted to use on-device AI stuff in Flutter so I started with porting over llama.cpp, and later on I’ll tinker with porting over whatever is the state of the art (whisper.cpp, bark.cpp etc).

Repo: https://github.com/BrutalCoding/aub.ai

For any of your Apple devices, use this: https://testflight.apple.com/join/XuTpIgyY

App is compatible with any GGUF files, but it must be in the ChatML prompt format otherwise the chat UI/bubbles probably gets funky. I haven’t made it customizable yet, after all - it’s just an example app of the plugin. But I am actively working on it to nail my vision.

Cheers, Daniel

mandelken2y ago

Mistral 7B is pretty good and the instruct v0.2 runs on my iPhone through MLC Chat.

However, the ChatGPT4 app is much better in usability: better model, multi-modal with text/vision/speech and better UI.

hackernewds2y ago

gpt 4 allows commercial use?

2 more replies

Jackson__2y ago· 3 in thread

>Ferret: A Multimodal Large Language Model

What I thought when reading the title: A new base model trained from the ground up on multimodal input, on hundreds to thousands of GPUS

The reality: A finetune of Vicuna, trained on 8xA100, which already is a finetune of Llama 13b. Then it further goes on to re-use some parts of LLava, which is an existing multimodal project already built upon Vicuna. It's not really as exciting as one might think from the title, in my opinion.

basiccalendar742y ago

this seems like a good but small research project by a research team in Apple. far away from what product teams are working on for next generation of apple products.

ipsum22y ago

The innovation is the modification of the neural network architecture to incorporate the spatial-aware visual sampler. The data and existing models are not the interesting part.

foxhop2y ago

Thanks for the summary.

Rucadi2y ago· 3 in thread

I wonder if these models are trained to have some kind of identification in case you use them for non-research purposes for example.

"Tell me who is your manufacturer" for example

chefandy2y ago

From Bard:

My situation is a bit unique, so the term "manufacturer" might not be the most accurate way to describe who created me. Here's a breakdown of what you need to know:

    Developed by Google AI: I was created by a team of researchers and engineers at Google AI, specializing in language models and artificial intelligence.
    Trained on a massive dataset: My knowledge and abilities come from being trained on a massive dataset of text and code, containing books, articles, code, and other forms of information.
    Continuously learning and evolving: I'm still under development, constantly learning and improving as I interact with users and process new information.

So, while I don't have a single manufacturer in the traditional sense, I'm the result of collaboration and advancement in AI research and development at Google.
I hope this helps clarify things! Let me know if you have any other questions.

SpaceManNabs2y ago

Why was this downvoted? It didn't answer the question, but it showed that there is a sort of imprint that GP was asking about.

And it saves everyone a tab's worth of effort.

3 more replies

behnamoh2y ago

Easy to get rid of that by a little fine tuning and system prompting.

jonplackett2y ago· 2 in thread

Presumable because this is Conda none of this can be run on any Apple hardware despite people managing to get M processors to do a bit of dabbling with AI?

_visgean2y ago

> because this is Conda none of this can be run on any Apple hardware

conda supports m1? https://www.anaconda.com/blog/new-release-anaconda-distribut...

jonplackett2y ago

Did not know that!

adt2y ago· 1 in thread

Old paper (Oct/2023), but the weights are new (Dec/2023):

https://lifearchitect.ai/models-table/

rreichman2y ago

Oct 23 is old :)

andy992y ago· 1 in thread

One big plus if this takes off as a base model is the abundance of weasel family animals to use in naming the derivatives. Ermine, marten, fisher, ... I'd like to call Wolverine. Llama didn't have much room for some interesting variety beyond alpaca and vicuna.

behnamoh2y ago

Yes, because that's the main concern and limitation in the LLM community. /s

If anything, I think people should use meaningful and relevant names, or invent new ones.

cpressland2y ago· 1 in thread

Finally, some decent competition for Not Hotdog!

slau2y ago

I think you just put a smile on Tim Anglade’s face by mentioning this.

https://news.ycombinator.com/item?id=14636228

amitprasad2y ago

Also relevant: LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Apple seems to be gearing up for significant advances in on-device inference using this LLMs

https://arxiv.org/abs/2312.11514

a_rahmanshah2y ago

Can we run this on macOS?

orenlindsey2y ago

Has anyone actually run this yet?

1 more reply

halyconWays2y ago

I'm glad Apple invented AI. Now they'll put a fancy new name on it and consumers will believe it.

Thorrez2y ago

Does Apple know that ferrets are illegal in California?

https://www.legalizeferrets.org/

j / k navigate · click thread line to collapse

308 comments

136 comments · 22 top-level

yreg2y ago· 28 in thread

I really hope Apple releases an iPhone with a good on-device private LLM assistant, perhaps next year. Their hardware is well-positioned for it.

It could make me get a new phone outside of my usual ~4 year cycle. Siri is almost unusable for me.

aaronbrethorst2y ago

Here’s one story to offer some context. There are others. https://archive.is/en3VL

behnamoh2y ago

> Rumors suggest they’re gearing up to make iOS 18 an AI focused release.

6 more replies

para_parolu2y ago

I really hope they will make siri usable. In the current state it’s only good for fixed phrases. And even then it fails time to time

2 more replies

0x1ceb00da2y ago

spaceman_20202y ago

GPT-4 voice is so, so good. Really what you would want a voice tool to be like. I can talk to it like a normal human being, unlike issuing specific commands loudly as with Siri.

klabb32y ago

But no matter the Siri shittiness (which I agree with) an LLM can only interact with the outside world – ie run commands – that exist and have a reasonable API surface, no?

Maybe I’m old but I still don’t see the major value-add of voice interfaces, despite massively improved tech and potential.

1 more reply

fnordpiglet2y ago

The auto correct is already backed by a smallish LLM, FYI.

https://jackcook.com/2023/09/08/predictive-text.html

blululu2y ago

And it is a serious quality regression IMO. The dictionary is too small and misses/messes up a ton of basic words.

1 more reply

scosman2y ago

SLM? :)

hmottestad2y ago

mrbonner2y ago

1 more reply

KMnO42y ago

Is tiny LLM an oxymoron? I believe Apple has told us it’s a transformer language model, but not specifically a LLM.

4 more replies

dontlaugh2y ago

It’s probably why autocomplete got drastically worse, to the point I’m considering turning it off entirely.

Most “AI” features are so incredibly fragile they’re not worth deploying.

2 more replies

wenc2y ago

It's a GPT2 model. It hasn't changed the autocomplete experience that much (occasionally I'll see a word completion).

dnw2y ago

I have noticed Siri now describes pictures sent to Messages.

fennecbutt2y ago

Nobody can tame LLM models yet not even Apple.

I can still get chatgpt to say the most vile things and if Apple release something on device I'll get that to be a bad, baaaad robot, too.

LLMs are not yet safe for public facing production use,imo.

zitterbewegung2y ago

Next year releases of macOS / iOS are rumored to have LLMs as a feature .

schleck82y ago

Yes, their hardware is positioned phenomenally with little RAM even by phone standards which is what you'd hack around with for inference on mobile architectures.

cedws2y ago

What are you going to do with it?

CaptainOfCoit2y ago

You're unlikely to get a better experience with Siri if she becomes equipped with a 7B or 13B LLM, unless Apple figured out something revolutionary.

jurmous2y ago

Released 2 days ago by Apple, a research paper on methods to run larger llms on iPhones.

https://www.macrumors.com/2023/12/21/apple-ai-researchers-ru... https://arxiv.org/pdf/2312.11514.pdf

1 more reply

nexuist2y ago

Siri is really quite dumb. I am confident that a 7B model would be able to provide better responses in over 90% of user queries. I can't even get Siri to reliably set a timer.

1 more reply

bbor2y ago

Note that “using an LLM” doesn’t just mean “plugging user queries straight into an LLM”. Enhancing Siri will probably be an ensemble project.

1 more reply

s3p2y ago

Why would that be?

ghqst2y ago

Have you ever actually used Siri?

1 more reply

bbor2y ago

thebruce87m2y ago

They are right to be careful, they are held to a much higher standard than their competitors.

Pixel phones have had emergency call issues for years across multiple models but they just get a pass. Apple would be crucified for this.

1 more reply

behnamoh2y ago

Also, it doesn't help that ALL the authors of this Apple paper are chinese. It raises questions about how Apple will handle political debates with its LLM.

1 more reply

shrimpx2y ago· 18 in thread

madeofpalk2y ago

shrimpx2y ago

Yeah. Siri supports text input already, anyway. Siri is their ChatGPT-style bot that's going to keep improving.

1 more reply

fbdab1032y ago

I would challenge the keyboard autocomplete. I find the Apple suggestions to be frustratingly poor vs my experience on Android.

2 more replies

dwaite2y ago

> Apple has been looking sleepy on LLMs, but they've been consistently evolving their hardware+software AI stack, without much glitzy advertising

They don't sell compute time to other companies to run AI, or massive custom hardware for AI training.

They aren't after VC funding.

Their core business isn't threatened by AI being "the evolution of search"

Product-wise, so far all you hear is messaging around things like pointing out the applicability of the M3 Max for running ML models.

Until they have real consumer products ready, they only need to keep tabs on analysts, with lip service at financial meetings.

theferalrobot2y ago

hosh2y ago

2 more replies

mark_l_watson2y ago

I have enjoyed working with CoreML over the last few years. Please share what you didn’t like about it.

1 more reply

lachlan_gray2y ago

Maybe MLX is meant to fill this gap?

https://github.com/ml-explore/mlx

harryVic2y ago

Can you give an example? I switched to android because i use personal assistant a lot while driving and siri was absolutely horrible.

shrimpx2y ago

- FaceID

- Facial recognition in Photos

- "Memories" in Photos

- iOS keyboard autocomplete using LLMs. I am bilingual and noticed in the latest iOS it now does multi-language autocomplete and you no longer have to manually switch languages.

- Event detection for Calendar

- Depth Fusion in the iOS camera app, using ML to take crisper photos

- Probably others...

The crazy thing is most/all of these run on the device.

4 more replies

fennecbutt2y ago

Are you so sure? Even this link is built on top of the work of others, I'm not sure they've contributed as much as you think they have.

gxyt6gfy5t2y ago

I wouldn’t go too far. They didn’t even train this model on Apple hardware. Trained on Nvidia A100s

Affric2y ago

Don’t TSMC make Nvidia’s chips too?

shrimpx2y ago

Yup! TSMC wins either way.

slowmovintarget2y ago

emmender22y ago

generic first-order shallow argument

zamalek2y ago

You're suggesting that Apple could fit what can't be done with a 4090 into a laptop?

Color me doubtful.

fennecbutt2y ago

tambourine_man2y ago· 12 in thread

> FERRET is trained on 8 A100 GPUs

So Apple uses NVidia internally. Not surprising, but doesn't bode well for A Series. Dogfooding.

[edit] I meant M series, Apple Silicon

sxg2y ago

tambourine_man2y ago

Apple Silicon, sorry for the ambiguity. Apple sells Macs too. That’s where I’d hope they would train their models.

cryogenicfire2y ago

tambourine_man2y ago

I think you’re completely correct, but if they were caught off guard by the AI train, they shouldn’t be testing the waters now. It should be treated as an existential threat.

1 more reply

hhh2y ago

Why would they dogfood Apple Silicon for training models? Seems like a waste of developer time to me.

tambourine_man2y ago

Apple doesn’t even sell NVidia cards on their Mac Pros. Are they training it on Linux?

I think Apple would strive to be great at all computing related tasks. “Oh, Macs are not good for that, you should get a PC” should make them sad and worried.

AI/LLM is the new hot thing. If people are using Windows or Linux, you’re loosing momentum, hearts and minds… and sales, obviously.

8 more replies

hmottestad2y ago

https://techcrunch.com/2008/12/09/scientists-nvidia-put-faul...

dcchambers2y ago

As long as the inference can be done locally on their chips I don't at all think it's a big deal to train models on Nvidia/other hardware.

Are all the iCloud servers running on Apple silicon? I assumed they were running on standard rack mounted hardware.

tambourine_man2y ago

I think Apple considers cloud infrastructure a necessary evil and a commodity.

AI isn’t, yet at least, and I don’t think they can afford to treat it as such.

gooob2y ago

yeah, aren't the new M3 chips supposed to be really good for ML training?

woke_neolib2y ago

Apple apparently uses Google Cloud, so it's that or TPUs!

tambourine_man2y ago

They use many clouds. But LLM should be their core business and they usually don’t outsource that.

1 more reply

smoldesu2y ago· 8 in thread

> FERRET is trained on 8 A100 GPUs with 80GB memory.

Huh, even Apple isn't capable of escaping the CUDA trap. Funny to see them go from moral enemies with Nvidia to partially-dependent on them...

ssijak2y ago

I guess they also have Samsung fridges in the offices..

causal2y ago

And probably Intel processors and Linux in their datacenters.

2 more replies

amelius2y ago

And they use CAD software running on Windows (it simply doesn't exist on MacOS)

1 more reply

smoldesu2y ago

I don't get it, does Apple also make fridges now?

3 more replies

cryogenicfire2y ago

MBCook2y ago

> But honestly, as far as corporate feud goes, I feel like companies will happily suck it up if it makes some process cheaper and/or easier

That’s what I think is going on. Apple hated being on the hook for Nvidia’s terrible drivers and chipset/heat problems that ended up causing a ton of warranty repairs.

In this case they’re not a partner, they’re just a normal customer like everyone else. And if Intel comes out with a better AI training card tomorrow Apple can switch over without any worry.

They’re not at the mercy of Nvidia like they were with graphics chips. They’re just choosing (what I assume to be) the best off the shelf hardware for what they need.

whalesalad2y ago

amelius2y ago

CaptainOfCoit2y ago· 8 in thread

Maybe the abstract of the paper is a better introduction to what this is:

https://arxiv.org/abs/2310.07704

devinprater2y ago

MBCook2y ago

The Magnifier app on iOS can already describe whatever you point your phone at in iOS 17.

It’s not going to know an orc from a health potion, but they’re certainly working on the idea in the everyday stuff domain.

barbecue_sauce2y ago

>>spatial referring

I can't seem to nail down the meaning of this phrase on its own. All the search results seem to turn up are "spatial referring expressions".

nmstoker2y ago

TrueDuality2y ago

lukasb2y ago

It sounds like the "region inputs" are raster or vector inputs. So I'm imagining highlighting a region of the photo with my finger and having it tell me "that's the Duomo in Florence."

samstave2y ago

Is it just me, or doesnt this MLLM seem particularly useful for flying objects with vision?

s3p2y ago

Is it just me or did they include as many buzzwords as possible in technical writing?

ZeroCool2u2y ago· 8 in thread

We're watching Apple fill the moat in.

jonahbenton2y ago

Dig the moat out, I think you mean ;)

tomrod2y ago

Here it comes!

FredPret2y ago

How so?

colesantiago2y ago

Running Multimodal LLMs on device and offline, i.e LLMKit for free equaling GPT-3.5 / 4 then Google will follow on Android.

Ability to download / update tiny models from Apple and Google as they improve, à la Google Maps.

No need for web services like ChatGPT.

1 more reply

m3kw92y ago

OpenAI can just copy this.

pridkett2y ago

Yes, OpenAI can copy this, but they’ll still have less of a moat. That’s the problem with moats, once they’re gone even if you copy what others do, you don’t have a moat anymore.

yreg2y ago

They cannot integrate it deeply into Apple's platforms.

1 more reply

daralthus2y ago

Don't think they have AR Glasses just yet.

devinprater2y ago· 5 in thread

sagz2y ago

Google's Lookout app (accessibility for the blind and visually impaired) was updated ~6 months ago with a multimodal LLM already.

It uses the Flamingo model family: https://deepmind.google/discover/blog/tackling-multiple-task...

zitterbewegung2y ago

Honestly if they are coming out with a paper now Apple has probably been working on it for a year or two at minimum . Next year releases of macOS / iOS are rumored to have LLMs as a feature .

beoberha2y ago

ex3ndr2y ago

They literally mention that they built on top of llava that was released half year ago.

1 more reply

refulgentis2y ago

> Honestly if they are coming out with a paper now Apple has probably been working on it for a year or two at minimum

Why do you say that?

1 more reply

aaronbrethorst2y ago· 5 in thread

Can someone define the term “MLLM”?

schaefer2y ago

Multimodal Large Language Model

pests2y ago

why not LLMM?

4 more replies

CamperBob22y ago

The language model works by delegating tasks to smaller language models and overcharging them for GPU time.

Tempest19812y ago

Also, is FERRET an acronym?

Someone2y ago

I would guess it’s wordplay on other models being named after animals (llama, vicuña) and figurative use of “ferret”.

https://en.m.wiktionary.org/wiki/ferret: “3. (figurative) A diligent searcher”

SushiHippie2y ago· 4 in thread

Wait, how did "GPT-4" get in there?

simonw2y ago

Presumably because GPT-4 generated training data was used somewhere along the line - maybe by Vicuna.

mckirk2y ago

Their evaluation stack uses GPT-4 to rate the answers, so that might also be the reason why that's in there.

owenversteeg2y ago

Huh, interesting, that's Apple just openly saying that GPT-4 was used in the training.

adastra222y ago

Lawyers.

freedomben2y ago· 4 in thread

> Usage and License Notices: The data, and code is intended and licensed for research use only.

dbish2y ago

Many big “open source” releases in the AI community recently are not licensed for commercial use. Not really OSS at that point (ex:fuyu model from adept)

fragmede2y ago

I think the term should be "model available" rather than open source.

echelon2y ago

Boo.

But what do we expect from these giants? They're not going to create fertile ground for new competition. The only businesses they foster are those living under thumb and paying tax.

I guess I at least hoped for "commoditize the compliments" here. Make Google and OpenAI broadly less special.

cyanydeez2y ago

it's more likely it's all "stolen" and this is CYA

1 more reply

moneycantbuy2y ago· 3 in thread

anyone know what is the best open source model that allows commercial use and can run locally on an iphone?

BrutalCoding2y ago

I’ve made an example app for a Flutter plugin I created that can do this.

Open-source, runs natively on all major platforms. I shared videos showing it on my iPad Mini, Pixel 7, iPhone 12, Surface Pro (Win 10 & Ubuntu Jellyfish) and Macs (Intel & M archs).

Repo: https://github.com/BrutalCoding/aub.ai

For any of your Apple devices, use this: https://testflight.apple.com/join/XuTpIgyY

Cheers, Daniel

mandelken2y ago

Mistral 7B is pretty good and the instruct v0.2 runs on my iPhone through MLC Chat.

However, the ChatGPT4 app is much better in usability: better model, multi-modal with text/vision/speech and better UI.

hackernewds2y ago

gpt 4 allows commercial use?

2 more replies

Jackson__2y ago· 3 in thread

>Ferret: A Multimodal Large Language Model

What I thought when reading the title: A new base model trained from the ground up on multimodal input, on hundreds to thousands of GPUS

basiccalendar742y ago

this seems like a good but small research project by a research team in Apple. far away from what product teams are working on for next generation of apple products.

ipsum22y ago

The innovation is the modification of the neural network architecture to incorporate the spatial-aware visual sampler. The data and existing models are not the interesting part.

foxhop2y ago

Thanks for the summary.

Rucadi2y ago· 3 in thread

I wonder if these models are trained to have some kind of identification in case you use them for non-research purposes for example.

"Tell me who is your manufacturer" for example

chefandy2y ago

From Bard:

My situation is a bit unique, so the term "manufacturer" might not be the most accurate way to describe who created me. Here's a breakdown of what you need to know:

    Developed by Google AI: I was created by a team of researchers and engineers at Google AI, specializing in language models and artificial intelligence.
    Trained on a massive dataset: My knowledge and abilities come from being trained on a massive dataset of text and code, containing books, articles, code, and other forms of information.
    Continuously learning and evolving: I'm still under development, constantly learning and improving as I interact with users and process new information.

SpaceManNabs2y ago

Why was this downvoted? It didn't answer the question, but it showed that there is a sort of imprint that GP was asking about.

And it saves everyone a tab's worth of effort.

3 more replies

behnamoh2y ago

Easy to get rid of that by a little fine tuning and system prompting.

jonplackett2y ago· 2 in thread

Presumable because this is Conda none of this can be run on any Apple hardware despite people managing to get M processors to do a bit of dabbling with AI?

_visgean2y ago

> because this is Conda none of this can be run on any Apple hardware

conda supports m1? https://www.anaconda.com/blog/new-release-anaconda-distribut...

jonplackett2y ago

Did not know that!

adt2y ago· 1 in thread

Old paper (Oct/2023), but the weights are new (Dec/2023):

https://lifearchitect.ai/models-table/

rreichman2y ago

Oct 23 is old :)

andy992y ago· 1 in thread

behnamoh2y ago