RTX 5090 and M4 MacBook Air: Can It Game?

(scottjg.com)

699 points · allenleee · 1mo ago · 180 comments

132 comments · 31 top-level

I have been bothering the VM team for years for VM GPU pass through. I worked on the Apple Silicon Mac Pro and it would have made way more sense if you could run a linux VM and pass through the GPU that goes inside the case!

Sadly, as you can tell, they have not taken me up on my requests. Awesome that other people got it working!

m1321mo ago

It looks like the pass through part here was implemented using standard DriverKit interfaces, if I'm not mistaken. That is, the PCIe BAR can already be mapped from the user-space, without any extra modifications to macOS. It's just a matter of VMMs, such as QEMU, adopting this interface in addition to Linux VFIO and the like (unless you're talking about Virtualization.framework, which is kind of a VMM of its own).

What exactly do you feel macOS is missing?

anp1mo ago

I’m not very familiar with the specifics of pass through but IIUC only being able to map 1.5gb of active DMA buffers at a time is pretty limiting.

monocasa1mo ago

Isn't driverkit essentially a separate user space stack compared to regular code? I remember seeing the driverkit specific dyld caches in macos root partition images that included their own copies of everything down to libsystem. Getting driverkit code to run in the same process as normal user code seems like it'd be quite an uphill battle.

Presumably with the right entitlements you can just hit the same (presumably IOKit) syscalls that driverkit does. But that's an extra layer of reverse engineering, and you're not really using driverkit anymore.

scottjg1mo ago

it is a separate stack, but that probably doesn't matter much. a user process (in my case, qemu) can communicate with a driverkit driver. the user process can also map memory through the driver, which is how this pci passthrough system works.

i don't think the issues with the project really are specific to driverkit.

mikae11mo ago

>> This project requires a special entitlement from Apple. I’ve requested it, and heard they may be open to granting it, but I have not yet heard back, and I’m told that the wait time could be months.

> I have been bothering the VM team for years for VM GPU pass through.

Good luck. I'm sure they're keen on giving people access to this so that people can spend their money on NVIDIA GPUs instead of buying more expensive Macs. :)

Would of course be awesome, but I'd be very surprised if it happened.

codebje1mo ago

There isn't a more expensive Mac option to buy if what you're after is a gaming GPU. It's more likely that the VM team sees this as a very low benefit ticket to pursue given the tiny segment of Mac gamers hoping to improve their options with a Linux VM for gaming.

(Meanwhile, I'm recompiling Wine to see if I can patch it to address an issue that was hotfixed in Proton two weeks ago but isn't in a CrossOver build yet, so yeah, there's maybe some arguments to be made here that I'd be a potential beneficiary. If I weren't too cheap to spring for an eGPU in today's market, anyway.)

m1321mo ago

The entitlement in question is the standard `com.apple.developer.driverkit.transport.pci` [0], required for anything that touches the PCIe bus [1]. Apple is generally restrictive with how much third-party applications can do on machines with SIP/"full security", so I'm not exactly surprised. It's not an Apple-private entitlement, however.

The VFIO-style driver made by the author of this also appears generic enough to support all kinds of PCIe, not just GPUs. Apple might find a way to weasel out of this ("hey, this is for hardware companies and you don't seem to be affiliated with one", "your driver requests too broad access", etc.) if there really is a conflict of interest, but so far, there's a chance it will just get rubber-stamped.

I can see them rejecting it for legitimate reasons, though, at least as far as "legitimate" with Apple goes. This driver is essentially a thin layer over PCIDriverKit, exposing all functionality that's supposed to be behind the entitlement to arbitrary applications, in similar fashion to WinRing0. They probably didn't come up with all this bureaucracy only to sign something like that in the end. We'll see what happens.

[0] https://github.com/scottjg/qemu-vfio-apple/blob/84ecdcf5db6b...

[1] https://developer.apple.com/documentation/pcidriverkit/creat...

scottjg1mo ago

two semi interesting things to note around this:

1. Virtualization.framework seems to support some form of GPU passthrough from the host (granted, not eGPU - it's for the integrated GPU). I think the primary use case is having macOS guests get acceleration, while still sharing GPU time with the host. There is also a patch that recently hit QEMU mainline that supports using the "venus server" with virtio-gpu to support a similar functionality for Linux guests under Hypervisor.framework.

2. Apple internally has some kind of PCI Passthrough support available in Virtualization.framework. It seems like the code is shipped to customers in the framework, but it relies on some kind of kext or kernel component that isn't shipped in retail macOS. I can't say if that's intended to ever be released to customers, but clearly someone at Apple has thought about this feature.

m1321mo ago

I experimented with booting Arm macOS 14-26 in QEMU a while back, building on the work of Alexander Graf for macOS 12-13, and reverse-engineered substantial parts of Hypervisor.framework, the in-kernel hypervisor, and a bit of Virtualization.framework. Got newer versions of Sequoia to boot past the log in screen, with GPU acceleration too.

Unless there's another method I missed, the internal GPU "pass through" of Virtualization.framework you're thinking of might actually just be paravirtualization, at least that's what the name suggests. It's implemented in the public ParavirtualizedGraphics framework [0], albeit for PG on Arm macOS, the relevant interfaces are private [1]. I haven't looked that deep into it per se, but, fixing the bugs around it, I've run into a few clues suggesting that it's just a command stream + shared memory being passed around. It also uses its own generic driver on the guest side.

Great job, by the way! Love how authors of pieces like this casually come here to comment :)

[0] https://developer.apple.com/documentation/paravirtualizedgra...

[1] https://github.com/qemu/qemu/blob/edcc429e9e41a8e0e415dcdab6...

my1231mo ago

FYI: https://patchew.org/QEMU/20260324204855.29759-1-mohamed@unpr...

There's some randomness around Tahoe for FileVault and it crashing because Data is detected as not encrypted (and that's not OK on bare metal). If hitting that case you might need to enable FileVault inside the VM (and remember to sync aux storage afterwards if not done)

scottjg1mo ago

thanks!

there also appears to be a generic pci passthrough path. we were discussing it on the qemu-devel list: https://lore.kernel.org/qemu-devel/C35B5E97-73F2-4A60-951B-B...

brcmthrowaway1mo ago

I still believe the lack of NVIDIA GPU support in the Mac Pro will go down as one of the greatest missed opportunities in tech.

Anyway, the Mac Pro is dead now. There are only so many sales audio and video professionals can provide.

runjake1mo ago

There was some bad history between Apple and Nvidia. Perhaps with a new generation of leadership at Apple things might change.

https://www.reddit.com/r/hardware/comments/1hmgmuf/apples_hi...

mercutio21mo ago

I wasn't in the room when it happened, but this is very different than the story told internally about why Apple became allergic to Nvidia.

Arguably more petty. SJ has been dead for almost 15 years now, I imagine the C-suite might get over it at some point.

firecall1mo ago

Maybe with Tim and Jensen going on holiday together in China, the relationship might be healed somewhat.

Things have moved on since the days where GPUs in Macs were a priority.

But then the AI race has changed things. So who knows - maybe we will one day see official eGPU support from Apple and new drivers from nVidia. Wouldn't put money on it though....

Aurornis1mo ago

> I still believe the lack of NVIDIA GPU support in the Mac Pro will go down as one of the greatest missed opportunities in tech.

I don’t know about that. Apple supported some full size GPUs in past product lines and the number of users was very small. Granted, LLMs change that demand but the audience for Mac Pro buyers who would use a full-size GPU that is impossible to obtain is almost nothing compared to their laptop sales.

pjmlp1mo ago

The missed opportunity is like with the server market, now giving the workstation market to Windows and Linux.

It isn't only audio and video.

jbverschoor1mo ago

I guess that little problem with the Nvidia chips overheating in the MacBook Pro didn’t give Apple a lot of confidence

Melatonic1mo ago

Audio and Video professionals jumped ship around the time Apple canned all the pro software

caycep1mo ago

What are the chances there will be another Mac Pro in the future?

Will Apple ever make a computer that makes Siracusa happy? (and do you have the "Believe" shirt?)

pjmlp1mo ago

Never, a couple of years ago Apple gave up on the server market, that is why having Swift on Linux is so relevant for app developers.

Now they gave up on the workstation market that really enjoys their slots for all myriad of cards.

Having a thunderbolt cable salad is only for those that miss external extensions from 8 and 16 bit home computer days.

Which is clearly what Apple is focused on nowadays, if you look back at the vertical integrations before the PC clones market took off.

So now if you really need a workstation, it is either Windows, or one of those systems sold with Red Hat Enterprise/Ubuntu from IBM, Dell, HP.

hedora1mo ago

If you want a workstation, you are probably better off building it yourself, or having your local computer store do it. The primary exceptions are AMD strix halos or the nvidia dgx spark.

I haven’t seen a non-laughable workstation config from the big vendors since the dot com bubble. Presumably they exist, I guess?

dwaite1mo ago

IMHO - extremely little.

It is too inefficient to design a machine which _might_ have two GPUs and a flock of additional drives installed into it. It just makes sense to instead design around having independent hardware in its own case, which can meet its own power/cooling needs. This has been a design goal since the trashcan Mac.

Having a PCIe bus increases bandwidth and reduces latency, but once you account for eGPU and for people who would be happy building custom solutions on platforms other than macOS, there's likely not enough identified market for a modular design.

crdrost1mo ago

It feels like half the problem in this blog post is dealing with memory access issues induced by QEMU and the VM boundary... it's probably something dumb I'm missing, but if you boot up Ubuntu in Docker, wouldn't the NVIDIA drivers still load? And then you wouldn't have to fight Apple about the memory management because OSX would still own the memory?

swiftcoder1mo ago

> but if you boot up Ubuntu in Docker, wouldn't the NVIDIA drivers still load?

Even if the drivers loaded, they can't talk to the GPU from within docker (unless one implements PCI passthrough). MacOS owns the PCI bus in this scenario.

smw1mo ago

docker on macos runs in a linux vm

jmalicki1mo ago

The problem is that the driver wants to own the memory.

SilentM681mo ago

In your view why have they refused to implement a "Linux VM and pass through the GPU that goes inside the case?"

zer0zzz1mo ago· 18 in thread

Once egpus work on Apple Silicon there will be little reason to own a pc

traderj0e1mo ago

Been hearing this for over a decade, except back then it was eGPU in Intel Macs which were closer to other PCs if anything. Even if this didn't require so much DIY and if Thunderbolt could do PCIe speeds, most people don't want to add drama when they can just use a PC with regular PCIe slots and native compatibility with Nvidia. The native way already has enough edge cases without adding an unusual setup.

zer0zzz1mo ago

What would be native enough here? What if they got Asahi working with NV gpus for rendering and running cuda kernels? Would eGPU on asahi be sufficient or do you really only see pcie worthwhile?

Some of us mainly want more gpu options on a high performance consumer arm machine (for Linux).

traderj0e1mo ago

Thunderbolt still doesn't provide the full PCIe bandwidth, but even if it did, I'd want PCIe itself. I don't trust the encapsulated version over Thunderbolt to work the same.

Virtualized Linux would be ok though. That's what datacenters already do with their GPUs, albeit on x86 not ARM. Doesn't need to be Asahi, cause that's unlikely to completely work.

jaimex21mo ago

The only thing Apple silicon has going for it is power use and that gap is getting closed. I can't really see any reason why I would switch to Mac, it just seems like you pay a lot more for a closed expensive environment that fights you at every step.

I'll never pay anyone for a developer licence or fee either. They can sponsor me to port my software to their platform.

zer0zzz1mo ago

Is it? I recently paid $999 for a pre-built Intel mini-PC system that's, best case, in line in perf with an M2 from four years ago. That seems roughly the same as what I'd paid for an equivalent Mac mini in the past, and I thought prices for custom builds were going up quite a bit too?

traderj0e1mo ago

Mac lets you run any software you want, but I understand the principle of not wanting to support them.

lowbloodsugar1mo ago

Just built a workstation with an older Threadripper Pro. It has 128 PCIE lanes, for 7 16-lane PCIE slots. An egpu has 4. I have one GPU, at x16, and I can add more.
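To put rough numbers on the lane counts above, here is a small sketch. It assumes PCIe 4.0 signalling (the comment doesn't say which generation the board uses) and ignores Thunderbolt tunnelling overhead, so treat the eGPU figure as an upper bound on the raw link rather than a measurement. The function and variable names are made up for illustration.

```python
# ASSUMPTION: PCIe 4.0 signalling, 16 GT/s per lane with 128b/130b encoding.
# The generation is not stated in the comment; swap in other values as needed.
GT_PER_S = 16e9
ENCODING = 128 / 130


def link_gb_per_s(lanes: int) -> float:
    """Raw one-direction link bandwidth in GB/s, before protocol overhead."""
    return lanes * GT_PER_S * ENCODING / 8 / 1e9


# Figures taken from the comment: 128 CPU lanes, seven x16 slots, x4 eGPU link.
cpu_lanes, slots, lanes_per_slot, egpu_lanes = 128, 7, 16, 4

print(f"lanes used by slots: {slots * lanes_per_slot} of {cpu_lanes}")
print(f"x16 slot:     {link_gb_per_s(lanes_per_slot):.1f} GB/s")
print(f"x4 eGPU link: {link_gb_per_s(egpu_lanes):.1f} GB/s")
```

Whatever the generation, the x16 slot has four times the raw link bandwidth of the x4 link, since bandwidth scales linearly with lane count.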

Most people don't need that, but most people don't need an eGPU either. The number of gamers who would switch to Macbook+eGPU is negligible. It's just not compelling. For LLMs, hanging a 5090 off the thunderbolt port makes prompt processing fast, but I will be surprised if the M6 doesn't come with silicon just for that, as it's the current gap. M5 is quite adequate for token generation for the price, given the RAM quantity and bandwidth. An M6 that accelerates TTFT would make an eGPU irrelevant.

For gaming, the threadripper gets at least +50FPS for windows vs linux, and some games just freeze for periods of time on linux with things like dynamic frame generation. I have an SSD for windows just for gaming.

bigyabai1mo ago

> The number of gamers who would switch to Macbook+eGPU is negligible. It's just not compelling.

This. eGPUs fade in and out of relevance every few years, and even back in the Intel Macbook days there were people advocating for eGPU gaming with Bootcamp. It was a terrible solution, there is every reason to avoid macOS with a dGPU when you have something like Linux or even Windows as an alternative.

Melatonic1mo ago

That's also because we keep trying to use terrible interconnects. If we get an interconnect with a proper latency spec things might change

zer0zzz1mo ago

FWIW, I am partial to eGPUs not for laptop gaming but for space. I want to write cuda kernels at home but I don't want a big tower. I have a Minisforum mini-pc the size of an M4 Mac Mini attached to one of those egpu enclosures and the whole setup was pretty easy and sits on my desk nicely.

traderj0e1mo ago

Yeah the desire makes sense. The Macs are very nice hardware. You can get a mobo/case that's not much larger than the GPU itself, but it's still clunky. It's just, unfortunately eGPUs are unlikely to become more than a curiosity.

_blk1mo ago

I assume your reasons are different to mine so for your reasons it might very well be true. But for my reasons definitely not as long as Apple Silicon can't run Linux somewhat decently natively - and even then, it's still an Apple..

zer0zzz1mo ago

Depends. I put it on an M1, and that soc is quite good at running linux.

bel81mo ago

Mac GPU isn't the bottleneck for most games. Compatibility is.

zer0zzz1mo ago

I’m not talking about games, I think a Mac mini on a rtx pro 4000 would be a nicer experience than a g10 is all.

ActorNightly1mo ago

Man, Apple fans are still proving the stereotype to be accurate after 20 years.

Ignoring the fact that the Mac OS gets in your way every time you try to do something that Apple doesn't like, with no guarantee that an update won't break anything existing, ignoring the fact that Macs are non repairable, non upgradable, ignoring the fact that they don't support multiple displays flawlessly, I hope you realize that egpu support natively is NEVER coming to Macs, because why the fuck would they enable it when they can just charge you full price for a desktop computer? Apple is built on the sole image that Apple users have money, so buying another Mac Mini or Mac Pro in addition to your laptop is what you are supposed to do.

Android is way ahead of Mac with Android Desktop mode and Samsung Dex, to the point where you don't even need to own a laptop anymore. I've been using my S24/S25 with lapdock for over 3 years now as a laptop, and it works flawlessly. Apple can easily do this with iPhone, but they won't because that means one less macbook purchase.

zer0zzz1mo ago

> Man, Apple fans are still proving the stereotype to be accurate after 20 years.

Who is this straw man you're flogging?

> Ignoring the fact that the Mac OS gets in your way every time you try to do something that Apple doesn't like, with no guarantee that an update won't break anything existing, ignoring the fact that Macs are non repairable, non upgradable, ignoring the fact that they don't support multiple displays flawlessly,

Lot to unpack there, most of it does not matter to most normies. When I bought my current mini-pc to drive my egpu I didn't focus on any of this stuff. Just about all I looked for was something that can drive a gpu over TB4/5 and has good perf/watt in a small form factor.

> I hope you realize that egpu support natively is NEVER coming to Macs, because why the fuck would they enable it when they can just charge you full price for a desktop computer?

Sounds like you are more hopeful they won't than I am that they will. They've already enabled RDMA over TB5 for ML applications, and they've left their boot loader open enough for the asahi community to reverse engineer tons of functionality.

I do think eventually there will be some form of GPGPU programming popularized on the mac that isn't Metal (gross).

> Apple is built on the sole image that Apple users have money, so buying another Mac Mini or Mac Pro in addition to your laptop is what you are supposed to do.

I think you have a very specific use case in mind, chiefly gaming. There's a lot more eGPUs offer, and it has nothing to do with turning your normie laptop into a sick gaming rig.

> Android is way ahead of Mac with Android Desktop mode and Samsung Dex, to the point where you don't even need to own a laptop anymore. Ive been using my S24/S25 with lapdock for over 3 years now as a laptop, and it works flawlessly. Apple can easily do this with iPhone, but they won't because that means one less macbook purchase.

I fail to see how Android is relevant in this context at all? For one, the arm64 hardware would have to exceed the single thread and perf per watt of an M5 and secondly you'd actually need tools and applications worth using for desktop use.

I am seeing some of the newer AMD 370/395 and Intel Ultra 7/9 socs as being much more of a serious alternative to the M4/5 here. In fact my current eGPU setup is an Ultra 9 mini-pc with an egpu, it's just a shame I'm still on x86.

ActorNightly1mo ago

Mac will appear to give some leeway to fake being dev friendly, but they are not. There is a reason why Asahi is still in its current state - lack of any real documentation from Apple. If Apple was dev friendly, they would just bring those people on board and give them the documentation and have them develop a fully working linux for free. But Apple fundamentally DGAF about linux users.

RDMA over Thunderbolt is going to be used only with Mac devices. Apple has a history of keeping things within their own ecosystem. You gotta be insane to think that they are going to just magically allow you to plug in a graphics card and it will work natively.

The point of bringing up Android is that this is what being dev friendly looks like. Samsung or Google have nothing to really gain by enabling the desktop mode. But they do it anyway because it increases the usability of their devices. Ask yourself again, if Apple already runs arm on all of its devices, why not enable a desktop mode for the iPhone? It's EXACTLY to squeeze more money from consumers. It's why they do the thing they do with the app store, that's why they own all the advertising streams on their devices.

So if you wanna stay deluded about what Apple does, be my guest. Just don't be surprised when nothing turns out like you hoped.

mywittyname1mo ago· 17 in thread

> As much as I hate to admit it, step one in most of my projects now is to ask AI about it. Maybe it’ll tell me something I don’t know.

Or, more likely, it will tell you something it doesn't know.

Reminds me of yesterday, when I was arguing with ChatGPT that the 5070TI was an actual video card. It kept trying to correct me by saying I must have meant a 4070ti, since no such 5070ti card exists.

collabs1mo ago

Or, it will acknowledge that it made a mistake and continue to make the same mistake again.

I asked Claude to generate an HTML page about PowerShell 7. It gave me a page saying 7.4 was the latest LTS release. I corrected it with links showing 7.6 was released in March and asked it to regenerate with the latest information.

It generated basically the same page with the same claim that 7.4 was the latest release.

ericmay1mo ago

> Or, it will acknowledge that it made a mistake and continue to make the same mistake again.

People do this too though. At least the AI generally tries to follow instructions that you give it even when you are lacking clarity in the details.

I feel like it's similar to the self-driving car problem. The car could have 99.9999% reliability, drive much better and safer than a human, yet folks will still freak out about a single mistake that's made even though you have actual humans today driving the wrong way down the highway, crashing in to buildings, drunk driving, stealing cars, and all sorts of other just absolutely stupid things.

We need to move away from this idea that because it's an AI system it should give you perfect responses. It's not a deterministic system and it can be wrong, though it should get better over time. Your Google search results are wrong all the time too. The NYT writes things that are factually incorrect. Why do we have such a high standard for these models when we don't apply them elsewhere?

dakolli1mo ago

But people want to do their taxes with these things lmfao

corry1mo ago

LLMs are (broadly-speaking) poorly-positioned to give you a strong verdict on plausibility of a frontier topic. That said - ChatGPT was exactly right in its response to OP!

"Very deep", "border-line impractical" "in a research-sense" is the perfect summary of this article itself! :)

funimpoded1mo ago

Watching the entire economy of a superpower and ~all of online culture go absolutely ga-ga over Furbys has been one of the weirdest things I've ever witnessed.

dakolli1mo ago

Watching the entire economy of a superpower bet its entire future on SOTA text autocomplete models has been interesting to watch (which I think you're referring to).

Previous Empires naively bet their entire future on the words of magicians, or people who claimed they could look into water, the sky and fire and tell you what the future is going to be.

Machine Learning Engineers are the modern day Empire's court magician.

Apocryphon1mo ago

Eh, in this use case it's more like a goofy search engine.

perarneng1mo ago

This is why I use grok expert mode. It aggressively goes out searching the web for info. It's so much better than relying on year-old data.

_blk1mo ago

Yes, I really like that about Grok. It had a few good qualities but it was too verbose so now it's mostly Claude.

JumpCrisscross1mo ago

Solid compromise is Kagi's research assistant. Aggressively cites, unlike Claude. Concise, unlike Grok.

amluto1mo ago

At least ChatGPT is now aware that Codex exists. I have a chat, still in my history, from a few months ago, in which I asked for help wrangling npm to get @openai/codex working, and ChatGPT said:

> Important: Codex CLI no longer exists

> OpenAI discontinued the Codex model + CLI a while back. There is no official binary named codex in any current OpenAI npm packages. OpenAI’s current CLI tool is:

    npm install -g openai

> which installs the openai command, not codex.

The world knowledge of these models is not necessarily up to date :)

edit: I replayed the same prompt into current ChatGPT and it is less clueless now. Maybe OpenAI noticed that it was utterly dumb that GPT-5.whatever didn't believe that Codex existed and fine-tuned it.

sigmoid101mo ago

>The world knowledge of these models is not necessarily up to date :)

It's amazing how this still needs to be said. Codex was released in April 2025. The initial GPT-5 and 5.1 still had a knowledge cutoff in late 2024. Like, what did you expect? Always beware the knowledge cutoff for LLMs (although recent releases have gotten much better with researching the web for updates before answering modern software topics).

Tsiklon1mo ago

I argued with GPT-OSS 120B about cascade lake Xeon workstation CPU parts not having a GPU when it vehemently said otherwise

simonh1mo ago

Its training data only goes up to late 2024 or early 2025 so that might be why, though it does have access to the internet.

mywittyname1mo ago

Yeah, the solution was to link it to the nvidia page of the card, then it was like, 'oh, okay.' But at that point, I lost faith in its ability to provide me with the information I was looking for. If its information is so out of date that it doesn't know about the 5000 series, how could I be confident that it knew the details I was asking about (game engine related research)?

asats1mo ago

Are you using the instant model?

weird-eye-issue1mo ago

Depending on your ChatGPT settings...

Aurornis1mo ago· 13 in thread

Excellent article.

The game benchmarks are fun but the LLM improvements are where this gets really interesting for practical use. I love Apple platforms as an approachable way to run local models with a lot of RAM, but their relatively slow prompt processing speed is often overlooked.

> Here you can see the big issue with Macs: the prompt processing (aka “prefill”) speed. It just gets worse and worse, the longer the prompt gets. At a 4K-token prompt, which doesn’t seem very long, it takes 17 seconds for the M4 MacBook Air to parse before we even start generating a response. Meanwhile, if you strap the eGPU to it, it’ll only take 150ms. It’s 120x faster.

The prefill problem goes unnoticed when you’re playing around with the LLM with small chats. When you start trying to use it for bigger work pieces the compute limit becomes a bottleneck.

The time to first token (TTFT) charts don’t look bad until you notice that they had to be shown on a logarithmic scale because the Mac platforms were so much slower than full GPU compute.

superlopuh1mo ago

I'm curious and not an expert here, do you know why the TTFT is so much worse on Mac? To elaborate, the article just says that this step is compute bound, but I'm wondering whether it is just that simple or if it might also be less optimised in MLX?

Aurornis1mo ago

Prefill (prompt processing) is compute bound doing large matrix operations. Token generation (aka tokens/s) is memory bandwidth bound.

The RTX 5090 has an incredible amount of compute performance for matrix operations and a lot of memory bandwidth. The Apple Silicon parts have unusually high memory bandwidth for general purpose compute chips, which is why they can generate tokens so fast. Their raw matrix compute performance is amazing for their power envelope but not nearly as fast as a dedicated GPU consuming 400-500W.

Apple added tensor cores on the M5 generation which help with those matrix operations, which is why the M5 performs so much better than the M4 Max in that article.

Dedicated GPUs like the RTX 5090 are in another league, though.

You can see the divergence in the high resolution gaming benchmarks, too. Once he starts benchmarking at 4K or 6K where the CPU emulation stops being a bottleneck, the raw compute of the 5090 completely crushes any of the Apple Silicon GPUs.
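As a back-of-the-envelope illustration of the compute-bound vs. bandwidth-bound split described above, here is a minimal sketch. It uses the common rule of thumb of roughly 2 FLOPs per parameter per token for a dense forward pass and assumes every weight is read once per generated token. The hardware and model numbers are placeholders invented for the example, not specs or measurements of any real chip.

```python
def prefill_seconds(n_params: float, prompt_tokens: int, flops_per_s: float) -> float:
    # Rule of thumb: ~2 FLOPs per parameter per token for a dense forward pass.
    # Ignores attention cost, which grows with prompt length.
    return 2 * n_params * prompt_tokens / flops_per_s


def decode_tokens_per_s(n_params: float, bytes_per_param: float, bytes_per_s: float) -> float:
    # Each generated token has to stream every weight from memory once,
    # so throughput is capped by memory bandwidth.
    return bytes_per_s / (n_params * bytes_per_param)


# PLACEHOLDER hardware: decent bandwidth but modest compute vs. lots of both.
laptop = {"flops_per_s": 10e12, "bytes_per_s": 250e9}
big_gpu = {"flops_per_s": 200e12, "bytes_per_s": 1500e9}

N_PARAMS = 8e9        # hypothetical 8B-parameter model
BYTES_PER_PARAM = 2   # 16-bit weights

for name, hw in [("laptop", laptop), ("big_gpu", big_gpu)]:
    ttft = prefill_seconds(N_PARAMS, 4096, hw["flops_per_s"])
    tps = decode_tokens_per_s(N_PARAMS, BYTES_PER_PARAM, hw["bytes_per_s"])
    print(f"{name}: prefill ~{ttft:.2f}s for 4096 tokens, decode ~{tps:.1f} tok/s")
```

With these made-up figures the gap in prefill time is much larger than the gap in decode speed, which is the shape of the result discussed here: time to first token tracks compute, tokens/s tracks bandwidth.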

PicardsFlute1mo ago

The TTFT benchmarks don’t look right to me. I don’t use vLLM, but at 16k pre-fill, the M5 Max is 3.6 times faster than the M4 Max. The 5090 is surely faster, but the numbers in the article are not reflecting what I have seen thus far. Perhaps vLLM hasn’t been updated to use the new tensor APIs for metal?

My point is this: The M5 should have reflected this in the charts, but it doesn’t. The situation on pre-fill is not nearly as bad as in the M4 generation.

ademeure1mo ago

Apple GPUs didn’t have tensor cores until the M5 (aka “a neural accelerator in each core”), which is why in the article’s charts an M5 Pro significantly beats an M4 Max (while in other workloads the difference would be much smaller since Pro is ~1/2 Max).

EDIT: since Aurornis beat me by 3 minutes, I’ll add another interesting tidbit instead :)

NVIDIA tensor cores on consumer GPUs are massively less powerful per SM core than on their datacenter counterparts (which also makes them easier to get to peak efficiency on consumer GPUs because the rest of the pipeline is much more quickly a bottleneck as per Amdahl’s Law).

This is potentially changing with Vera Rubin CPX which looks an awful lot like a RTX 5090 replacement but with the full-blown datacenter tensor cores (that won’t be available unless you pay for the datacenter SKU) - so it will have very high TFLOPS relative to its bandwidth.

The target market for the CPX is exactly this: prefill and Time To First Token. You can basically just throw compute at the problem for (parts of) prefill performance (but it won’t help anything else past a certain point) and the 5090/M5 are nowhere near that limit.

So the design choice for NVIDIA/Apple/etc of how much silicon to spend for this on consumer GPUs is mostly dictated by economics and how much they can reuse the same chips for the different markets.

tpurves1mo ago

@Ademeure Where do you think the market will be by the time, say a year from now, when Apple has rolled out its M6 generation? Do you think one more process node and architecture revision will be enough to tip the balance so that local LLMs start to go mainstream?

Melatonic1mo ago

Does that include stuff like the Pro Blackwell 6000? Or are the tensor cores as good per SM comparably? They perform quite well on many tests

mathisfun1231mo ago

> I'm curious and not an expert here, do you know why the TTFT is so much worse on Mac?

because the GPUs aren't as fantastic as everyone assumes?

> might also be less optimised in MLX?

prefill has gotta be one of the most optimized paths in MLX...

bigyabai1mo ago

No you don't understand, on Apple Silicon my CPU has comparable memory bandwidth to a $400 Pascal-era GPU. With the unified memory architecture, that means my iGPU gets 2016-levels of DDR transfer speed with none of the upsides of CUDA. It's the most cutting-edge hardware ever put in a personal computer, without a doubt.

Moosdijk1mo ago

It feels pedantic to point it out, but it’s actually 113x faster.

Seeing the author present their results like this gives off the impression that they’re biased, which I am sure they aren’t.

scottjg1mo ago

the exact numbers in the graph are 17019ms vs 142ms. so you're right, it's not 120x, it's 119.85x.
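for anyone checking the arithmetic, the ratio follows directly from the two numbers quoted above (the variable names are just illustrative):

```python
# Time-to-first-token values quoted from the graph, in milliseconds.
slow_ttft_ms = 17019
fast_ttft_ms = 142

speedup = slow_ttft_ms / fast_ttft_ms
print(f"{speedup:.2f}x")  # prints "119.85x"
```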

Moosdijk1mo ago

That explains it. Thanks!

brcmthrowaway1mo ago

Use oMLX. Qwen3.6 - 300tok/s PP, 30tok/s TG.
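To put those throughput figures in wall-clock terms, here is a small sketch that converts prompt-processing (PP) and token-generation (TG) rates into latency. The rates are the ones quoted above; the prompt and output lengths are arbitrary assumptions chosen for illustration.

```python
def estimate_latency(prompt_tokens, output_tokens, pp_tok_s, tg_tok_s):
    """Return (time to first token, total time) in seconds for one request."""
    ttft = prompt_tokens / pp_tok_s
    total = ttft + output_tokens / tg_tok_s
    return ttft, total


# Rates from the comment above; token counts are hypothetical.
ttft, total = estimate_latency(prompt_tokens=6000, output_tokens=300,
                               pp_tok_s=300, tg_tok_s=30)
print(f"TTFT ~{ttft:.0f}s, total ~{total:.0f}s")
```

Under these assumptions a 6000-token prompt spends about twice as long in prefill as in generating a 300-token answer, so the PP rate dominates the experience for long prompts.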

mercutio21mo ago

This is The Way.

moralestapia1mo ago· 6 in thread

Wow, phenomenal project and write-up, thanks for sharing it.

"no - not in any practical sense today, and "maybe" only in a very deep, borderline-impractical research sense."

This is why humans will always rule over crappy LLMs.

falcor841mo ago

Wait, why? This is exactly what I as a human would have said in this situation.

Or if you're referring to how the OP still decided to go ahead, I've seen AIs go ahead on impractical courses of action many times, and surprisingly succeed on some of them.

scottjg1mo ago

in fairness to the LLM critics, every time i ran into a minor speed bump in this project, it told me it probably just wasn't possible to get it to work well. the LLM did pretty actively discourage me from trying to get the whole thing working.

that said, since i was willing to ignore that aspect of it, it did accelerate getting the work done by a lot. it seems like it understands system programming really well, and did a good job navigating the qemu codebase. i have ~20 years of systems programming experience so i already knew what had to be done here. it didn't really guide the project much, but it did write a lot of the code.

moralestapia1mo ago

And I see that you succeeded in not doing it.

Congrats! Each one got what they wanted :).

lowbloodsugar1mo ago

Every major moment in my career has been me doing something that another human or clique of humans has said is impossible. If you think this is purely an LLM trait, I can't imagine you've tried to achieve anything important in the real world.

csours1mo ago

I believe that LLM (and ML in general) tools really shine when they are developed and used AS tools.

Unfortunately, I also believe that market forces may push away from this direction, as LLM companies try to capture the value stream

rvz1mo ago

Exactly. AI psychosis is real.

Never let an AI tell you that you cannot do something practical for your own self for research, discovery or for fun.

The only thing that is close to impractical is expecting your non-technical friends or others to follow you without any incentive or benefit.

frollogaston1mo ago· 5 in thread

I'm guessing the x86 emu is cause Windows games are rarely built for ARM, right? Was kinda curious how an ARM VM would fare. Anyway awesome article.

hparadiz1mo ago

Yes. Valve has done a ton of work here because it's required to be able to run x86 games on a Steam Frame which has an ARM cpu.

bigyabai1mo ago

The Steam Deck is pure x86, it's not an ARM-based CPU. The Steam Frame might be what you're thinking of.

hparadiz1mo ago

You're right. I was thinking of what I was reading about the Steam Frame

hypercube331mo ago

Steam deck runs a full x86-64 AMD APU. The work valve has done for that was to get Windows games to run seamlessly on Linux.

Hopefully in 2026, with the Valve Index VR headset, which is ARM (Qualcomm?), we get what you're talking about here - basically Proton for Win32/64 to Linux ARM64.

Side note: Windows on ARM isn't bad, it's just priced out of its league, and cooling is awful for gaming on current laptops. The only issue I had was OpenGL needing some obscure GL-on-DirectX thing for Maya3D to get games to work.

sva_1mo ago

As sibling pointed out, the Steamdeck basically runs a Ryzen 3 7335U which is x86.

lenerdenator1mo ago· 4 in thread

The lack of native games on Apple Silicon is one of the greatest crimes ever committed against computing.

I got Fallout 3 working on my M2 MBP as well as it did on Windows back in the day. Temps were cool, battery was decent. If they sold my college years gaming collection (15-ish years ago) in a way that ran natively through GoG or Steam, I'd buy every single title.

bigyabai1mo ago

Porting games natively to macOS is a waste of developer time. Apple has already deprecated vast swathes of 32-bit games that were never updated to support 64-bit x86 or Apple Silicon. Developers that give macOS the same level of attention as Windows don't get the same level of support that Microsoft offers in return.

Not to mention that Mac owners are a minority share of the PC gaming market. Linux has the right idea, if you don't translate the games then you'll never have true preservation.

astrange1mo ago

> Apple has already deprecated vast swathes of 32-bit games that were never updated to support 64-bit x86 or Apple Silicon.

They had literally 15 years of warning about this.

bigyabai1mo ago

Okay, now literally count how many developers went back to update their vintage macOS games.

I'm not going to blame the developers here because it's not their fault.

nottorp1mo ago

Skyrim runs well [1] on my M2 mac mini through crossover and rosetta. So most older games will run even better.

The real question is what happens when they drop Rosetta. They promised they'll keep the APIs related to running 32 bit games but can we trust them?

[1] Not at 8k 240 fps of course.

neuroelectron1mo ago· 2 in thread

I just want to point out that anything you ask ChatGPT about that hasn't been discussed 1000 times on Reddit or Wikipedia is going to be wrong, and it will only be "right" in the sense that it aligns with the artificial consensus created on those platforms.

Of course the author probably did that as a joke.

MikeNotThePope1mo ago

Pretty much! A precedent-fueled prediction engine can’t predict the unprecedented.

neuroelectron1mo ago

It (LLMs in general) actually can make some very prescient hallucinations by making similar inferences across dissimilar domains, but they have since removed that feature to prevent liability and libel. GPT3 was much more useful in this capacity, especially before they started stress testing it on 4chan (Jan 2023)

nothinkjustai1mo ago· 2 in thread

> As much as I hate to admit it, step one in most of my projects now is to ask AI about it. Maybe it’ll tell me something I don’t know.

It’s these people, not the ones who refuse to use LLMs, who are as they say, “cooked”.

linkregister1mo ago

The author of the blog is not cooked; they're raw. Their inventive, multi-chain setup was tuff. Their PCI passthrough and qemu patches were straight fire. Unless you can point to something you've done this impressive, you're just an unc bro.

nothinkjustai1mo ago

Fuck you got me there. Unc out

djmips1mo ago· 1 in thread

> Because OpenGL is not well-supported anymore on macOS, the game is completely unplayable there, even with CrossOver. Ironically, it plays totally fine on a Windows PC, but this is a game you literally can’t play on Mac without this eGPU setup.

I understand that this is true, but it seems that Doom does support Vulkan; you would need to add VK_NV_glsl_shader to MoltenVK. Probably much less work than what went into hanging an RTX 5090 off of an M4. Still, kudos to Scott, and the local AI inference speeds are pretty cool. What a crazy project! <applause>

scottjg1mo ago

interesting. that might be a fun intro project to MoltenVK. I hadn't dug into what was missing for Doom. I thought maybe the issue was that the intro/menu always ran in opengl mode or something. If it's just one missing op, that's way easier.

divbzero1mo ago· 1 in thread

This is pretty impressive. My impression was that eGPUs simply do not work with Apple Silicon.

(EDIT: Apple agrees with my impression. “To use an eGPU, a Mac with an Intel processor is required.” And, on top of that, the officially supported eGPUs were all AMD not NVIDIA. https://support.apple.com/en-us/102363)

steelbrain1mo ago

This is not using an eGPU with macOS, ie you can't run your chrome on macOS with its GPU acceleration coming from this eGPU. This is tunneling that eGPU to a Linux VM.

geerlingguy1mo ago· 1 in thread

I came into the post thinking it would be running a VM through the slow tinygrad driver... but this is much, much better.

It'd be amazing if Apple would provide better support, and allow more than that 1.5 GB window to make this easier. Arm overall has some quirks with PCIe devices, but at least in Linux, it's gotten so much easier since most modern drivers treat arm64 as a first class citizen.

scottjg1mo ago

i don't know for sure, but i suspect what makes the tinygrad stuff slow isn't the macos host driver itself. i think they're doing something very similar to what i'm doing, which is just mapping the PCI BARs to userspace, then they have a bunch of python code that drives the GPU.

this is only speculation, but i think the big thing that makes tinygrad slow is that the tinygrad inference engine has not really been optimized much for all these open LLM models. probably most of the work has gone towards optimizing the stack for george's self-driving hardware company. since you can't just run the existing CUDA kernels on their engine, that makes things a lot tougher, engineering-wise.

i am actually curious if my project could share a macos host driver with them. i think it would need some changes, but it seems like there's a lot of overlap

bilekas1mo ago· 1 in thread

I love how it's listed as "RTX 5090 Discrete". Sir, that is anything but discrete!

scottjg1mo ago

i admit, you got me chuckling with that one.

arjie1mo ago· 1 in thread

Wait, this is incredible. I have a spare 5090 lying around and run a claw-like on my M4 Mini. Just plugging it into some sort of 3D print frame for stability and plugging it into the TB port might get me a pretty viable tool for local inference. Would need something neat to ensure the power etc. is well fed.

The problem is `max-num-seqs` and `max-model-len` fight each other, and unless you're in the pure single-client mode you'll need multiple slots so to speak.
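A back-of-the-envelope sketch of why those two settings pull against each other: the KV cache has a fixed memory budget, and each concurrent sequence can claim up to `max-model-len` tokens of it. The model shape and the 8 GiB budget below are hypothetical, and this ignores the block sharing and paging a real engine does, so treat it as a worst-case illustration rather than how any particular server actually schedules.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Keys and values are both cached, hence the factor of 2."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def full_length_slots(kv_budget_bytes, max_model_len, per_token_bytes):
    """How many sequences fit if every one grows to max_model_len tokens."""
    return kv_budget_bytes // (max_model_len * per_token_bytes)


# Hypothetical model shape and memory budget, not taken from any real config.
per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128)
budget = 8 * 1024**3  # assume 8 GiB left for KV cache after loading weights

for max_len in (8192, 32768, 131072):
    print(max_len, full_length_slots(budget, max_len, per_token))
```

With these assumed numbers, quadrupling the context length cuts the number of guaranteed full-length slots from 8 to 2, and at 131072 tokens not even one full-length sequence fits.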

pat_space1mo ago

If you get too busy to take advantage, I'll take that spare 5090 off your hands, free of charge :)

s09dfhks1mo ago· 1 in thread

what keyboard is that

scottjg1mo ago

custom zoom75

dzink1mo ago

If apple provides native support with enough bandwidth to run an external NVIDIA GPU for Inference and training, I will upgrade to the latest MBP instantly. Raise your hand if you would too.

swiftcoder1mo ago

This is proper mad science, love it

delbronski1mo ago

Nicely done! Glad to see real hacking is still alive in the age of AI.

dangus1mo ago

I'm impressed by the effort and the technical know-how.

Another part of me is almost annoyed that Apple's complete apathy toward obvious computing use cases like this is rewarded by a project like this. I feel like Macs and macOS should not be rewarded for being so difficult to extend and use outside of Apple's narrow vision of the use case of their hardware.

Apple used to support this use case wholeheartedly, but we can see that it's abandoned on their end: Intel-only, and the newest generation of AMD GPUs supported are the 6000 series: https://support.apple.com/en-us/102363

I got tired of rewarding Apple for refusing to make a computer that makes the most of the technology available. This stuff is all a lot worse than just moving over to Linux or even Windows. With hardware like the Framework 13 Pro coming out, along with a surprisingly good set of premium PC laptops, I really don't think the Mac hardware is worth it anymore. Others have legitimately caught up, especially with Apple's aging MacBook Pro chassis with the horrible notch.

Forgeties791mo ago

> step one in most of my projects now is to ask AI about it. Maybe it’ll tell me something I don’t know.

Bingo. This is exactly how I use LLM. I like getting a gut check, seeing what the first recommendations are or if there is some deep flaw in what I think the approach is, and I almost never copy/paste whatever it spits back or just follow its instructions.

coder681mo ago

This seems pretty useful for AI inference if it can pass Apple approval. I've wanted to use my Nvidia GPUs with a Mac Mini, this would enable it to run CUDA directly. Very cool!

SamiahAman1mo ago

Very nice effort. This has incredible technical depth, particularly in the DMA and QEMU sections. I also like that you didn't oversell it as the ideal Mac gaming solution. I found the AI inference results to be the most fascinating. Overall, it was a great read.

Riany1mo ago

The gaming part is fun, and so are the local AI numbers. Fast prefill changes the whole experience; it makes local inference feel practical.

bcjdjsndon1mo ago

Say what you like about microslop, but you wouldn't be asking this question over on Windows

carterschonwald1mo ago

it seems like, with some care and with SIP disabled, some pretty good workarounds using LLM-assisted kext hackery would get pretty far

rballpug1mo ago

It renders according to the Blackwell and Hopper 100.

inforemix1mo ago

Awesome dude! Extra fan on the desk too :)

semiinfinitely1mo ago

where did you get a 5090 I will buy it from you

sharathdoes1mo ago

damn

sharathdoes1mo ago

lol, is there a list of games tho, which Mac Pros can support

null-phnix1mo ago

i mean probably
