But you still need the working set of frequently used experts to actually fit in RAM, or at least stay cached. Expert routing happens per token, per layer. If those weights aren’t resident, you’re effectively pulling them from disk on the critical path of generation — over and over again.
That’s not “just slower,” that’s an order of magnitude slower. You’ll end up with constant page faults and page cache churn. And if swap is on the same device as the model, you’re now competing for bandwidth on top of that.
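To put a rough shape on that, here's a back-of-the-envelope sketch of how much time per token goes to reading non-resident experts from disk. The function name and every number in it are my own illustrative assumptions, not measurements of any real model or drive, and it ignores seek latency, readahead, and any overlap with compute.

```python
def disk_bound_seconds_per_token(layers, experts_per_token, expert_bytes,
                                 miss_rate, disk_bytes_per_s):
    """Rough lower bound on per-token time spent reading missed experts from disk.

    miss_rate is the assumed fraction of routed experts that are not resident.
    """
    bytes_from_disk = layers * experts_per_token * expert_bytes * miss_rate
    return bytes_from_disk / disk_bytes_per_s


# Hypothetical MoE: 48 layers, 8 experts routed per token per layer,
# 20 MiB per expert, a quarter of lookups missing RAM, 3 GB/s NVMe reads.
estimate = disk_bound_seconds_per_token(
    layers=48,
    experts_per_token=8,
    expert_bytes=20 * 2**20,
    miss_rate=0.25,
    disk_bytes_per_s=3 * 10**9,
)
print(f"{estimate:.2f} s per token spent on disk reads alone")
```

The point is just that the cost scales with layers times routed experts times miss rate, so even a modest miss rate multiplies out quickly.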
IMO the main benefit of mmap is the ability to reclaim cold pages during high memory-pressure events when the model isn't active.
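For anyone unfamiliar with why that works, here's a minimal sketch using Python's standard `mmap` module. The "weights" file and the `read_expert` helper are hypothetical stand-ins, not how any particular inference engine lays out its data. A read-only file mapping only needs the touched pages in RAM, and because those pages are clean the kernel can drop them under pressure and re-read them from the file later.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def read_expert(mm, index, expert_bytes):
    """Return one expert's bytes; only the pages in this slice get touched."""
    start = index * expert_bytes
    return mm[start:start + expert_bytes]


# Build a small stand-in "weights" file: 4 hypothetical experts, one page each.
with tempfile.NamedTemporaryFile(delete=False) as f:
    for i in range(4):
        f.write(bytes([i]) * PAGE)
    path = f.name

with open(path, "rb") as f:
    # Read-only mapping: nothing is loaded until a page is actually accessed.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    hot = read_expert(mm, 2, PAGE)  # faults in expert 2's page only
    mm.close()

os.remove(path)
```

Experts 0, 1, and 3 never have to be read here, which is the same property that lets an idle model's pages be evicted without any swap writes.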
There are some on sale via eBay right now. The memory controllers on some Nvidia GPUs support well beyond the 16-24GB they shipped with as standard, and enterprising folks in China desolder the original memory chips and fit higher-capacity ones.
https://www.tomshardware.com/pc-components/gpus/chinese-work...
There are also unreleased Nvidia engineering samples of cards with doubled VRAM like this - https://www.reddit.com/r/nvidia/comments/1rczghu/update_unre...
The Mac will just work for models as large as 100B, and can go higher with quantized models. And power draw will be 1/5th as much as the 3090 setup.
You can certainly daisy chain several 3090's together but it doesn't work seamlessly.
It's not "daisy chaining"; the 3090 has NVLink.
This setup will work for 100B models as well. And yes, the Mac will draw less power, but the Nvidia machine will be many times faster. So depending on your specific Mac and your specific Nvidia setup, the performance per watt will be in the same ballpark. And higher absolute performance is certainly a nice perk.
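As a quick sanity check on the "same ballpark" claim, here's the arithmetic as a tiny sketch. The throughput and wattage figures are made-up assumptions chosen only to match the 1/5th power ratio mentioned upthread; plug in your own measurements.

```python
def tokens_per_joule(tokens_per_second, watts):
    """Energy efficiency: tokens generated per joule of energy drawn."""
    return tokens_per_second / watts


# Hypothetical figures: the Mac draws 1/5th the power, the Nvidia box is 5x faster.
mac = tokens_per_joule(tokens_per_second=20.0, watts=100.0)
nvidia = tokens_per_joule(tokens_per_second=100.0, watts=500.0)

print(f"Mac: {mac:.2f} tok/J, Nvidia: {nvidia:.2f} tok/J")
```

If the speedup roughly matches the power ratio, efficiency is a wash and the faster machine just finishes sooner.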
> You can certainly daisy chain several 3090's together but it doesn't work seamlessly.
Citation needed; there's no "daisy chaining" in the setup I describe, and low-level libraries like PyTorch as well as higher-level tools like Ollama all seamlessly support multiple GPUs.
Regardless - there's a difference between training and inference. And PyTorch doesn't magically make 5 GPUs behave like 1 GPU.
The cheapest Apple desktop with 128GB of memory shows up as costing $3499 for me, which isn't very "enthusiast-compatible", it's about 3x the minimum salary in my country!
$3499 is definitely enthusiast compatible. That's beefy gaming PC tier, which is possibly the canonical example of an enthusiast market.
This isn't tens of thousands of dollars for top tier Nvidia chips we're talking about.
In the most literal meaning, absolutely: "enthusiast" just means a person who likes something or is excited about something.
When it comes to markets and products though, you'll typically see the word "Enthusiast" used for the mid-tier - something like: Consumer --> Enthusiast --> Professional (there may be words like "Prosumer" in there as well, etc.)
In that context, which is typically the one people will use when discussing product pricing and placement, "Enthusiast" is somebody who yes enjoys something, but does it sufficiently to be discerning and capable of purchasing mid-tier or above hardware.
So while a consumer photographer may use their phone or a compact or all-in-one camera, an enthusiast photographer will probably spend $3000 - $5000 on camera gear. Equivalently, there are myriad gamers out there (on phones, consoles, GeForce Now, whatever), but an enthusiast gamer is assumed to have a dedicated gaming computer, probably a tower, with a dedicated video card, likely say a 5070 Ti or above, probably 32GB+ RAM, a couple of SSDs which are not entry level, etc.
Again, this is not to say a person with a limited budget is "not a real enthusiast", no gatekeeping is intended here; this is simply, if it helps, what the word means when it comes to market segmentation and product pricing :)
It's out of reach for lots of people, even in developed countries. But it's easily within reach for loads of people that care more about computing than other stuff.
Enthusiast compute hardware doesn't cater to the people on the minimum salary in any country, let alone developing nations. When Ferrari makes a car they don't ask themselves if people on minimum salary will be able to afford them.
I'm in one of the two poorest EU member states, and Apple and Microsoft Xbox don't even bother to have a direct-to-customer store presence here; you buy their products from third-party retailers.
Why? Probably because their metrics show people here are too poor to afford their products en masse to be worth operating a dedicated sales entity. Even though plenty of people do own top-of-the-line MacBooks here, it's just the wealthy enthusiast niche, but it's still a niche for the volumes they (wish to) operate at. Why do you think Apple launched the Mac Neo?
Why? Enthusiasts are by definition people for whom value for money is not the main driver, but rather top performance and cutting-edge novelty at any cost. Affording enthusiast computer hardware is not a human right, the same way affording a Lamborghini or a McMansion isn't.
But you don't need to buy a Lamborghini to do your grocery shopping or drive your kids to school, the same way you don't need an Nvidia 5090 or MacBook Pro Max to do your taxes or your school work.
So the definition is fine as it is. It's hardware for people with very deep pockets, often called whales.
Enthusiast in this context more or less means you are excited enough about something to get a level above what normal people should get and just below professional pricing. An enthusiast camera body can be 2000 euros.
I would say an enthusiast computer is 2-4k.
It really depends on what you meant by minimum salary (yearly?) because paying 3 months of salary for a computer like that isn't far-fetched. You're not using this to generate recipes for cookies. An enthusiast-level car is expensive as well.
That said, a higher-end gaming setup is going to cost that much and is absolutely in the enthusiast realm. "Enthusiast" doesn't mean compatible with "minimum wage".
We are so freaking spoiled by the cheap cost of compute now.