But to your other point, very little of the current popular ML stack supports much beyond CUDA and MPS. Some tools will do ROCm, but I don’t know whether AMD iGPUs are guaranteed to support it. There’s not much for Intel GPUs.
If you only care about inference, llama.cpp supports Vulkan on any iGPU with Vulkan drivers. On my laptop, whose crappy BIOS does not allow changing any video RAM settings, the reserved "VRAM" is 2GB, but llama.cpp-vulkan can access 16GB of "VRAM" (half of physical RAM). 16GB of VRAM is enough to run any model that has even remotely practical execution speed on my bottom-of-the-line Ryzen 3 3250U (Picasso/Raven 2), and you can always offload some layers to the CPU to run even larger ones.
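A rough way to decide how many layers to keep on the iGPU (and how many to leave on the CPU) is to spread the model's weight size evenly across its layers and see how many fit in the shared "VRAM" budget. This is a minimal sketch under hypothetical assumptions: equal-sized layers, a fixed 1.5GB allowance for the KV cache and compute buffers, and made-up example models. The result is the value you would pass to llama.cpp's -ngl (--n-gpu-layers) option.

```python
import math


def gpu_layers_that_fit(model_gb: float, n_layers: int, vram_gb: float,
                        reserve_gb: float = 1.5) -> int:
    """Estimate how many layers fit in the GPU memory budget.

    Hypothetical assumptions, for illustration only:
    - weights are spread evenly across `n_layers`;
    - `reserve_gb` covers the KV cache, compute buffers and other overhead.
    Layers that don't fit stay on the CPU.
    """
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, math.floor(usable_gb / per_layer_gb))


# iGPU that can address half of 32GB of physical RAM.
shared_vram_gb = 32 / 2

# Hypothetical 40GB model with 80 layers: 0.5GB per layer, 14.5GB usable.
print(gpu_layers_that_fit(40.0, 80, shared_vram_gb))  # 29
# Hypothetical 4GB model with 32 layers: everything fits on the iGPU.
print(gpu_layers_that_fit(4.0, 32, shared_vram_gb))   # 32
```

In practice you would start from this estimate and lower it if llama.cpp runs out of device memory.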
On Debian stable:
Vulkan support: apt install libvulkan1 mesa-vulkan-drivers vulkan-tools
Build deps for llama.cpp: apt install libshaderc-dev glslang-dev libvulkan-dev
Build llama.cpp with the Vulkan back-end:
make clean (I added this in case you previously built with a different back-end)
make LLAMA_VULKAN=1
If you have more than one GPU:
When running, you have to set GGML_VK_VISIBLE_DEVICES to the indices of the devices you want, e.g., export GGML_VK_VISIBLE_DEVICES=0,1,2
The indices correspond to the device order in vulkaninfo --summary.
By default, llama.cpp will only use the first device it finds.
llama.cpp-vulkan has worked really well for me. But per benchmarks from when Vulkan support was first released, the CUDA back-end was faster than the Vulkan back-end on NVIDIA GPUs, and ROCm vs. Vulkan on AMD is probably similar. On the other hand, Vulkan requires zero non-free/binary blobs and supports more devices (e.g., my iGPU is not supported by ROCm). I haven't tried it, but you can probably mix GPUs from different manufacturers using Vulkan.
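If you launch llama.cpp from a script, the GGML_VK_VISIBLE_DEVICES setting described above can be passed per process instead of exported globally. A minimal Python sketch; the helper names, the binary path ./main and the model path are hypothetical placeholders, so adjust them to whatever your build actually produced.

```python
import os
import subprocess
from typing import Iterable


def vulkan_env(device_indices: Iterable[int]) -> dict:
    """Copy of the current environment with GGML_VK_VISIBLE_DEVICES set.

    Indices follow the device order shown by `vulkaninfo --summary`.
    The parent process environment is not modified.
    """
    env = dict(os.environ)
    env["GGML_VK_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_indices)
    return env


def run_llama(device_indices: Iterable[int], model_path: str, prompt: str,
              binary: str = "./main") -> None:
    """Run llama.cpp on the chosen Vulkan devices (binary path is an assumption).

    -ngl 99 asks llama.cpp to offload up to 99 layers, i.e. typically all of them.
    """
    cmd = [binary, "-m", model_path, "-ngl", "99", "-p", prompt]
    subprocess.run(cmd, env=vulkan_env(device_indices), check=True)


print(vulkan_env([0, 1, 2])["GGML_VK_VISIBLE_DEVICES"])  # 0,1,2
```

For example, run_llama([0, 1], "model.gguf", "Hello") would use the first two devices listed by vulkaninfo, assuming that model file exists.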
I have a laptop with a 680M and a mini PC with a 780M, both beefy enough to play around with small LLMs. You basically have to force the GPU detection to an older version, and I get tons of GPU resets on both.
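With ROCm, "forcing the GPU detection to an older version" is commonly done through the HSA_OVERRIDE_GFX_VERSION environment variable, which makes the runtime treat the iGPU as a related, officially supported target. The mapping below (780M as gfx1103 overridden to 11.0.0, 680M as gfx1035 overridden to 10.3.0) reflects widely shared community workarounds, not an AMD-supported configuration, and the helper name is hypothetical. As the comment says, expect instability.

```python
import os

# Community workarounds (assumed, not officially supported by AMD):
# report each iGPU as a nearby officially supported gfx target.
GFX_OVERRIDES = {
    "gfx1103": "11.0.0",  # Radeon 780M (Phoenix), treated as gfx1100
    "gfx1035": "10.3.0",  # Radeon 680M (Rembrandt), treated as gfx1030
}


def rocm_override_env(gfx_target: str) -> dict:
    """Copy of the environment with HSA_OVERRIDE_GFX_VERSION set.

    Raises KeyError for targets not in the (hypothetical) table above.
    """
    env = dict(os.environ)
    env["HSA_OVERRIDE_GFX_VERSION"] = GFX_OVERRIDES[gfx_target]
    return env


print(rocm_override_env("gfx1103")["HSA_OVERRIDE_GFX_VERSION"])  # 11.0.0
```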
AMD, your hardware is good. Please give the software more love.
When I raised this feedback with our AMD Rep, they said it was intentional and that consumer GPUs are primarily meant for gaming. Absolutely shortsighted.
But failing to see it five years ago is inexcusable. Missing it two years ago is insane. And still failing to treat ML as an existential threat is, IDK, I’ve got no words.
Nvidia's drivers are still garbage across the board (as they have been for the last 20 years), but they do work sometimes, and I guess they're better for machine learning. I have a pile of "supported" Nvidia cards that can't run most OpenGL/GLX software, even after installing DKMS, recompiling the planet, etc., etc., etc.
Since AMD upstreamed their stuff into the kernel, everything just works out of the box, but you're stuck with ROCm.
So, for all use cases except machine learning, AMD's software blows Nvidia's out of the water for me. This includes running Windows games, which works better under Linux than Windows (the last time I checked), thanks to Steam.
On my 780M, I installed the current Devuan (≈ Debian) stable and had a few xscreensaver crashes and reboots. I checked dmesg, and it had clear errors about IRQ state machines being wrong for some of the Radeon stuff. So even when running on hardware newer than the distro, their error logs are great.
After enabling backports and upgrading the kernel, the dmesg errors went away, and it's a 100% uptime machine.
The remaining problem is that PulseAudio is still terrible after all these years, so I have to repeatedly switch the audio output to HDMI.
Right now I'm messing around trying to get PyTorch Vulkan support compiling just so I can avoid switching to ROCm.
Strix Point can be brought down to 15W and still perform great, and it can go up to 55W+ and be fine. It idles nicely, too. But it's monolithic, and I'm not sure AMD and TSMC are really bringing the power penalty of multi-chip designs down enough.
Would I get it? Absolutely yes. A full desktop small form factor is a very convenient, nice thing.
The only mention of NVIDIA in the post is of the 1050 which is a considerable step away from a 1080.
> It also moves ahead of Nvidia’s Pascal based GTX 1050 3 GB
Based on https://www.techpowerup.com/gpu-specs/geforce-gtx-1650-mobil... this is about 40% faster than a GTX 1050, but also almost half the speed of a GTX 1080.
> With Strix Point, AMD’s mobile iGPU has a newer graphics architecture than its desktop counterparts. It’s an unprecedented situation, but not a surprising one. Since the DX11 era, AMD has never been able to take and hold the top spot in the discrete GPU market. Nvidia has been building giant chips where cost is no object for a long time, and they’re good at it. Perhaps AMD sees lower power gaming as a market segment where they can really excel. Strix Point seems to be a reflection of that.
Did AMD figure out that this market segment is underserved by Nvidia? If so, good for them; laptops could use better GPUs.
It’s more than likely this is just a stronger play to get ahead of Intel in market share.
That’s a much more tangible competitor in that space.
Whether it means more games optimize for AMD as a side effect is tangential at best. Otherwise there’s no real reason to treat this as competing with NVIDIA. It’s an integrated GPU so it’s not moving any extra units.
If Snapdragon X Elite is a success, you can bet Nvidia will be producing laptop SoCs with passable CPUs and great iGPUs.
4k Aztec High GFX
* AMD 890M: 39.1fps
* M3: 51.8fps
3DMark Wild Life Extreme
* AMD 890M: 7623
* M3: 8286
Power:
* AMD 890M: 46W
* M3: 17W
M3 is ~253% more efficient.
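Efficiency here is just score divided by power. Working it out per benchmark from the numbers listed above (46W for the 890M, 17W for the M3) shows how much the result depends on the benchmark chosen. A small sketch; the function name is illustrative:

```python
def efficiency_ratio(amd_score: float, m3_score: float,
                     amd_watts: float = 46.0, m3_watts: float = 17.0) -> float:
    """How many times more score-per-watt the M3 gets than the 890M."""
    return (m3_score / m3_watts) / (amd_score / amd_watts)


benchmarks = {
    "4K Aztec High (fps)": (39.1, 51.8),
    "3DMark Wild Life Extreme": (7623, 8286),
}

for name, (amd, m3) in benchmarks.items():
    r = efficiency_ratio(amd, m3)
    print(f"{name}: M3 {r:.2f}x the perf/W ({(r - 1) * 100:.0f}% more efficient)")
```

With these inputs the ratio comes out to about 3.58x (258% more) on Aztec High and about 2.94x (194% more) on Wild Life Extreme.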
But of course, if your goal is gaming, AMD's GPU will still be better because of Vulkan, DirectX, and Windows support. In pure architecture, AMD is quite a bit behind Apple.
Reducing the power of 890M to 17 W, the same as quoted for M3, would reduce the performance much less than the reduction in power consumption, improving the energy efficiency.
For a valid comparison of the energy efficiency, both systems must be configured for the same power consumption.
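The reasoning can be sketched with the textbook approximation for dynamic CMOS power. This is an idealized model (it assumes voltage scales roughly linearly with clock and performance scales with clock, and it ignores leakage, minimum-voltage floors and memory-bound workloads), so the numbers are illustrative only:

```latex
\begin{aligned}
P_{\text{dyn}} &\approx C V^{2} f, \qquad V \propto f \;\Rightarrow\; P \propto f^{3} \\
\frac{f_{17\,\text{W}}}{f_{46\,\text{W}}} &\approx \left(\frac{17}{46}\right)^{1/3} \approx 0.72 \\
\frac{(\text{perf}/\text{W})_{17\,\text{W}}}{(\text{perf}/\text{W})_{46\,\text{W}}} &\approx \frac{0.72}{17/46} \approx 1.9
\end{aligned}
```

Under this model, cutting power to about 37% would keep roughly 72% of the performance and nearly double perf/W; real chips deviate from it, especially near their minimum operating voltage.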
Moreover, by themselves those performance values do not prove that AMD is behind Apple in GPU architecture.
The better performance of the Apple GPU could be entirely caused by the much higher memory bandwidth and by the better CMOS process used for the Apple GPU.
For any conclusions about architecture, much more detailed tests would be needed, to separate the effects of the other differences that exist between these systems.
Actually, it's 253%. I made a mistake assuming the 890M was limited to 35W. It was actually 46W, as measured by Notebookcheck [0].
>Reducing the power of 890M to 17 W, the same as quoted for M3, would reduce the performance much less than the reduction in power consumption, improving the energy efficiency.
That depends. Sure, give almost any chip less power and it will be more efficient. I'm not arguing against that.
The problem with reducing power for the 890M is that it's already slower than the M3 by 26% while using 2.7x the power.
If you give the 890M 17W, yes, it will be more efficient than at 46W. It will just be even slower than the M3.
>The better performance of the Apple GPU could be entirely caused by the much higher memory bandwidth
M3's bandwidth is 102.4 GB/s. AMD Strix Point uses LPDDR5X-7500 in dual-channel mode, so it should be around 120 GB/s.
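Peak DRAM bandwidth is the transfer rate times the bus width in bytes. A quick check of both figures, assuming a 128-bit total memory bus on both chips and LPDDR5-6400 for the M3 (my assumptions; the comment states neither):

```python
def peak_bandwidth_gbs(mega_transfers_per_s: int, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    # MT/s * bytes per transfer = MB/s; divide by 1000 for GB/s.
    return mega_transfers_per_s * bytes_per_transfer / 1000


print(peak_bandwidth_gbs(7500, 128))  # Strix Point, LPDDR5X-7500: 120.0
print(peak_bandwidth_gbs(6400, 128))  # M3, LPDDR5-6400 (assumed): 102.4
```

These are theoretical peaks; sustained bandwidth on either system will be lower.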
>and by the better CMOS process used for the Apple GPU.
AMD's Strix Point is manufactured on TSMC's N4P. M3 is on N3B, which is roughly 10% more power efficient than N4P. It doesn't explain the huge discrepancy in efficiency.
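To see why the node difference doesn't close the gap, one can (very roughly) credit the 890M with the ~10% gain it might get on N3B and recompute. This sketch treats "10% more power efficient" as 10% more performance per watt, which is a simplifying assumption, uses the 4K Aztec High numbers from above, and the function name is illustrative:

```python
def gap_after_node_credit(amd_fps: float, amd_watts: float,
                          m3_fps: float, m3_watts: float,
                          node_gain: float = 0.10) -> float:
    """M3-to-890M perf/W ratio after giving the 890M a hypothetical node gain."""
    amd_eff = (amd_fps / amd_watts) * (1 + node_gain)
    m3_eff = m3_fps / m3_watts
    return m3_eff / amd_eff


print(round(gap_after_node_credit(39.1, 46.0, 51.8, 17.0), 2))  # 3.26
```

Even with that credit, the M3 still comes out at roughly 3.3x the perf/W on this benchmark.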
[0]https://www.notebookcheck.net/AMD-Zen-5-Strix-Point-iGPU-ana...
They come with the 780M and 680M iGPUs, respectively, and both are outperformed by the 890M at a lower power draw [0]. In theory a consumer can't put these parts directly in a PC, but there's already a mini PC with the laptop part, the 890M [1]. The 7800G sometimes shows up in mid-range and high-end gaming PCs with discrete graphics cards [2], which makes so little sense that I wonder if AMD quietly offloaded them in bulk at a steep discount to vendors.
I've commented on this before [3], can anyone shed light on the situation?
[0] https://www.anandtech.com/show/21485/the-amd-ryzen-ai-hx-370...
[1] https://www.tomshardware.com/desktops/mini-pcs/soyos-upcomin...
[2] https://www.tomshardware.com/desktops/gaming-pcs/hp-omen-35l...
Intel on the other hand were fairly sensible. i7 becomes Ultra 7 and the numbering restarts from 100 (Meteor Lake). That's easy to follow.