And I'm not talking about gimmicks like RTX, I'm talking about all these cool use cases around ML and DL like background noise cancelling, video upscaling, camera eye contact, real-time deepfakes and now this. And that's if you ignore all the mind-blowing research papers put out by Nvidia which aren't featured in consumer apps yet.
This is Nvidia's biggest moat and AMD isn't even in the race here and for some reason Lisa Su seems to not give enough of a shit to compete.
I hate Nvidia for their price gouging and anti-consumer practices, but at least they haven't gotten complacent and are innovating on all fronts to keep pushing the envelope. Massive respect for the tech leadership at Nvidia.
I can't even blame Nvidia, of course they're gonna do what's best for them, and it has worked. I blame AMD for completely dropping the ball on the GPU compute segment, and I blame users for preferring CUDA libraries instead of OpenCL.
I hope Intel and maybe AMD can get the GPGPU market to something that resembles an open and, most importantly, interoperable ecosystem. But Nvidia has a big head start.
Enough people that the Intel standard would lose out and you'd need to ask the reverse question? (See IA64 vs. AMD64.)
But if you mean, “what if there was a major microprocessor line incompatible with AMD64”, well, it's called ARM and lots of people buy it.
Whenever use of AMD GPUs for ML comes up on HN I echo your points with the added personal experiences (PAIN) I've had trying to actually use an AMD GPU for anything other than driving a display.
In terms of the tech leadership at Nvidia, all you need to do is peek at the mind-blowing number of repos they have on GitHub - literally hundreds of component software pieces across every layer that at this point can do anything from drastic performance increases to completely unique (CUDA only of course) functionality.
On HN especially the whole "proprietary driver on Linux desktop situation" has hurt Nvidia significantly in terms of hearts and minds. As I said, then you look at almost any other software provided by Nvidia and realize they're actually a huge champion and supporter of open source for just about every aspect of the ecosystem other than the driver itself - and they're working on open sourcing the driver as well.
AMD occupies a weird space in GPU compute - there are massive HPC deployments of AMD GPUs. Presumably they only work because AMD is throwing a ton of essentially one-off support at them for deployment. On the other end you have their "support" for low to mid-range GPU compute. I say "support" with quotes because you realize very quickly it's absolutely pathetic to the point of uselessness and run back screaming to Nvidia/CUDA.
I'm not an Nvidia fanboy but my opinion at this point is the hardware markups for Nvidia GPUs essentially subsidize all of the incredible (largely open source) software and ecosystem support they provide. Yes, they engage in anti-competitive practices but please show me a large corporation that doesn't. Fact is Nvidia has invested a massive amount of resources for well over a decade to earn their dominance of GPU compute.
Spending twice as much (somewhat factual but overblown popular opinion on HN) on Nvidia hardware becomes an obvious choice when you realize you're going to burn A TON of time (and still fail) trying to get AMD GPU hardware to actually do anything in ML.
Nvidia has dominated the market for a long time, and unlike Intel they upped the prices and wisely spent enough of that money on R&D. Nvidia is just reaping the rewards for it.
We wanted to use MI50s at work because it was promised they can do SRIOV, but we never got any further with AMD support than "it should work". They took ages to respond, and could not tell what was wrong from the extensive logs and hwinfo we provided them with.
Also the PCI reset bug that plagued multiple generations. There's a guy maintaining a kernel module that works around that issue in a wacky way. According to his research and reverse engineering, AMD could fix that with a firmware update to those cards. He even got in contact with AMD engineers briefly and outlined what the problem was. Then radio silence, and a couple months later AMD added a very similar workaround in their kernel module, the amdgpu driver. It's just that a fix in there doesn't make any sense whatsoever, because you need that fix when you do PCI passthrough, in which case you explicitly do not load the amdgpu module, as you don't use the GPU on the host machine but, well, pass it through to the VM.
AMD might be right not to put effort into manually optimizing for the old approach.
culito, noun. Diminutive of culo.
culo, noun. Slang for arse.
https://en.wiktionary.org/wiki/culito https://en.wiktionary.org/wiki/culo
- the computation output depends only on local features (my guess)
- most transistors look the same, so you can cache these results heavily
- the same holds for the interconnect layers
This is not correct: the features on the mask are larger than the features required on the target, which is the whole problem, so you need a diffraction pattern over a large area to produce a target feature. But then you need to overlap with the diffraction pattern of the next feature, and so on. I suspect that every pixel on the output gets determined by almost every pixel on the input, which is why it takes so much computation in the first place.
> Even a change to the thickness of a material can lead to the need for a new set of photomasks
You can imagine the light spreading out through the other side of the mask, diffracting and interfering with neighbouring features through some radius. That it's not trivial to compute suggests the number of combinations within this radius is large enough to not be highly cacheable.
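To make the "some radius" picture concrete, here is a toy 1D sketch, not how any real lithography tool works. It assumes fully coherent imaging with a made-up, truncated sinc-shaped kernel, so the intensity at a pixel is the squared sum of nearby mask pixels weighted by that kernel. The names `psf`, `aerial_image` and `affected_pixels`, and the radius and width values, are all hypothetical and chosen only for illustration.

```python
import math

def psf(radius, width):
    """Hypothetical sinc-shaped amplitude kernel, truncated at +/- radius pixels."""
    def sinc(x):
        return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
    return [sinc(k / width) for k in range(-radius, radius + 1)]

def aerial_image(mask, kernel):
    """Toy coherent model: intensity = (mask convolved with kernel) squared."""
    radius = len(kernel) // 2
    n = len(mask)
    out = []
    for i in range(n):
        amp = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius          # mask pixel contributing to output pixel i
            if 0 <= j < n:
                amp += mask[j] * w
        out.append(amp * amp)           # squaring mixes all neighbours together
    return out

def affected_pixels(mask, kernel, flip_index, tol=1e-12):
    """Indices of output pixels whose intensity changes when one mask pixel flips."""
    base = aerial_image(mask, kernel)
    changed = list(mask)
    changed[flip_index] = 1 - changed[flip_index]
    new = aerial_image(changed, kernel)
    return [i for i, (a, b) in enumerate(zip(base, new)) if abs(a - b) > tol]

# A simple line/space pattern, 64 pixels wide.
mask = [1 if (i // 4) % 2 == 0 else 0 for i in range(64)]
kernel = psf(radius=10, width=2.5)
hits = affected_pixels(mask, kernel, flip_index=32)
print(min(hits), max(hits), len(hits))
```

Flipping a single mask pixel changes output pixels across the whole kernel radius, and because of the squaring each of those outputs depends on the joint state of every mask pixel in its neighbourhood. With a 21-pixel binary window that is already up to 2^21 distinct neighbourhoods in 1D, which is the caching problem described above; a real 2D kernel with a larger radius would be far worse.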
I like how inverse lithography and neural network backpropagation were both techniques introduced in the 1980s and now we are finally seeing them both come to life, so to speak, with our sufficiently advanced GPUs.