There's nothing conceptually hard, but it's really a lot of work. In addition to the items you listed, there are the actual compute kernels (or a compiler to generate them), porting frameworks over (PyTorch etc.), and then the level of testing, documentation, and ongoing maintenance needed to make an alternative platform a reasonable idea for end users. The pitch for buying NVIDIA hardware is that existing tools, example code, and third-party research will more or less work and perform well out of the box.
Edit: Going back to your original question, the main thing that makes CUDA so special is that NVIDIA has already poured billions of dollars into all of this infrastructure and will credibly keep doing so.