You are also explicitly saying that you want device memory by specifying DEVICE_LOCAL_BIT. There's no difference.
> Likewise, your claim about UMA makes zero sense. Device malloc gets you a pointer or handle to device memory,
It makes zero sense to you because we're talking past each other. I am saying that on systems without UMA you _have_ to care where your resources live. You _have_ to be able to allocate both on host and device.
> Like, why is there gpu-only and device-local.
Because there's such a thing as accessing GPU memory from the host. Hence, you _have_ to specify explicitly that no, only the GPU will try to access this GPU-local memory. And if you request host-visible GPU-local memory, you might not get more than around 256 MiB unless your target system has Resizable BAR (ReBAR).
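To make that concrete, here's a rough sketch of the usual "pick a memory type" loop. It's plain C with no Vulkan SDK dependency: the flag values mirror `VkMemoryPropertyFlagBits`, but `MemType`, `find_memory_type` and the `discrete_no_rebar` table are names I made up for illustration, and the heap sizes are just a plausible layout for a discrete card without ReBAR, not numbers from any real driver.

```c
#include <stdint.h>

/* Values mirror VkMemoryPropertyFlagBits in vulkan_core.h; redefined here
   (assumption for illustration) so the sketch builds without the SDK. */
enum {
    DEVICE_LOCAL_BIT  = 0x1,
    HOST_VISIBLE_BIT  = 0x2,
    HOST_COHERENT_BIT = 0x4
};

/* Hypothetical stand-in for a memory type plus the size of its backing heap. */
typedef struct {
    uint32_t property_flags;
    uint64_t heap_size;
} MemType;

/* Illustrative layout of a discrete GPU without ReBAR (made-up sizes). */
static const MemType discrete_no_rebar[3] = {
    { DEVICE_LOCAL_BIT,                                        8ull << 30 },   /* VRAM, GPU-only      */
    { HOST_VISIBLE_BIT | HOST_COHERENT_BIT,                    16ull << 30 },  /* system RAM          */
    { DEVICE_LOCAL_BIT | HOST_VISIBLE_BIT | HOST_COHERENT_BIT, 256ull << 20 }  /* small BAR window    */
};

/* Return the index of the first memory type that is permitted by type_bits
   (one bit per type index) and has every flag in `required`, or -1 if none. */
int find_memory_type(const MemType *types, uint32_t count,
                     uint32_t type_bits, uint32_t required)
{
    for (uint32_t i = 0; i < count; ++i) {
        int allowed = (int)((type_bits >> i) & 1u);
        if (allowed && (types[i].property_flags & required) == required)
            return (int)i;
    }
    return -1;
}
```

With that table, asking for `DEVICE_LOCAL_BIT` alone lands in the big VRAM heap, while asking for `DEVICE_LOCAL_BIT | HOST_VISIBLE_BIT` lands in the 256 MiB window. Same physical memory, very different budget, which is why the distinction has to be expressible.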
> a theoretical vkMalloc should always give me device memory.
No, because if that's the only way to allocate memory, how are you going to allocate staging buffers for the CPU to write to? In general, you can't give the copy engine a random host pointer and have it go to town. So, okay, now we're back to vkDeviceMalloc and vkHostMalloc. But wait, there's this whole thing about device-local and host-visible, so should we add another function? What about write-combined memory? Cache coherency? This is how you end up with a zillion flags.
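As a sketch of why it ends up being flags rather than one function per case: below, each "kind of malloc" you might want is just a different combination of the same few bits. The flag values mirror `VkMemoryPropertyFlagBits`; the `Usage` enum and `flags_for_usage` are hypothetical helpers of mine, not part of Vulkan, and the mapping is only one reasonable convention.

```c
#include <stdint.h>

/* Values mirror VkMemoryPropertyFlagBits in vulkan_core.h; redefined here
   (assumption for illustration) so the sketch builds without the SDK. */
enum {
    DEVICE_LOCAL_BIT  = 0x1,
    HOST_VISIBLE_BIT  = 0x2,
    HOST_COHERENT_BIT = 0x4,
    HOST_CACHED_BIT   = 0x8
};

/* Hypothetical usage intents; each would otherwise be its own vkXxxMalloc. */
typedef enum {
    USAGE_GPU_ONLY,        /* render targets, static meshes          */
    USAGE_STAGING_UPLOAD,  /* CPU writes once, copy engine reads     */
    USAGE_READBACK,        /* GPU writes, CPU reads back             */
    USAGE_CPU_STREAMED     /* CPU writes straight into VRAM via BAR  */
} Usage;

/* Map an intent to the property flags you would request for it. */
uint32_t flags_for_usage(Usage u)
{
    switch (u) {
    case USAGE_GPU_ONLY:
        return DEVICE_LOCAL_BIT;
    case USAGE_STAGING_UPLOAD:
        return HOST_VISIBLE_BIT | HOST_COHERENT_BIT;
    case USAGE_READBACK:
        return HOST_VISIBLE_BIT | HOST_CACHED_BIT;
    case USAGE_CPU_STREAMED:
        return DEVICE_LOCAL_BIT | HOST_VISIBLE_BIT | HOST_COHERENT_BIT;
    }
    return 0; /* unknown intent: no requirements */
}
```

Four intents already, and that's before lazily allocated or protected memory. One allocation entry point taking a bitmask scales; a function per combination doesn't.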
This is the reason I keep bringing UMA up, but you keep brushing it off.