GPUs are an evolving target. New GPUs have tensor cores and support all kinds of interesting numeric formats, while older GPUs don't support any of the formats that AI workloads are using today (e.g. BF16, int4, and the various smaller FP types).
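To illustrate why BF16 in particular is cheap for hardware to add: it is just float32 with the low 16 mantissa bits dropped, so it keeps float32's exponent range at half the storage. A minimal pure-Python sketch of that truncation (the function names are mine, not from any library):

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for x: the top half of its float32 encoding."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # BF16 keeps the sign bit, the full 8-bit exponent, and 7 mantissa bits.
    return bits >> 16

def bf16_to_f32(bits16: int) -> float:
    """Widen a BF16 pattern back to float32 by zero-filling the dropped mantissa bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return x

# BF16 keeps float32's dynamic range but only ~2-3 decimal digits of precision:
print(bf16_to_f32(f32_to_bf16_bits(3.14159)))  # -> 3.140625
print(bf16_to_f32(f32_to_bf16_bits(1e38)))     # a large value survives, unlike in FP16
```

The trade-off visible here is why older GPUs can't trivially emulate it: the format only pays off when the truncated multiply-accumulate is wired into the datapath, not simulated in software.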
An NPU will be more efficient because it is much less general than a GPU and doesn't spend any gates on graphics. However, it is also fairly restricted. Cloud hardware is orders of magnitude faster (due to much higher compute resources and I/O bandwidth), e.g. https://cloud.google.com/tpu/docs/v6e.
This unfortunate naming has sown plenty of confusion around DeepSeek's quality and resource requirements. The actual DeepSeek v3/R1 continues to require at least ~100GB of VRAM/Mem/SSD, and this does not change that.
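The ~100GB floor falls out of simple arithmetic on the weight count alone. A back-of-envelope sketch, assuming the ~671B total parameters reported for DeepSeek V3 (that figure, and the quantization levels chosen, are assumptions for illustration; KV cache and activations would come on top):

```python
# Rough weight-only memory estimate, assuming ~671B total parameters
# (DeepSeek V3's reported size; treated as an assumption here).
PARAMS = 671e9

def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4, 1.58):  # BF16, int8, int4, and an aggressive ~1.58-bit quant
    print(f"{bits:>5} bits/weight -> {weight_memory_gb(PARAMS, bits):8.0f} GB")
```

Even at an aggressive sub-2-bit quantization this lands around 130GB of weights, which is why no renaming of smaller distilled models changes the real resource requirement.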