There is more weirdness here than just SMT.
There's the full crossbar, which lets each shader individually issue its own fetch from memory. There's the shared memory space, which is not a cache but a shader-to-shader communication scratchpad.
And there's atomics support, including coalescing atomics together.
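To make the scratchpad and atomics points concrete, here is a rough CUDA sketch, not anything canonical. The kernel name `neighbor_sum`, the tile size, and the input data are all made up for illustration. Each thread does its own load, parks the value in `__shared__` memory, reads what a neighbouring thread wrote, and then bumps one global counter with `atomicAdd`. Whether and how those atomics get combined is up to the hardware and compiler; the code doesn't control it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 256  // hypothetical block size, chosen only for this sketch

// Each thread loads one element, shares it through the scratchpad,
// reads its right-hand neighbour's value, and adds its input to a global total.
__global__ void neighbor_sum(const int* in, int* out, int* total, int n) {
    __shared__ int tile[TILE];                 // visible to every thread in the block
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;

    tile[tid] = (gid < n) ? in[gid] : 0;       // every thread issues its own fetch
    __syncthreads();                           // wait until the whole tile is written

    int right = tile[(tid + 1) % blockDim.x];  // value written by a different thread
    if (gid < n) {
        out[gid] = tile[tid] + right;
        atomicAdd(total, in[gid]);             // many threads hit the same address
    }
}

int main() {
    const int n = 1024;                        // a multiple of TILE, to keep the sketch simple
    int *in, *out, *total;
    cudaMallocManaged(&in, n * sizeof(int));
    cudaMallocManaged(&out, n * sizeof(int));
    cudaMallocManaged(&total, sizeof(int));

    long expected = 0;
    for (int i = 0; i < n; ++i) { in[i] = i; expected += i; }
    *total = 0;

    neighbor_sum<<<n / TILE, TILE>>>(in, out, total, n);
    cudaDeviceSynchronize();

    printf("total=%d expected=%ld\n", *total, expected);
    int ok = (*total == expected);

    cudaFree(in); cudaFree(out); cudaFree(total);
    return ok ? 0 : 1;
}
```

Note that nothing here behaves like a cache: the data only lands in `tile` because a thread explicitly put it there, and the `__syncthreads()` is what makes it safe for another thread to read it.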
-------
I mean hell: what is a core? Do remember that on SMs, every single shader (not every SM) has its own instruction pointer.
Is the shader a core? No, not really. But an SM isn't a core either.
I wouldn't compare GPU and CPU architectures at all. They're just different. What I did above, breaking both down into individual multipliers and then counting them, seems like the best way forward, especially as we remain multiplier-bound in practice.