32 SIMD registers holding 512 bits each. Ability to process 1024 bits (128 bytes, and even more in some scenarios) in a single clock cycle. Sounds more and more like a GPU to me.
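To put those numbers in perspective, here is a quick back-of-the-envelope sketch. It only does the arithmetic on the figures quoted above (32 registers, 512 bits each, 1024 bits per cycle). The function names are made up for illustration, and reading the 1024 bits as two 512-bit operations per cycle is an assumption, not something stated above.

```python
# Figures taken from the comment above.
REGISTERS = 32
REGISTER_BITS = 512
BITS_PER_CYCLE = 1024  # assumed here to mean two 512-bit operations per cycle


def register_file_bytes() -> int:
    """Total architectural SIMD register state, in bytes."""
    return REGISTERS * REGISTER_BITS // 8


def lanes(element_bits: int) -> int:
    """How many elements of the given width fit in one 512-bit register."""
    return REGISTER_BITS // element_bits


def elements_per_cycle(element_bits: int) -> int:
    """Elements touched per cycle at the quoted 1024 bits/cycle."""
    return BITS_PER_CYCLE // element_bits


if __name__ == "__main__":
    print("register file:", register_file_bytes(), "bytes")
    print("bytes per cycle:", BITS_PER_CYCLE // 8)
    for width in (8, 16, 32, 64):
        print(f"{width}-bit elements: {lanes(width)} per register, "
              f"{elements_per_cycle(width)} per cycle")
```

So the whole register file is 2 KiB, and at 32-bit precision that is 16 lanes per register and 32 elements per cycle under the assumption above.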