> All that stuff is not just relevant to tensorflow / pytorch stuff but also databases.
Yes! And that's the beauty of it. It is not an accelerator; these are fully general-purpose cores.
They are not equivalent to 'smart' Intel cores with all the branch prediction, prefetching and caching magic, but they offer massive computation capability nonetheless.
GPUs do have massive amounts of memory (both RAM and registers), but you have to load your data into it beforehand, and the only thing they actually do efficiently is SIMD operations.
I'd liken PIM to a better GPU-CPU blend: your CPU keeps doing its thing while massively parallel operations run concurrently. Also, these seem to be mostly independent cores, so you would not be limited to SIMD.
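To make the SIMD point concrete, here is a toy Python sketch. It has nothing to do with any real PIM SDK: the function names are made up, and plain threads stand in for independent cores. `simd_style` runs one instruction stream over all lanes, so both sides of a branch get computed and a mask picks the result. `mimd_style` lets each worker run its own data-dependent loop, which is the kind of divergent control flow that lockstep hardware handles poorly.

```python
from concurrent.futures import ThreadPoolExecutor


def simd_style(values):
    """One Collatz step per lane, in lockstep (hypothetical illustration)."""
    # Every lane evaluates BOTH branches; a mask then selects per lane.
    mask = [v % 2 == 0 for v in values]
    halved = [v // 2 for v in values]
    tripled = [3 * v + 1 for v in values]
    return [h if m else t for m, h, t in zip(mask, halved, tripled)]


def collatz_steps(n):
    """Data-dependent loop: the iteration count differs for every input."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def mimd_style(values, workers=4):
    """Each worker follows its own control flow (threads stand in for cores)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(collatz_steps, values))


if __name__ == "__main__":
    print(simd_style([1, 2, 3, 4]))
    print(mimd_style([1, 6, 7]))
```

On a lockstep machine the second function would have every lane wait for the slowest input; with independent cores each one just finishes when it is done.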
Let's bet: in 10 years, AWS will have a new offering: the 'nano lambda'. You get a PIM core share, with 10 MB local 'persistent' RAM (keeping your data + a continuation of your code when it is not running), running your tiny Loom thread [1], at the edge, billed at 1us granularity, only when it is running, and for 0.0000000000000001 USD per us.
[1] https://cr.openjdk.java.net/~rpressler/loom/Loom-Proposal.ht...