IBM's NorthPole chip doesn't look much different from the Groq LPU, Tenstorrent's hardware, or even AMD's NPU design. The Tenstorrent cards have a lot of SRAM for their price.
I am not an expert on this, but reading Groq's description of their hardware, it still has a compute/memory split. They make the on-chip memory fast enough to keep the compute units fully fed without stalling (80 terabytes per second!). In the end, is that much different from moving the ALU into memory like IBM is doing? The goal for both is to eliminate the memory bottleneck, so there can be a variety of valid approaches.
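To put a rough number on the memory bottleneck: if generating each token means reading every weight once, the token rate can't exceed bandwidth divided by model size. Here is a back-of-the-envelope sketch in Python. The 80 TB/s figure is the one quoted above; the 14 GB model (7B parameters at 2 bytes each) and the 1 TB/s off-chip figure are my own hypothetical numbers for comparison, and this ignores capacity limits, compute limits, batching, and multi-chip interconnect.

```python
def max_tokens_per_second(bandwidth_bytes_per_s: float, weight_bytes: float) -> float:
    """Upper bound on decode rate if every token reads all weights exactly once."""
    if bandwidth_bytes_per_s <= 0 or weight_bytes <= 0:
        raise ValueError("bandwidth and weight size must be positive")
    return bandwidth_bytes_per_s / weight_bytes


TB = 1e12
GB = 1e9

# Hypothetical model: 7e9 parameters * 2 bytes each = 14 GB of weights.
weights = 14 * GB

# 80 TB/s is the on-chip SRAM bandwidth quoted above.
sram_bound = max_tokens_per_second(80 * TB, weights)

# 1 TB/s is an assumed off-chip memory bandwidth, used only for contrast.
offchip_bound = max_tokens_per_second(1 * TB, weights)

print(f"on-chip SRAM bound: {sram_bound:,.0f} tokens/s")
print(f"off-chip bound:     {offchip_bound:,.0f} tokens/s")
```

With these assumptions the ceiling is about 5,700 tokens/s versus about 71 tokens/s, which is the whole argument for putting memory next to (or inside) the compute, whichever way a vendor does it.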
How does Cerebras's WSE-3, with 44 GB of 'L2' on-chip SRAM, compare to Google's TPUs, Tesla's Dojo, NorthPole, the Groq LPU, Tenstorrent's hardware, and AMD's NPU designs?