I believe that the BitGrid, a computing fabric consisting of a homogeneous grid of cells, each with 4 bits of input (one from each neighbor), 4 bits of output (one to each neighbor), and a look-up table (LUT), can outperform anything Nvidia is cooking up at AI-related tasks, using less power and less silicon, while handling defects far better.
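To make the idea concrete, here's a minimal sketch of how a single BitGrid cell might be simulated, assuming each cell's LUT maps its 4 neighbor-input bits to 4 output bits (a 16-entry by 4-bit table). The names (`Cell`, `step`) and bit ordering are illustrative, not taken from any actual BitGrid implementation.

```python
class Cell:
    def __init__(self, lut):
        # lut: 16 entries, each a 4-bit value (0..15), indexed by the
        # 4 input bits packed as N | E<<1 | S<<2 | W<<3.
        # (Assumed layout for illustration only.)
        assert len(lut) == 16
        self.lut = lut

    def step(self, n, e, s, w):
        """Pack the four neighbor inputs into a LUT index and return
        the four output bits (to N, E, S, W) as a tuple."""
        idx = (n & 1) | ((e & 1) << 1) | ((s & 1) << 2) | ((w & 1) << 3)
        out = self.lut[idx]
        return (out & 1, (out >> 1) & 1, (out >> 2) & 1, (out >> 3) & 1)

# Example: a cell whose north output is the XOR of its east and west inputs.
xor_lut = [(((i >> 1) ^ (i >> 3)) & 1) for i in range(16)]
cell = Cell(xor_lut)
print(cell.step(n=0, e=1, s=0, w=1))  # -> (0, 0, 0, 0)
print(cell.step(n=0, e=1, s=0, w=0))  # -> (1, 0, 0, 0)
```

Because every cell is just a tiny LUT wired to its neighbors, a faulty cell can in principle be routed around by reprogramming the tables, which is where the defect-tolerance claim comes from.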
If you're going to spend billions on chips to go into a data center just to do matrix multiplies, maybe spend a little to see if this might work better?