I feel like this is the spaceship problem: do we send a ship on a hundred-year journey now, or spend a decade trying to build one that could do the trip in fifty years?
The rate of improvement seems quick enough right now that if you started training a huge model today, you might regret spending all that money on an architecture that is obsolete by the time you finish.
That said, if you keep waiting you never get around to it.
It would be nice to see a large-parameter Mamba-family data point, though.