This looks promising as the future of modeling. Outperforming Transformers doesn’t catch my eye on its own, because performance varies so much with model size, training data, training method, etc.
However, the greater (5x) inference throughput makes this super appealing, especially for democratizing AI and enabling the GPU poor. This could very well be a watershed moment for SSMs, similar to how Transformers boasted improvements to training and inference speed in 2019.