Mamba: New SSM arch with linear-time scaling that outperforms Transformers

(github.com)

6 points · xenova · 2y ago · 2 comments

This looks promising for the future of sequence modeling. Outperforming Transformers doesn't catch my eye on its own, because performance varies so much with model size, training data, training method, etc.

However, the higher (5x) inference throughput makes this super appealing, especially for democratizing AI and enabling the GPU-poor. This could very well be a watershed moment for SSMs, similar to how Transformers boasted improvements to training and inference speed in 2019.

xenova (OP) · 2y ago

Paper: https://arxiv.org/abs/2312.00752
Models: https://huggingface.co/state-spaces
