Ask HN: Transformer alternatives that could have emergent properties when scaled
I am trying to identify candidate model architectures that, like transformers, could show "emergent" abilities when scaled up (see https://arxiv.org/abs/2206.07682).
Some contenders I already know about are:
* Monarch Mixer (https://arxiv.org/pdf/2111.00396.pdf)
* Hyena (https://hazyresearch.stanford.edu/blog/2023-03-07-hyena)
Thanks for your help.