No, that's what you've been missing from the beginning: the breakthrough of transformers
was scalability. Now we have other models that
are equally scalable and, unsurprisingly, roughly equally performant.
But the ship has sailed: nobody is gonna switch to something other than transformers unless it's significantly better. So the alternative approaches stay behind, because every marginal improvement comes to transformers first (that's what practically everyone is working on) while the alternatives are left playing catch-up.
This is a remarkable example of path dependence.
Interpreting this as “transformers are fundamentally superior” is the mistake I'm trying to help you correct.
The breakthrough of transformers was scalability. The next breakthrough of equivalent importance will be something entirely different, or there won't be one.