XenophileJKO · 1y ago · 0 points

The issue is likely how you are asking the model to process things. The primary limitation is the amount of information (or really, attention) a model can keep in flight at any given moment.

This generally means that for a task like yours, you need signposts in the data, such as minute markers, that the model can process serially.
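To make the signpost idea concrete, here is a minimal sketch, assuming the input is a transcript given as (start_seconds, text) segments. The function name and the `[minute N]` marker format are made up for illustration, not something any model requires:

```python
def add_minute_markers(segments):
    """Prefix each new minute of a transcript with a '[minute N]' signpost.

    segments: list of (start_seconds, text) tuples, assumed sorted by time.
    (Hypothetical input shape, chosen only for this example.)
    """
    lines = []
    last_minute = None
    for start_seconds, text in segments:
        minute = int(start_seconds // 60)
        # Emit a marker only when we cross into a minute we have not labeled yet.
        if minute != last_minute:
            lines.append(f"[minute {minute}]")
            last_minute = minute
        lines.append(text)
    return "\n".join(lines)


transcript = [(5.0, "hello"), (42.0, "world"), (61.5, "next"), (200, "late")]
print(add_minute_markers(transcript))
```

The model can then be asked about one marked chunk at a time instead of having to hold the whole transcript in attention at once.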

It also means some operations, like ranking/sorting, are VERY HARD for the model. They require the model to attend to everything to find the next biggest item, and so on. That is very hard for current models.
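As a rough illustration of why "find the next biggest item" is expensive, here is a sketch that ranks a list by rescanning everything that is left for each output. It is only an analogy for what the model would have to attend to, not a description of what a transformer does internally, and `rank_by_repeated_scan` is a made-up name:

```python
def rank_by_repeated_scan(items):
    """Rank items largest-first by rescanning all remaining items each step.

    Returns the ranked list and how many items were inspected in total.
    """
    remaining = list(items)
    ranked = []
    inspected = 0
    while remaining:
        # Every output requires looking at everything still unranked.
        inspected += len(remaining)
        biggest = max(remaining)
        remaining.remove(biggest)
        ranked.append(biggest)
    return ranked, inspected


ranked, inspected = rank_by_repeated_scan([3, 1, 4, 1, 5])
print(ranked, inspected)  # [5, 4, 3, 1, 1] 15
```

For n items that is n + (n-1) + ... + 1 inspections, so the amount of looking grows quadratically with the list length.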

3 comments · 1 top-level

anon291 · 1y ago · 2 in thread

> This means there are operations that are VERY HARD for the model like ranking/sorting. This requires the model to attend to everything to find the next biggest item, etc. It is very hard for the models currently.

Comparison-based ranking/sorting takes Ω(n log n) comparisons in the worst case, no matter what. Given that a transformer runs in constant time before we 'force' it to output an answer, there must be an M such that it cannot reliably sort a list longer than M. This MUST be the case and can only be solved by running the model some indeterminate number of times, but I don't believe we currently have any architecture to do that.

Note that humans have the same limitation. If you give humans a time limit, there is a maximum number of things they will be able to sort reliably in that time.
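A small sketch of the "there must be an M" argument, under a loud assumption: I treat the fixed compute as a budget of pairwise comparisons, which is only a loose stand-in for whatever a forward pass actually does. Sorting n distinct items needs at least ceil(log2(n!)) comparisons in the worst case, so any fixed budget caps the sortable length. Function names are made up for the example:

```python
import math


def min_comparisons(n):
    """Worst-case lower bound for comparison sorting: ceil(log2(n!))."""
    # For an integer k >= 1, ceil(log2(k)) equals (k - 1).bit_length().
    return (math.factorial(n) - 1).bit_length()


def max_sortable_length(budget):
    """Largest n whose lower bound still fits in a fixed comparison budget."""
    n = 1
    while min_comparisons(n + 1) <= budget:
        n += 1
    return n


print(min_comparisons(5))      # 7
print(max_sortable_length(7))  # 5
```

Whatever the budget is, `max_sortable_length` returns a finite number, which is the M from the argument above. The same bound applies to the time-limited human.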

christianqchung · 1y ago

Transformers absolutely do not run in constant time by any reasonable definition, no matter what your point is.

anon291 · 1y ago

They absolutely do, given a sequence size. All models have max context lengths, so the run time is bounded by a constant.