Arrows of Time for Large Language Models

(arxiv.org)

6 points · tianlong · 2y ago · 3 comments

nyoncore · 2y ago

Isn't it obvious that, since LLMs are trained to predict the next word, they do better at that than at predicting the previous one?

The paper mentions that the LLMs predicting the previous token are themselves pre-trained in that direction, so the difference is not obvious.
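One way to see why the gap is not obvious is the chain rule: with exact probabilities, the left-to-right factorization prod_t p(x_t | x_<t) and the right-to-left one prod_t p(x_t | x_>t) both equal the same joint p(x), so the expected loss (the entropy) is identical in both directions. Here is a minimal sketch that checks this on a hypothetical toy distribution over three-letter strings. The weights are arbitrary and not taken from the paper, and the helper names are made up for illustration.

```python
import itertools
import math

# Hypothetical toy joint distribution over length-3 sequences of "a"/"b".
# The weights 1..8 are arbitrary; any strictly positive weights would do.
N = 3
SEQS = list(itertools.product("ab", repeat=N))
TOTAL = sum(range(1, len(SEQS) + 1))
JOINT = {seq: (i + 1) / TOTAL for i, seq in enumerate(SEQS)}


def marginal(positions, values):
    """P(x_i = v_i for every i in positions), summing out all other positions."""
    return sum(
        p for seq, p in JOINT.items()
        if all(seq[i] == v for i, v in zip(positions, values))
    )


def forward_prob(seq):
    """Left-to-right chain rule: prod_t p(x_t | x_<t)."""
    prob = 1.0
    for t in range(N):
        prob *= marginal(range(t + 1), seq[: t + 1]) / marginal(range(t), seq[:t])
    return prob


def backward_prob(seq):
    """Right-to-left chain rule: prod_t p(x_t | x_>t)."""
    prob = 1.0
    for t in reversed(range(N)):
        prob *= marginal(range(t, N), seq[t:]) / marginal(range(t + 1, N), seq[t + 1:])
    return prob


def cross_entropy(model_prob):
    """Expected -log model_prob(x) under JOINT, in nats."""
    return -sum(p * math.log(model_prob(seq)) for seq, p in JOINT.items())


forward_loss = cross_entropy(forward_prob)
backward_loss = cross_entropy(backward_prob)
print(f"forward  loss: {forward_loss:.6f} nats")
print(f"backward loss: {backward_loss:.6f} nats")
```

Both printed losses agree up to floating-point rounding. With a trained network, each conditional is only a learned approximation, so the two directions' losses need not match. That kind of asymmetry is what the linked paper is about.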

tianlong (OP) · 2y ago

Is there a link with entropy creation?