Arrows of Time for Large Language Models
(arxiv.org)
6 points
tianlong
2y ago
3 comments
3 comments · 2 top-level
nyoncore
2y ago
· 1 in thread
Isn't it obvious that, since LLMs are trained to predict the next word, they do better at that than at predicting the previous one?
frotaur
2y ago
The paper mentions that the LLMs predicting the previous token are themselves pre-trained to do exactly that, so the difference is not obvious.
tianlong
OP
2y ago
Is there a link with entropy creation?