
0 points · delis-thumbs-7e · 1mo ago · 0 comments

Wouldn’t that be extremely computationally expensive, considering how resource-intensive training is?



colechristensen · 1mo ago

No. Training a state-of-the-art model involves on the order of 10 trillion tokens.

We're talking about a step that updates the weights based on, say, between 10k and 1M tokens.
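To put those two figures side by side, here is a rough back-of-the-envelope sketch. It assumes token count is a fair proxy for compute (same model, same cost per token), which is a simplification, and the names `FULL_TRAINING_TOKENS` and `fraction_of_full_training` are made up for illustration.

```python
# Order-of-magnitude figures taken from the comment above; not measured values.
FULL_TRAINING_TOKENS = 10e12  # ~10 trillion tokens for a full training run


def fraction_of_full_training(update_tokens: float) -> float:
    """Return the update's token count as a fraction of a full training run."""
    return update_tokens / FULL_TRAINING_TOKENS


# Compare the low and high ends of the quoted range (10k and 1M tokens).
for update_tokens in (10_000, 1_000_000):
    frac = fraction_of_full_training(update_tokens)
    print(f"{update_tokens:>9,} tokens -> {frac:.0e} of a full training run")
```

Under that assumption, the update step works out to somewhere between roughly one billionth and one ten-millionth of the tokens seen in a full training run.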

I learned something. Thank you!
