
0 points · tayo42 · 2y ago

This is a little confusing. You turned the text into indices? So numbers? Then compressed that? Or is the text as numbers, without any extra compression, only 1 kB?

The tokenizer the models use (SentencePiece) is more or less based on one way to do compression (BPE). It's not really clear what you're testing.
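
For readers who haven't seen why BPE counts as compression, here is a rough sketch of the merge loop on a toy string. It is not the SentencePiece API (SentencePiece also supports a unigram model), and the helper names `most_frequent_pair` and `merge_pair` are made up for illustration.

```python
from collections import Counter


def most_frequent_pair(symbols):
    """Return the most common adjacent pair of symbols, or None if there is none."""
    pairs = Counter(zip(symbols, symbols[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` with one fused symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


original = "aaabdaaabac"
symbols = list(original)  # start from single characters
for _ in range(3):  # three merge rounds, an arbitrary choice for this toy input
    pair = most_frequent_pair(symbols)
    if pair is None:
        break
    symbols = merge_pair(symbols, pair)

print(len(original), "characters ->", len(symbols), "symbols:", symbols)
```

Each merge shortens the sequence at the cost of a larger vocabulary, which is the sense in which the tokenizer is already doing a form of compression.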


2 comments · 1 top-level

daemonologist · 2y ago · 1 in thread

My reading is that at each generation step they ordered all possible next words by the probability assigned to them by the model and recorded the index of the true next word (so if the model was very good at predicting Harry Potter their indices would mostly be 0, 0, 0, ...).
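
A minimal sketch of that reading, assuming a toy bigram counter stands in for the language model (the original experiment's model and code are not shown in this thread, and every function name below is hypothetical). The counter is built from the same text it encodes, which is the most favorable case.

```python
from collections import Counter, defaultdict


def build_bigram_model(tokens):
    """Count which word follows which; a toy stand-in for a real language model."""
    follow = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follow[prev][nxt] += 1
    return follow


def ranked_candidates(model, prev, vocab):
    """All vocabulary words, most likely first (ties broken alphabetically)."""
    counts = model.get(prev, Counter())
    return sorted(vocab, key=lambda t: (-counts[t], t))


def encode_ranks(model, vocab, tokens):
    """Record, for each word after the first, its index in the model's ranking."""
    return [ranked_candidates(model, prev, vocab).index(nxt)
            for prev, nxt in zip(tokens, tokens[1:])]


def decode_ranks(model, vocab, first, ranks):
    """Rebuild the text by replaying the same rankings and picking each index."""
    tokens = [first]
    for r in ranks:
        tokens.append(ranked_candidates(model, tokens[-1], vocab)[r])
    return tokens


text = "the cat sat on the mat and the cat sat on the rug".split()
vocab = sorted(set(text))
model = build_bigram_model(text)

ranks = encode_ranks(model, vocab, text)
restored = decode_ranks(model, vocab, text[0], ranks)
print(ranks)
print(restored == text)
```

The better the model predicts the text, the more of the indices are 0, and a long run of mostly zeros is what a general-purpose compressor then shrinks well. Decoding only works if the decoder has the identical model.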

aimor · 2y ago

This is correct
