
0 points · heisenburgzero · 2y ago · 0 comments

I think some earlier NLP applications had something called an "unknown token", which they substituted for any unseen word. But I don't think recent implementations use it anymore.
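To make the difference concrete, here is a toy sketch of the two approaches. Everything in it (the vocabularies, the function names, the greedy longest-match rule) is made up for illustration. Real tokenizers such as byte-level BPE are more involved, but the idea is the same: an unseen word gets split into smaller known pieces instead of collapsing into a single unknown token.

```python
# Toy comparison: word-level vocabulary with an <unk> token vs. a subword
# fallback. Vocabulary contents and function names are hypothetical.

WORD_VOCAB = {"select", "from", "where", "users"}
# Known subword pieces; characters not covered here are emitted one by one.
PIECES = {"select", "from", "where", "user", "s", "tbl", "_"}

def word_level(text):
    """Old style: any word outside the vocabulary collapses to <unk>."""
    return [w if w in WORD_VOCAB else "<unk>" for w in text.lower().split()]

def subword_level(text):
    """Greedy longest match with a single-character fallback, so nothing is lost."""
    tokens = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Try the longest known piece starting at position i.
            for j in range(len(word), i, -1):
                if word[i:j] in PIECES:
                    tokens.append(word[i:j])
                    i = j
                    break
            else:
                # No known piece matched: emit the single character itself.
                tokens.append(word[i])
                i += 1
    return tokens

if __name__ == "__main__":
    query = "select from tbl_xq9"
    print(word_level(query))     # the made-up table name becomes <unk>
    print(subword_level(query))  # the made-up table name survives as pieces
```

With the word-level version the model never sees the made-up table name at all, so it could not copy it back. With the subword version the name is still there as a sequence of pieces, which is what makes reusing it in a response possible in the first place.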

It still baffles me why such a stochastic parrot / next-token predictor would recognize these "unseen combinations of tokens" and reuse them in its response.


3 comments · 2 top-level

stormfather · 2y ago · 1 in thread

This helped me understand but not well enough to explain it yet: https://transformer-circuits.pub/2022/in-context-learning-an...
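As far as I understand it, the mechanism that article focuses on is the "induction head" pattern: if the context contains [A][B] earlier and the sequence now ends in [A], predict [B]. Below is a toy sketch of just that copy rule. It is not a transformer and not the article's code; the function name and the example tokens are made up.

```python
def induction_copy(tokens):
    """Predict the next token by finding the most recent earlier occurrence
    of the last token and returning whatever followed it.

    Returns None if the last token has not appeared earlier in the context.
    """
    if not tokens:
        return None
    last = tokens[-1]
    # Scan backwards over earlier positions, excluding the final one.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == last:
            return tokens[i + 1]
    return None

if __name__ == "__main__":
    # A made-up identifier split into pieces "zq" and "##17".
    context = ["table", "zq", "##17", "has", "rows", ";", "select", "from", "zq"]
    print(induction_copy(context))
```

The rule never needs to have seen "zq ##17" during training. It only needs to find the earlier occurrence in the prompt and copy what came next, which is roughly why unseen token combinations can still be reused.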

heisenburgzero (OP) · 2y ago

Thanks. "In-context learning" was the phrase I was looking for.

Also, I found these 2 links pretty good too:
1. http://ai.stanford.edu/blog/understanding-incontext/
2. http://ai.stanford.edu/blog/in-context-learning/

I'm still not completely convinced. Probably need to dwell on the topic longer.

stevenhuang · 2y ago

Everything falls into place once you understand that LLMs are indeed learning the hierarchical concepts inherent in the structured data they have been trained on. These concepts exist in a high-dimensional latent space. Within this space is the concept of nonsense/gibberish/placeholder, which your sequence of unseen tokens maps to. The model then combines this with the concept of SQL tables, hopefully resulting in the intended answer.
