
0 points · visarga · 3y ago · 0 comments

It doesn't memorize training data, except for very repetitive bits. It needs a fresh copy of the paper in the prompt, but only 10% of it fits.


2 comments · 1 top-level

sebzim4500 · 3y ago · 1 in thread

Making the context 10x longer would 'only' cost about 100x as much compute in the attention layers, since self-attention scales quadratically with sequence length. Presumably you could take GPT-3 (trained with the normal context) and then finetune it on a comparatively small amount of data with the new context length, so it shouldn't be enormously expensive to train either.
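As a rough back-of-the-envelope check on that scaling, here is a minimal sketch. It assumes a standard transformer layer with a 4x-wide MLP, and the sizes used (a 2048-token context, model width 12288) are illustrative assumptions rather than anything stated in this thread. The function names are made up for the example.

```python
def attention_matmul_flops(seq_len: int, d_model: int) -> int:
    """Multiply-adds for the two seq_len x seq_len matmuls in one layer:
    the QK^T scores and the attention-weighted sum over V."""
    return 2 * seq_len * seq_len * d_model


def per_token_flops(seq_len: int, d_model: int) -> int:
    """Multiply-adds for the parts that are linear in seq_len:
    Q/K/V/output projections (4 * d_model^2) plus a 4x-wide MLP
    (8 * d_model^2), per token. Assumed layer shape, not from the thread."""
    return 12 * seq_len * d_model * d_model


def scaling_report(seq_len: int, d_model: int, factor: int = 10) -> dict:
    """Compare per-layer cost at seq_len and at factor * seq_len."""
    long_len = factor * seq_len
    attn_base = attention_matmul_flops(seq_len, d_model)
    attn_long = attention_matmul_flops(long_len, d_model)
    lin_base = per_token_flops(seq_len, d_model)
    lin_long = per_token_flops(long_len, d_model)
    return {
        "attention_ratio": attn_long / attn_base,
        "per_token_ratio": lin_long / lin_base,
        "total_ratio": (attn_long + lin_long) / (attn_base + lin_base),
    }


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not taken from the thread).
    for name, value in scaling_report(seq_len=2048, d_model=12288).items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions the quadratic term grows 100x while the projections and MLP grow only 10x, so the overall per-layer increase depends on how wide the model is relative to its context length.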

Would be interesting to see how well it could 'understand' a paper, or if more layers etc. would be needed.

visarga (OP) · 3y ago

Well, it depends. Perceiver can handle big context.

https://arxiv.org/abs/2103.03206
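For a sense of why that helps, here is a simplified sketch of the counting argument as I understand it from the paper: the inputs are cross-attended into a small latent array, and the deep stack only runs self-attention over the latents. It assumes a single cross-attention step, and the latent count and depth below are illustrative assumptions. The function names are made up for the example.

```python
def self_attention_pairs(num_inputs: int, num_layers: int) -> int:
    """Query-key pairs scored by a plain transformer:
    every layer attends over all num_inputs x num_inputs pairs."""
    return num_layers * num_inputs * num_inputs


def perceiver_pairs(num_inputs: int, num_latents: int, latent_layers: int) -> int:
    """Query-key pairs in a simplified Perceiver-style model:
    one cross-attention from the latents to the inputs
    (num_latents x num_inputs), then latent_layers of self-attention
    over the latents only (num_latents x num_latents each)."""
    cross = num_latents * num_inputs
    latent = latent_layers * num_latents * num_latents
    return cross + latent


if __name__ == "__main__":
    # Illustrative values (assumed): 512 latents, 24 layers.
    for n in (2_048, 20_480, 204_800):
        full = self_attention_pairs(n, num_layers=24)
        perc = perceiver_pairs(n, num_latents=512, latent_layers=24)
        print(f"inputs={n}: full={full} perceiver={perc}")
```

The input length only shows up in the cross-attention term, so cost grows linearly with context rather than quadratically. Whether the latents keep enough detail to 'understand' a paper is a separate question.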
