Making the context 10x longer would 'only' cost about 100x as much compute for attention, since self-attention scales quadratically with sequence length. Presumably you could take GPT-3 (trained with the normal context) and then finetune it on a comparatively small amount of data with the new context length, so it shouldn't be enormously expensive to train either.
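As a rough back-of-the-envelope check on the 100x figure, here is a minimal sketch of how the per-layer cost might scale. The function names are hypothetical, the cost formulas are the usual simplified multiply-add counts for a standard Transformer layer (not anything specific to how GPT-3 was actually implemented), and the 2048-token context and 12288 model width are assumed values for the largest GPT-3 model.

```python
def attention_matmul_cost(seq_len: int, d_model: int) -> int:
    """Approximate multiply-adds for QK^T plus (scores @ V) in one layer.

    Both products are (seq_len x seq_len x d_model), so the cost grows
    with the square of the sequence length.
    """
    return 2 * seq_len * seq_len * d_model


def dense_cost(seq_len: int, d_model: int) -> int:
    """Approximate multiply-adds for the per-token dense parts of one layer.

    Assumes four d_model x d_model projections (Q, K, V, output) and a
    feed-forward block with a 4x hidden width, i.e. 12 * d_model^2 per token.
    This part grows only linearly with the sequence length.
    """
    return 12 * seq_len * d_model * d_model


# Assumed values, for illustration only.
base_ctx = 2048
long_ctx = 10 * base_ctx
d_model = 12288

attention_ratio = attention_matmul_cost(long_ctx, d_model) / attention_matmul_cost(base_ctx, d_model)
dense_ratio = dense_cost(long_ctx, d_model) / dense_cost(base_ctx, d_model)

print(f"attention cost ratio: {attention_ratio:.0f}x")
print(f"dense cost ratio:     {dense_ratio:.0f}x")
```

Under these assumptions the quadratic attention term is the part that goes up 100x per sequence, while the projection and feed-forward layers only go up 10x.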
Would be interesting to see how well it could 'understand' a paper, or whether more layers etc. would be needed.