Since BERT came out, a considerable literature has built up of people struggling mightily to combine transformer representations of document parts into a whole. It convinces me that one could spend a few lifetimes pushing a bubble around underneath that rug.
I think the best argument for your case is that people seem to get along just fine with a limited short-term memory. I'd temper that with the observation that a person writing a summary is actually running a multi-stage process: their short-term memory attends partly to what they are writing and partly to what they are reading, and they are building long-term memory structures at the same time. So there is a lot going on.
Abstracts work well for information retrieval, and many of them would fit in the GPT attention window or be only a little bigger, so you could make the case that a fixed-size structure could be highly useful for IR.
On the other hand, many documents, such as scientific papers, are considerably bigger than the current attention window, and direct summarization of a single document with a transformer will still need a bigger window, more like 40,000 tokens.
A lot of things in the literature are complex, muddy, contradictory or all of the above. (Try a question like "What did Freud think about narcissism?" or "What is the clinical relevance of Bleuler's concept of ambivalence?" or "Tell me about cosmic inflation" or "What is the dark matter particle?")
Hard cases really do require matching up parts of document A with parts of document B, and having them in the same attention window would certainly help an LLM do that in a natural way.
A window big enough for that might be completely impractical, not just because of computational cost but possibly because of more fundamental scalability limits. (I'm not sure a person with a 10x bigger short-term memory would really be able to solve problems better than the average person... There are transformers with a 500,000-token attention window today and they suck.)
There could be some procedure where you cut documents up into pieces in various ways, extract critical context from documents A and B and other literature, and also put in the parts you want to critique against each other, or even match up different parts of the same document to do the same. Maybe a small attention window could still be used to decompose documents into knowledge graphs, but it is by no means trivial to reason over a KG once you have it.
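To make the "cut documents up and match the pieces against each other" idea concrete, here is a minimal sketch, not a claim about how anyone actually does it. It assumes tokens are already a plain list (in practice they would come from the model's own tokenizer; `str.split()` below is only a stand-in), and the names `sliding_windows` and `cross_pairs` are made up for illustration.

```python
def sliding_windows(tokens, window=512, overlap=64):
    """Split a token list into overlapping windows of at most `window` tokens."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def cross_pairs(tokens_a, tokens_b, budget=4096, overlap=0):
    """Pair every piece of A with every piece of B so each pair fits in `budget`.

    Each document gets half of the budget (an arbitrary split, assumed here
    for simplicity). The number of pairs is len(chunks_a) * len(chunks_b).
    """
    half = budget // 2
    chunks_a = sliding_windows(tokens_a, window=half, overlap=overlap)
    chunks_b = sliding_windows(tokens_b, window=half, overlap=overlap)
    return [(a, b) for a in chunks_a for b in chunks_b]


if __name__ == "__main__":
    # Whitespace splitting is a stand-in for a real tokenizer.
    doc_a = ("alpha " * 5000).split()
    doc_b = ("beta " * 3000).split()
    pairs = cross_pairs(doc_a, doc_b, budget=4096)
    print(len(pairs), "pairs to feed through the window")
```

The pair count grows as the product of the two chunk counts, which is one way to see why this gets expensive fast once you bring in "other literature" as well.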
What I do know is that I have documents longer than 4096 tokens that I want to retrieve, cluster and classify right now. Transformers were not up to the task in Feb 2023, and I am hoping for some progress soon that will help.
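For reference, a common baseline for documents over the window is to embed each chunk separately and average the chunk vectors into one document vector. A minimal sketch follows, assuming the per-chunk vectors already exist (the numbers below are invented, and `mean_pool` and `cosine` are illustrative names, not any library's API). It is exactly the kind of combining-parts-into-a-whole trick I am skeptical of, but it shows what the fixed-size structure looks like.

```python
import math


def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors into one vector."""
    if not vectors:
        raise ValueError("need at least one chunk vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all chunk vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Invented chunk vectors standing in for real encoder output.
    doc_1 = mean_pool([[0.9, 0.1, 0.0], [0.7, 0.3, 0.0]])
    doc_2 = mean_pool([[0.0, 0.2, 0.8], [0.1, 0.1, 0.8]])
    print(cosine(doc_1, doc_2))
```

Averaging throws away which chunk said what, so it can work for coarse retrieval and clustering while being useless for the hard cases where you need to line up specific parts of two documents.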