> You can use a trivial experiment to verify that you can't keep details in your sliding attention window for more than a few seconds or focus on more than a few things simultaneously.
I said "context window," not "attention window." Of course spans of attention are limited. Knowledge is not. Knowledge is often highly specific.
> You're possibly mixing it up
Not really. You've simply failed to verify your understanding of my argument and instead created something of a strawman.
> You aren't keeping the entire codebase in your memory, just its highly processed and conceptualized version
And what is your basis for this claim? Why a codebase? You don't think I can remember an entire function? Yet actors can remember all of their lines for a scene. Is that just a highly processed and abstracted version in their minds? Do they just run some cognitive loop to recreate the dialog in real time?
> they need stronger processing of what they already remember.
What is "stronger"? More time? More memory? More compute? And how is that put to use? Why is it that, given certain prompts, LLMs reproduce hundreds of pages of copyrighted work verbatim?
> Models can be trained better to cram more intelligence into the same amount of parameters, that's what I mean.
Cool. _How_? What is the limit of this training? How efficient is it? How many resources do you need on the input for a given increase in output? Without answers to those, it's just a ton of hand-waving.