What gets pushed out isn’t the last version of the document itself (since it’s FIFO), but the important parts of the conversation—things like the rationale, requirements, or any context the model needs to understand why it’s making changes. So, instead of being helpful, that extra capacity just gets filled with old, repetitive chunks that have to be processed every time, muddying up the output. This isn’t just an issue with code; it happens with any kind of document editing where you’re going back and forth, trying to refine the result.
Sometimes I feel the way to "resolve" this is to instead go back and edit some earlier portion of the chat to update it with the "new requirements" that I didn't even know I had until I walked down some rabbit hole. What I end up with is almost like a threaded conversation with the LLM. Like, I sometimes wish these LLM chatbots explicitly treated the conversion as if it were threaded. They do support basically my use case by letting you toggle between different edits to your prompts, but it is pretty limited and you cannot go back and edit things if you do some operations (eg: attach a file).
Speaking of context, it's also hard to know what things like ChatGPT add to it's context in the first place. Many of times I'll attach a file or something and discover it didn't "read" the file into it's context. Or I'll watch it fire up a python program it writes that does nothing but echo the file into it's context.
I think there is still a lot of untapped potential in strategically manipulating what gets placed into the context window at all. For example only present the LLM with the latest and greatest of a document and not all the previous revisions in the thread.
Here are the docs for an example of how it can look: https://news.ycombinator.com/item?id=42039895