so far
Without the thousands of micro decisions that go into building even the simplest of solutions, it doesn't matter how large your context window is. It's not about holding a system in your mind - it's about what you do with it, what decisions you make to move towards your goal.
At least that's my take on current LLMs and their limitations.
Enterprise software gets quite complex, has a ton of dependencies that need to be understood together, etc.
Just the data model can take up hundreds of thousands of tokens describing the tables and relationships in some of the code bases I've worked on.
These models degrade like crazy at those long token counts though. I have not found them useful if I need to just stuff everything in a giant context window. I'm mostly using Claude though, so slightly different context scale.
This is one of those things which superficially seems like a slam-dunk gotcha, but isn't.
Yes, correct, I can't do that.
Unfortunately, my experience with LLMs is that they can't really pay attention to all the things in the context window either.
Even a mere 5,613 tokens[0] had it getting confused.
If any AI could really do two million tokens with perfect recall of the problem, that would indeed be wildly superhuman. Even just having 6k tokens' worth of custom instructions applied consistently to an ongoing data stream (which I bet could be done with the right scaffolding on the API of better models, even if not with a naïve use of chat UIs) is superhuman. That kind of ongoing focus and persistence would still be superhuman even when the quality of the result is "ok, not great, just ok", owing to how "human doing same thing for 4 hours" is much worse than "freshly rested human begins work for the day".
I don't know where the boundary really is, though, where it becomes superhuman on any axis[1]. The failure mode I'm describing here reminds me of art lessons towards the end of my time at school, where the teacher had to remind people that accurate still-life studies required you to keep looking again and again at the material, not just once when you started and then filling in the details entirely from your imagination.
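To make the "scaffolding" idea a bit more concrete, here is a rough Python sketch of what I have in mind: re-send the full instructions with every chunk of the stream, so the model "looks again" at them each time instead of relying on something far back in a growing context. Everything here is an assumption for illustration: `apply_instructions` and `fake_model` are made-up names, and `call_model` stands in for whatever real API client you'd plug in. I haven't measured whether this actually helps.

```python
from typing import Callable, Iterable, Iterator


def apply_instructions(
    instructions: str,
    stream: Iterable[str],
    call_model: Callable[[str, str], str],
) -> Iterator[str]:
    """Yield one result per chunk, re-sending the full instructions each time."""
    for chunk in stream:
        # Fresh call per chunk: no earlier chunks or replies are carried over,
        # so the instructions never drift further back in a growing context.
        yield call_model(instructions, chunk)


# Hypothetical stand-in for a real model call, only to show the call pattern.
def fake_model(system: str, user: str) -> str:
    return f"[{len(system)} chars of instructions] {user.upper()}"


results = list(
    apply_instructions("Translate to English.", ["eins", "zwei"], fake_model)
)
```

The trade-off is cost: you pay for the instruction tokens on every call, and each chunk is processed without any memory of the previous ones.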
[0] I tried using it to translate this Wikipedia page to English, and it was hallucinating plausible but false things by the time it got to the timeline: https://de.wikipedia.org/wiki/Döberitzer_Heide
Here's the chat link; even repeated prompting didn't work for the timeline: https://chatgpt.com/share/67b08f40-2cb8-8011-acd8-fc5b9d6fa8...
I tried it again while writing this to see if current models are any better. This time the same prompt went to the canvas editor, and it didn't complete the translation. When I replied "continue", it replaced the attempted first half with a non-translated German Wikipedia article that was essentially unrelated: https://chatgpt.com/canvas/shared/67b091d021088191bd9e0ca7c3...
[1] and there are many different aspects of intelligence.
Consider the converse: if it were a fundamental requirement of the nature of intelligence that all aspects of human intelligence correlate well with each other, then chess AI could only have beaten world champions like Kasparov in the same year that Go AI beat those like Lee Sedol.
And then someone probably pipes up to tell me that cross-browser margins are actually more complicated than "AI"!
It's fixed now. Thanks for commenting and letting me know.