Do you have papers to back this up? That was also my reaction when I saw some really crazy accurate comments on some vibe-coded piece of code, but I couldn't prove it, and thinking about it now I think my intuition was wrong (i.e., LLMs do produce original complex code).
If that does not work then the moment you introduce AI you cap their capabilities unless humans continue to create original works to feed the AI. The conclusion - to me, at least - is that these pieces of software regurgitate their inputs, they are effectively whitewashing plagiarism, or, alternatively, their ability to generate new content is capped by some arbitrary limit relative to the inputs.
Neural networks can at best uncover latent correlations that were already available in the inputs. Expecting anything more is basically just wishful thinking.
If so, I'm not sure it's a useful framing.
For novel writing, sure, I would not expect much truly interesting progress from LLMs without human input because fundamentally they are unable to have human experiences, and novels are a shadow or projection of that.
But in math – and a lot of programming – the "world" is chiefly symbolic. The whole game is searching the space for new and useful arrangements. You don’t need to create new information in an information-theoretic sense for that. Even for the non-symbolic side (say diagnosing a network issue) of computing, AIs can interact with things almost as directly as we can by running commands so they are not fundamentally disadvantaged in terms of "closing the loop" with reality or conducting experiments.
Modern LLMs are trained with reinforcement learning: they attempt a coding problem and receive a reward if the solution succeeds.
Data Processing Inequalities (from your link) aren't relevant: the model is learning from the reinforcement signal, not from human-written code.
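To make that concrete, here is a minimal sketch of what an outcome-based reward of this kind could look like. It is an illustration only, not how any particular lab trains its models: the names `reward` and `solve`, the test-case format, and the all-or-nothing 0/1 score are assumptions made for the example.

```python
from typing import Any, Callable, Sequence, Tuple

TestCase = Tuple[tuple, Any]  # (positional arguments, expected return value)


def reward(candidate_source: str, tests: Sequence[TestCase],
           entry_point: str = "solve") -> float:
    """Return 1.0 if the candidate passes every test, else 0.0.

    Hypothetical reward function: the signal comes from running the code,
    not from comparing it to a human-written reference solution.
    """
    namespace: dict = {}
    try:
        # NOTE: exec on untrusted code is unsafe; a real setup would sandbox it.
        exec(candidate_source, namespace)
        func: Callable = namespace[entry_point]
        for args, expected in tests:
            if func(*args) != expected:
                return 0.0
    except Exception:
        # Syntax errors, a missing entry point, or runtime failures earn nothing.
        return 0.0
    return 1.0


tests = [((2, 3), 5), ((-1, 1), 0)]
good = "def solve(a, b):\n    return a + b\n"
bad = "def solve(a, b):\n    return a * b\n"
print(reward(good, tests), reward(bad, tests))  # prints: 1.0 0.0
```

The point of the sketch is that nothing in `reward` looks at existing code: any program that passes the tests scores the same, whether or not it resembles something in the training data.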
We all stand on the shoulders of giants and learn by looking at others’ solutions.
To me that's proof positive they know their output is mangled inputs: they need that originality, otherwise they will sooner or later drown in nonsense and noise. It's essentially a very complex game of Chinese whispers.
(I created a template language for JSON, added branching and conditionals, and realized I had a whole programming language. I was really proud of my originality until I read Ted Nelson's Computer Lib/Dream Machines and found out I had reinvented TRAC and, to some extent, XSLT. Anyway, LLMs are very good at reasoning about it because it can be constrained by a JSON schema. People who think LLMs only regurgitate haven't given them a fair shot.)
Perhaps the occasional program that relies heavily on precise visual alignment will fail - but I dare say if we give the LLM the same grace we'd give a visually impaired designer, it can do exactly as well.
It failed massively, spitting out garbage code, where the comments claimed to use blocking access patterns, but the code did not actually use them at all.
LLMs are, frankly, nearly useless for programming. They may solve a problem every once in a while, but once you look at the code, you notice it's either directly plagiarized or bad quality (or both, I suppose, in the latter case).
Is there something unique about code, that is different from language (or images), that would make it impossible for an LLM to produce original code? I don't believe so, but I'm willing to be convinced.
I think this switches the burden of proof: we know LLMs can produce original content in other contexts. Why would they not be able to create original code?
[0] Ever curious, I tested this assumption. I got Claude to write an original limerick about goats oiling their beards with olive oil, which was the first reasonable thing I could think of as a suitably niche subject. I googled the result and could not find anything close to it. I then asked it to produce another limerick on the same subject, and it produced a different limerick, so obviously not just repeating training data.
[1] https://www.oneusefulthing.org/p/the-recent-history-of-ai-in...
Yes, it is true that a lot of humans remix existing code. But not all. It has yet to be proven that any LLM is doing something more than remixing code.
I would submit as evidence for this idea (that LLMs are not capable of writing original code) the fact that not a single company using LLM-based AI coding has developed a novel product that has outpaced its competition. In any category. If AI really makes people "10x" more productive, then companies that adopted AI a year ago should be 10 years ahead of their competition. Substitute any value N > 1 you want and you won't see it. Indeed, the stories we're seeing of the massive amount of waste occurring within AI startups and companies adopting AI would suggest that N < 1.
I recently asked Gemini 3 Pro to create an RSS feed reader type of experience by using XSLT to style and lay out an OPML file. I specifically wanted it to use a server-side proxy for CORS, pass through caching headers in the proxy to leverage standard HTTP caching, and I needed all feed entries for any feed in the OPML to be combined into a single chronological feed.
It initially told me multiple times that it wasn't possible (it also reminded me that Google is getting rid of XSLT). Regardless, after I reiterated multiple times that it is possible, it finally decided to make a temporary POC. That POC worked on the first try, with only one follow-up to standardize date formatting with support for Atom and RSS.
I obviously can't say the code was novel, though I would be a bit surprised if it had trained on that task enough to remember roughly the full implementation and still claim it was impossible.
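For anyone wondering what the "single chronological feed" and date-standardizing steps involve, here is a rough sketch in Python. To be clear, this is not the code Gemini produced (that was XSLT, and it is not shown here), and it leaves out the proxy and caching parts entirely. The names `parse_feed_date` and `merge_feeds`, the entry dictionaries, and the newest-first ordering are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, List


def parse_feed_date(text: str) -> datetime:
    """Parse an Atom (ISO 8601) or RSS (RFC 822) timestamp into UTC."""
    text = text.strip()
    try:
        # Atom style, e.g. "2024-01-02T09:30:00Z". Older Python versions
        # reject a trailing "Z", so rewrite it as an explicit offset.
        parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    except ValueError:
        # RSS style, e.g. "Tue, 02 Jan 2024 10:00:00 GMT".
        parsed = parsedate_to_datetime(text)
    if parsed.tzinfo is None:
        # Assumption: timestamps without a zone are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def merge_feeds(feeds: Dict[str, List[dict]]) -> List[dict]:
    """Flatten the entries of every feed and sort them newest first."""
    combined = []
    for feed_title, entries in feeds.items():
        for entry in entries:
            combined.append({**entry,
                             "feed": feed_title,
                             "when": parse_feed_date(entry["date"])})
    return sorted(combined, key=lambda e: e["when"], reverse=True)


feeds = {
    "RSS blog": [{"title": "A", "date": "Tue, 02 Jan 2024 10:00:00 GMT"}],
    "Atom blog": [
        {"title": "B", "date": "2024-01-02T09:30:00Z"},
        {"title": "C", "date": "2024-01-03T08:00:00+02:00"},
    ],
}
print([e["title"] for e in merge_feeds(feeds)])  # prints: ['C', 'A', 'B']
```

Converting everything to UTC before sorting is what makes entries from differently formatted feeds comparable; sorting the raw date strings would not work because RSS and Atom use different formats.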
Yes, and Shakespeare merely copied the existing 26 letters of the English alphabet. What magical process do you think students are using when they read and re-combine learned examples to solve assignments?
It reproduces _patterns from the training data_, sometimes including verbatim phrases.
The work (to discover those patterns, to figure out what works and what does not, to debug some obscure heisenbug and write a blog post about it, ...) was done by humans. Those humans should be compensated for their work, not owners of mega-corporations who found a loophole in copyright.
What about humans? Are humans capable of producing completely original code or ideas or thoughts?
As the saying goes, if you want to create something from scratch, you have to start by inventing the universe.
The human mind works by noticing patterns and applying them in different contexts.