People say this, but honestly, it's not really my experience— I've given ChatGPT (and Copilot) genuinely novel coding challenges and they do a very decent job at synthesizing a new thought based on relating it to disparate source examples. Really not that dissimilar to how a human thinks about these things.
Many (but not all) coding tasks fall into this category. "Connect to API A using language B and library C, while integrating with D on the backend." Which is really cool!
But there are other coding tasks that it just can't really do. E.g., I'm building a database with some novel approaches to query optimization, and LLMs are totally lost in that part of the code.
And an LLM could very much ingest such a paper and then, I expect, also understand how the concepts mapped to the source code implementing them.
LLMs don't learn from manuals describing how things work; LLMs learn from examples. So a thing being described doesn't let the LLM perform that thing: the LLM needs to have seen a lot of examples of that thing being performed in text to be able to perform it.
This is a fundamental part of how LLMs work, and you can't get around it without totally changing how they train.
I'm hardly an expert, but it seems intuitive to me that even if a problem isn't explicitly accounted for in publicly available training data, many underlying partial solutions to similar problems may be, and an LLM amalgamating that data could very well produce something that appears to be "synthesizing a new thought".
Essentially instead of regurgitating an existing solution, it regurgitates everything around said solution with a thin conceptual lattice holding it together.
Sort of related to how you need to specify the level of LLM reasoning not just to control cost, but because the non-reasoning model just goes ahead and answers incorrectly, and the reasoning model will "overreason" on simple problems. Being able to estimate the reasoning-intensiveness of a problem before solving it is a big part of human intelligence (and IIRC is common to all great apes). I don't think LLMs are really able to do this, except via case-by-case RLHF whack-a-mole.
I think, rather, that it has a broad understanding of concepts like build systems and tools, DAGs, dependencies, lockfiles, caching, and so on, and so it can understand my system through the general lens of what it has seen when those concepts are applied in non-ROS systems, on non-GHA DevOps platforms, or with other packaging regimes.
I'd argue that that's novel, but as I said in the GP, the more important thing is that it's also how a human approaches things that to them are novel— by breaking them down, and identifying the mental shortcuts enabled by abstracting over familiar patterns.
And yet it can do it when presented with a language spec. It's not perfect, but it can solve that with tooling it makes for itself. For example, it tends to generate B code that is mostly correct, but with the occasional problem. So I had it write a B parser in Python and then use that whenever it edits B code to validate the edits.
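To make that workflow concrete, here's a rough sketch of what the validation step could look like. The b_parser.py script, its command-line interface, and the retry-by-feedback flow are my assumptions about the setup, not the actual tooling: the idea is just that every edited B file gets run through the generated parser, and any parse error is surfaced so the edit can be redone.

    # Sketch only: b_parser.py is a hypothetical stand-in for the
    # LLM-generated B parser, assumed to exit non-zero on a parse error.
    import subprocess
    import sys

    def validate_b_file(path: str) -> bool:
        """Run the generated parser over an edited B source file.

        Returns True if the file parses cleanly, False otherwise.
        """
        result = subprocess.run(
            [sys.executable, "b_parser.py", path],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            # Surface the parser's error output so the LLM can retry the edit.
            print(f"Parse error in {path}:\n{result.stderr}", file=sys.stderr)
            return False
        return True

    if __name__ == "__main__":
        ok = validate_b_file(sys.argv[1])
        sys.exit(0 if ok else 1)

Hooking something like this into the edit loop (run it after every write, reject the edit on failure) is what turns "mostly correct" generated code into code you can actually trust to build.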