As I mentioned in one of the footnotes in the post:
> People often tell me "you would get better results if you generated code in a more mainstream language rather than Haskell" to which I reply: if the agent has difficulty generating Haskell code then that suggests agents aren't capable of reliably generalizing beyond their training data.
If an agent can't consistently apply concepts learned in one language to generate code in another language, then that calls into question how good it is at reliably permuting the training dataset in the way you just suggested.
Pick a good model, let it choose its own tools and then re-evaluate.
Doesn't that apply to flesh-and-bone developers? Ask someone who only works in Python to implement their current project in Haskell and I'm not so sure you'll get very satisfying results.
No, it does not. If you have a developer who knows C++, Java, Haskell, etc., and you ask that developer to re-implement something from one language in another, the result will be good. That is because a developer knows how to generalize from one language (e.g. C++) and then write something concrete in the other (e.g. Haskell).
In my experience, a software engineer knows how to program and has experience in multiple languages. Someone with that level of experience tends to pick up new languages very quickly because they can apply the same abstract concepts and algorithms.
If an LLM that has a similar (or broader) data set of languages cannot generalise to an unknown language, then it stands to reason that it is indeed only capable of reproducing what’s already in its training data.
Yes? If they could, we would have a strong general intelligence by now, and only a few people are claiming this.
I think you’re conflating software and product.
A product can be a recombination of standard software components and yet be something completely new.
This is very true for an email client, but very untrue for an innovative 3D rendering engine technology (just an example).
In the past I have had people here suggest they're just writing boilerplate CRUD software, and I've suggested that means they could just use low-code tools instead. They then suggest it's too complex for that to work.
I think we tend to view ourselves as just hooking together basic operations, which might be technically true, but that becomes complex very quickly. A product can be built from straightforward REST and database operations and still take you months of learning to get up to speed on.