Ask HN: How are you getting reliable code-gen performance out of LLMs?

2 points · 1y ago · 1 comment

I'm particularly interested in people using LLM APIs, where code is consumed programmatically.

I've been using LLMs a lot lately to generate code, and code quality is a mixed bag. Sometimes it runs straight out of the box or with a few manual tweaks; other times it just won't compile. Keen to hear what workarounds others have used to solve this (e.g. re-prompting, constraining generations, etc.).


ilaksh · 1y ago

Your post would make more sense to me if you were specific about the models. It's like asking how to get reliable transportation from a car without specifying which models of car you are considering.

o1-preview seems to be a step up from Claude 3.5 Sonnet.

There are many open-source coding LLMs that, for complex tasks, will be a joke compared to the SOTA closed ones.

I think that there are two strategies that can work: 1) constrain the domain to a particular framework and provide good documentation and examples in the prompts for it, and 2) create an error-correcting feedback loop where compilation/static analysis and runtime errors or failed tests are fed back to the model automatically.
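A minimal sketch of the second strategy, assuming the generated code is Python and that the model is reachable through some callable that takes a prompt and returns source text. The names `generate_with_feedback`, `syntax_error`, and `make_fake_model` are hypothetical and exist only for this illustration; the fake model stands in for a real LLM API call, and the check here is a syntax check only, where a real loop might also feed back static analysis output or failed test results.

```python
import ast
from typing import Callable, Optional


def syntax_error(source: str) -> Optional[str]:
    """Return a readable error message if source is not valid Python, else None."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return f"SyntaxError: {exc.msg} (line {exc.lineno})"
    return None


def generate_with_feedback(
    task: str,
    generate: Callable[[str], str],  # hypothetical wrapper around an LLM API
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask for code, check it, and feed any error back until it passes or we give up."""
    prompt = task
    for _ in range(max_attempts):
        candidate = generate(prompt)
        error = syntax_error(candidate)
        if error is None:
            return candidate
        # Re-prompt with the failed attempt and the exact error message.
        prompt = (
            f"{task}\n\nYour previous attempt:\n{candidate}\n\n"
            f"failed with:\n{error}\nReturn corrected code only."
        )
    return None  # caller decides what to do when every attempt fails


def make_fake_model() -> Callable[[str], str]:
    """Stand-in for a real model: first reply is broken, second is valid."""
    replies = iter([
        "def add(a, b) return a + b",
        "def add(a, b):\n    return a + b\n",
    ])

    def fake_model(prompt: str) -> str:
        return next(replies)

    return fake_model


if __name__ == "__main__":
    code = generate_with_feedback("Write add(a, b).", make_fake_model())
    print(code)
```

The same loop shape should extend to the other checks mentioned above: replace or chain `syntax_error` with a function that runs a compiler, linter, or test suite and returns its error output as a string.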
