0 points · ch4s3 · 1y ago · 0 comments

It seems like LLMs made really big strides for a while but don't seem to be getting better recently, and in some ways recent models feel a bit worse. I'm seeing some good results generating test code, and some really bad results when people go too far with LLM use on new feature work. Based on what I've seen, spinning up new projects and very basic features for web apps works really well, but that doesn't seem to generalize to refactoring or adding new features to big/old code bases.

I've seen Claude and ChatGPT happily hallucinate whole APIs for D3 on multiple occasions, which should be really well represented in the training sets.

3 comments · 3 top-level

soerxpso · 1y ago

> hallucinate whole APIs for D3 on multiple occasions, which should be really well represented in the training sets

With many existing systems, you can pull documentation into context pretty quickly to prevent the hallucination of APIs. In the near future it's obvious how that could be done automatically. I put my engine on the ground, ran it and it didn't even go anywhere; Ford will never beat horses.

empath75 · 1y ago

The LLMs themselves are making marginal gains, but the tools for using LLMs productively are getting so much better.

oconnor663 · 1y ago

> don't seem to be getting better recently

o3 came out just one month ago. Have you been using it? Subjectively, the gap between o3 and everything before it feels like the biggest gap I've seen since ChatGPT originally came out.
