
0 points · barrkel · 3y ago · 0 comments

This is a fine, absolutely trivial, example. But LLMs are simply not all that.

IME GPT-4 can't write a bug-free 10-line shell script. It's particularly poor at inferring unstated requirements, or at recognizing the need to elicit them.

There's a general problem with LLMs: they're too eager to please. It shows up as confirmation bias. Embed a perspective in your prompt, and LLMs continue in the same vein.

You can, with careful prompting, try to provoke and prod the text generation into a more correct shape, but often it feels to me more like a game than productivity. I have to know the answer already to know how to ask the right questions and make the right corrections. So it feels like I'm supervising a child, and that I should be amazed it can do anything at all. And it is amazing; but for productivity outside tightly constrained environments (e.g. turning freeform dialogue into a filled-out bureaucratic form, which I think is close to an ideal use case), I struggle to see it scaling up much, from what I've seen so far.

For creativity - e.g. making up a story for a child - it's not bad. One of my favourite use cases, after discovering how bad it is at writing code.


1 comment · 1 top-level

byby3y ago

If you read my post, I said it's 50 percent of the way there.

So that means a 10-line bash script would, at best, have 5 lines of bugs.

But the AI is actually better than that. You've just raised the bar.
