fauigerzigerk · 3mo ago

I'm not opposed to AI generated code in principle.

I'm just saying that we don't know how much effort was put into making this and we don't know whether it works.

The existence of a repository containing hundreds of files, thousands of SLOC and a folder full of tests tells us less today than it used to.

There's one thing in particular that I find quite astonishing sometimes. I don't know about this particular project, but some people use LLMs to generate both the implementation and the test cases.

What does that mean? The test cases are supposed to be the formal specification of our requirements. If we do not specify formally what we expect a tool to do, how do we know whether the tool has done what we expected, including in edge cases?

teiferer · 3mo ago

I fully agree with your overall message and sentiment. But let me be nit-picky for a moment.

> The test cases are supposed to be the formal specification of our requirements

Formal methods folks would strongly disagree with this statement. Tests are informal specifications in the sense that they don't provide a formal (mathematically rigorous) description of the full expected behavior of the system. Instead, they offer a mere glimpse into what we hope the system would do.

And that's an important part, which is where your main point stands. The test is what confirms that the thing the LLM built conforms to the cases the human expected to behave in a certain way. That's why the human needs to provide them.

(The human could enlist an LLM's help to write the tests, for example by giving it an even more informal natural-language description of what the test should do. But the human then needs to verify that the test really does that, and maybe fill in some gaps.)
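
To make the "mere glimpse" point concrete, here is a rough sketch in Python. Everything in it is made up for illustration: `is_leap_year` is a hypothetical, deliberately incomplete implementation, and the standard library's `calendar.isleap` stands in for the full rule we actually care about. The hand-picked examples all pass, yet checking a whole range of years against the full rule still turns up gaps.

```python
import calendar


def is_leap_year(year: int) -> bool:
    """Hypothetical implementation under test; deliberately incomplete."""
    # Ignores the rule that century years must also be divisible by 400.
    return year % 4 == 0


def example_tests_pass() -> bool:
    """A handful of hand-picked cases: the 'glimpse' into expected behavior."""
    cases = {2024: True, 2023: False, 2000: True, 2019: False}
    return all(is_leap_year(y) == expected for y, expected in cases.items())


def spec_violations(start: int, stop: int) -> list[int]:
    """Years in [start, stop) where the implementation disagrees with the full rule."""
    return [y for y in range(start, stop) if is_leap_year(y) != calendar.isleap(y)]


if __name__ == "__main__":
    print("example tests pass:", example_tests_pass())
    print("years violating the full rule:", spec_violations(1890, 2110))
```

Neither check is a formal specification in the formal-methods sense, and the range check only works here because a trusted reference happens to exist. The sketch is only meant to show how passing examples can coexist with wrong behavior that nobody thought to write down.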

halfcat · 3mo ago

> If we do not specify formally what we expect a tool to do, how do we know whether the tool has done what we expected, including in edge cases?

You don’t. That’s the scary part. Up until now, this was somewhat solved by injecting artificial friction. A bank that takes 5 days for a payment to clear. And so on.

But it’s worse than this, because most problems software solves cannot even be understood until you partially solve the problem. It’s the trying and failing that reveals the gap, usually by someone who only recognizes the gap because they were once embarrassed by it, and what they hear rhymes with their pain. AI doesn’t interface with physical reality, as far as we know, or have any mechanism to course correct like embarrassment or pain.

In the future, we will have flown off the cliff before we even know there was a problem. We will be on a space ship going so fast that we can’t see the asteroid until it’s too la...