
0 points · rossant · 8mo ago · 0 comments

It was a fairly big refactoring, basically converting a working static HTML landing page into a Hugo website by splitting the HTML into multiple Hugo templates. I admit I was in quite a hurry and had to take shortcuts. I didn't have time to write automated tests and had to rely on manual testing of this single webpage. The diff was fairly big. It just didn't occur to me that the URLs would go through the LLMs and could be affected! Lesson learnt, haha.
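One cheap safeguard for this kind of template split, sketched here as an assumption rather than what the author actually did: extract every `href`/`src` URL from the original static page and from the page Hugo renders, then diff the two sets. The sketch below uses only Python's standard-library `html.parser`; the function names and the example HTML strings are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect every href/src attribute value found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.urls = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.add(value)


def collect_urls(html: str) -> set[str]:
    """Return the set of href/src URLs in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.urls


def compare_urls(before_html: str, after_html: str) -> tuple[set[str], set[str]]:
    """Return (missing, added): URLs lost and URLs introduced by the rewrite."""
    before = collect_urls(before_html)
    after = collect_urls(after_html)
    return before - after, after - before


if __name__ == "__main__":
    # Hypothetical example: one link was silently altered during the rewrite.
    original = '<a href="https://example.com/docs">Docs</a><img src="/logo.png">'
    rebuilt = '<a href="https://example.com/doc">Docs</a><img src="/logo.png">'
    missing, added = compare_urls(original, rebuilt)
    print("missing:", sorted(missing))
    print("added:", sorted(added))
```

In practice you would read the old landing page and the Hugo-rendered page from disk (Hugo writes to `public/` by default) and fail loudly if either set is non-empty. A set diff ignores link order and duplicates, which is usually what you want when the markup has been reshuffled across templates.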


7 comments · 4 top-level

cimi_ · 8mo ago · 3 in thread

Speaking of agents and tests, here's a fun one I had the other day: while refactoring a large code base, I told the agent to make a precise change to a specific module, refactor the surrounding code to accommodate the change, then make sure the tests were passing.

The test suite is slow and has many moving parts; the tests I asked it to run take ~5 minutes. The thing decided to kill the test run, then made up another command that it claimed was 'the tests', so when I glanced at the collapsed agent console in the IDE, everything seemed fine, i.e. 'Tests ran successfully'.

Obviously the code changes also had a subtle bug that I only saw when pushing its refactoring to CI (and more waiting). At least there were tests to catch the problem.

rossant (OP) · 8mo ago

So it took a shortcut as it was too lazy and it lied to your face about it. AGI is here for good.

tuesdaynight · 8mo ago

I think it's something model providers don't want to fix, because the number of times Claude Code simply deleted failing tests, before I added a memory saying it had to ask for my permission to do that, was staggering. It stopped happening after I added the memory, so I believe it could easily be fixed with a system prompt.

Ezhik · 8mo ago

Your Claude Code actually respects CLAUDE.md?

indigodaddy · 8mo ago

This is why my instinct for this sort of task is, "write a script that I can use to do x y z," instead of "do x y z"

tinodb · 8mo ago

I have piped diffs into an(other) LLM and asked: is this a pure refactor, or did things actually change? It usually gives quite a good analysis…

exe34 · 8mo ago

This is why I'm terrified of large LLM slop changesets that I can't check side by side. But that means I end up making many small changes that are harder to describe in words than to just do outright.
