
tonyarkles 15d ago

I'll throw this out as one place where an LLM has saved me literally weeks of work: debugging pathological behaviour in third-party code. Prompt example: "Today, when I did U, V, and W, I ended up with X happening. I fixed it by doing Y. The second time I tried, Z happened instead (which was the expected behaviour). Can you work out a plausible explanation for why X happened the first time and why Y fixed it? Please keep track of the specific lines of code where the behaviour difference shows up."

This is in a real-time stateful system, not a system where I'd necessarily expect the exact same thing to happen every time. I just wanted to understand why it behaved differently because there wasn't any obvious reason, to me, why it would.

The explanation it came back with was pretty wild. It essentially boiled down to a module not being adequately initialized before it was used the first time and then it maintained its state from then on out. The narrative touched a lot of code, and the source references it provided did an excellent job of walking me through the narrative. I independently validated the explanation using some telemetry data that the LLM didn't have access to. It was correct. This would have taken me a very long time to work out by hand.
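To make that failure mode concrete, here is a minimal sketch of the general pattern: a long-lived component that gets used before it is properly initialized, silently adopts a default, and then keeps whatever state it ends up with. This is not the third-party code in question. The `OffsetCorrector` class, its methods, and the numbers are all hypothetical, chosen only to mirror the X / Y / Z sequence from the prompt.

```python
from typing import Optional


class OffsetCorrector:
    """Hypothetical stateful component; every name here is illustrative."""

    def __init__(self) -> None:
        # The real value is only supplied later, via calibrate().
        self._offset: Optional[float] = None

    def calibrate(self, offset: float) -> None:
        self._offset = offset

    def correct(self, raw: float) -> float:
        if self._offset is None:
            # Latent bug: first use before calibration silently adopts a
            # default, and that default is then kept as state.
            self._offset = 0.0
        return raw - self._offset


# One long-lived instance, like a module inside a running system.
corrector = OffsetCorrector()

first = corrector.correct(10.0)      # "X": used before calibration
corrector.calibrate(2.5)             # "Y": the manual fix
after_fix = corrector.correct(10.0)  # behaves as expected once calibrated
second = corrector.correct(10.0)     # "Z": state persisted, no fix needed
```

In a real system the first use and the initialization are usually many calls apart, which is why having the specific lines of code tracked in the explanation matters so much.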

Edit: I have done this multiple times and have been blown away each time.


jeppebemad 15d ago

This seems to be a common denominator for what LLMs actually do well: finding bugs and explaining code. Success at producing code is yet to be seen.

zahlman 15d ago

> Prompt example: "Today, when I did U, V, and W, I ended up with X happening. I fixed it by doing Y. The second time I tried, Z happened instead (which was the expected behaviour). Can you work out a plausible explanation for why X happened the first time and why Y fixed it? Please keep track of the specific lines of code where the behaviour difference shows up."

> The explanation it came back with was pretty wild. It essentially boiled down to a module not being adequately initialized before it was used the first time and then it maintained its state from then on out.

Even without knowing any of the variable values, that explanation doesn't sound wild at all to me. It sounds in fact entirely plausible, and very much like what I'd expect the right answer to sound like.

tonyarkles (OP) 14d ago

The wild part, for me at the time, was how many steps there were from cause to effect and how perfectly they'd been reasoned through. The first time I had that experience was my first real "this LLM stuff might have some legs" moment. My second similar experience several days later was "hmmm that wasn't a fluke..."

I'm still at a stage where I'm not completely sure that I like the code that Codex or Claude wants to write. Sometimes it's good, sometimes it takes 5 or 6 iterations to get it somewhere I'm happy with. But wow, on the front end of the work, they are great design/review/iterate partners; sometimes I let the tools write the first draft and then I find the gaps, sometimes I write the first draft and let the tools find the gaps. Either way has worked really well for making solid debt-free progress.