It doesn't stop them from making stupid mistakes. It does reduce the amount of time
I have to spend dealing with the stupid mistakes they already know how to fix once the problem is pointed out to them, so I can concentrate my review on tighter diffs of cleaner code.
A real example: the tooling I mentioned at one point early on made the correct functional change, but it's written in Ruby, and Ruby allows defining the same method multiple times in the same class - the later definition silently overrides the earlier one. This would of course be a compile-time error in most other languages. It's a weakness of using Ruby with a careless (or mindless) developer...
But Rubocop - a linter - will catch it. So forcing all changes through Rubocop and just feeding the errors back to the LLM made it recognise the mistake and delete the old method.
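To make the failure mode concrete, here's a minimal sketch of the Ruby behaviour in question (class and method names are made up for illustration): the second definition silently wins, and plain Ruby raises no error.

```ruby
# Redefining a method in the same class is legal Ruby: the later
# definition silently replaces the earlier one.
class Invoice
  def total
    100
  end

  # An LLM adding this "new" version without deleting the old one
  # gets no complaint from the interpreter itself.
  def total
    100 + tax
  end

  def tax
    20
  end
end

puts Invoice.new.total # the later definition wins: prints 120
```

Rubocop, by contrast, flags exactly this pattern as a lint offence, which is what turns a silent override into an error message the LLM can act on.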
It lowers the cognitive load of the review. Instead of having to wade through and resolve a lot of cruft and make sense of unusually structured code, you can focus on the actual specific changes and subject those to more scrutiny.
And then my plan is to experiment with more semantic checks in the same style as what Rubocop uses, but less prescriptive - checks of the type "maybe you should pay extra attention here, and explain why this is correct/safe". For example, any change that involves reading a key, password field, or card number could trigger this whether or not there is actually a problem with it, both prompting the LLM to "look twice" and flagging it as an area deserving extra scrutiny in a human review.
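The shape of such a check could be very simple. This is a standalone sketch, not a real Rubocop cop - the field names and message wording are hypothetical - but it shows the "flag for attention, don't judge" idea: it emits a note for every line touching a sensitive-looking identifier, regardless of whether the code is wrong.

```ruby
# Hypothetical "pay extra attention" check: scan source text for reads
# of secret-bearing fields and emit advisory notes rather than errors.
SENSITIVE = /\b(password|api_key|secret|card_number)\b/

def attention_notes(source)
  source.each_line.with_index(1).filter_map do |line, lineno|
    next unless line.match?(SENSITIVE)

    "line #{lineno}: touches #{line[SENSITIVE]} - look twice and " \
      "explain why this is correct/safe"
  end
end
```

The notes would go both back to the LLM (as a prompt to re-examine its own diff) and into the review summary for the human. A production version would more likely live as a custom Rubocop cop operating on the AST rather than on raw text.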
It doesn't need to be perfect; it just needs to provide enough of a harness to make it easier for the humans in the loop to spot the remaining issues.