undefined | Better HN

0 pointsjorvi4mo ago0 comments

Same. I've been needing to update an userscript (JS) that takes stuff like "3 for the price of 1", "5 + 1 free", "35% discount!" from a particular site and then converts the price to a % discount and the price per item / 250 grams.

Its an old userscript so it is glitchy and halfway works. I already pre-chewed the work by telling Gemini 3 exactly which new HTML elements it needs to match and which contents it needs to parse. So basically, the scaffolding is already there, the sources are already there, it just needs to put everything in place.

It fails miserably and produces very convincing looking but failing code. Even letting it iterate multiple times does nothing, nor does nudging it in the correct direction. Mind you that Javascript is probably the most trained-on language together with Python, and parsing HTML is one of the most common usecases.

Another hilarious example is MPV, which has very well-documented settings. I used to think that LLMs would mean you can just tell people to ask Gemini how to configure it, but 9 out of 10 times it will hallucinate a bunch of parameters that never existed.

It gives me an extremely weird feeling when other people are cheering that it is solving problems at superhuman speeds or that it coded a way to ingest their custom XML format in record time, with relatively little prompting. It seems almost impossible that LLMs can both be so bad and so good at the same time, so what gives?

0 comments

marcus_holmes4mo ago

1. Coding with LLMs seems to be all about context management. Getting the LLM to deal with the minimum amount of code needed to fix the problem or build the feature, carefully managing token limits and artificially resetting the session when needed so the context handover is managed, all that. Just pointing an LLM at a large code base and expecting good things doesn't work.

2. I've found the same with Gemini; I can rarely get it to actually do useful things. I have tried many times, but it just underperforms compared to the other mainstream LLMs. Other people have different experiences, though, so I suspect I'm holding it wrong.

lan3214mo ago

The problem is by that point it's much less useful in projects. I still like them but when I get to the point of telling it exactly what to do I'm mostly just being lazy. It's useful in that it might give me some ideas I didn't consider but I'm not sure it's saving time.

Of course, for short one-off scripts, it's amazing. It's also really good at preliminary code reviews. Although if you have some awkward bits due to things outside of your power it'll always complain about them and insist they are wrong and that it can be so much easier if you just do it the naive way.

Amazon's Kiro IDE seems to have a really good flow, trying to split large projects into bite sized chunks. I, sadly, couldn't even get it to implement solitaire correctly, but the idea sounds good. Agents also seem to help a lot since it can just do things from trial and error, but company policy understandably gets complicated quick if you want to provide the entire repo to an LLM agent and run 'user approved' commands it suggests.

rescbr4mo ago

From my experience vibe coding, you spend a lot of time preparing documentation and baseline context for the LLM.

On one of my projects, I downloaded a library’s source code locally, and asked Claude to write up a markdown file explaining documenting how to use it with examples, etc.

Like, taking your example for solitaire, I’d ask a LLM to write the rules into a markdown file and tell the coding one to refer to those rules.

I understand it to be a bit like mise en place for cooking.

1 more reply

fragmede4mo ago

Honestly, in my biased unscientific testing, what gives is that Gemini isn't actually all that good. I mean, it's fine. but it's not actually good.

j / k navigate · click thread line to collapse