undefined | Better HN

0 pointshintymad2mo ago0 comments

> With hard requirements listed, I found out that the generated code missed requirements,

This is hardly a surprise, no? No matter how much training we run, we are still producing a generative model. And a generative model doesn't understand your requirements and cross them off. It predicts the next most likely token from a given prompt. If the most statistically plausible way to finish a function looks like a version that ignores your third requirement, the model will happily follow through. There's really no rules in your requirements doc. They are just the conditional events X in a glorified P(Y|X). I'd venture to guess that sometimes missing a requirement may increase the probability of the generated tokens, so the model will happily allow the miss. Actually, "allow" is too strong a word. The model does not allow shit. It just generates.

0 comments

8 comments · 1 top-level

teucris2mo ago· 7 in thread

But agents do keep task lists and check the tasks off as they go. Of course it’s not perfect either but it’s MUCH better than an LLM can offer on its own.

If you are seeing an agent missing tasks, work with it to write down the task list first and then hold it accountable to completing them all. A spec is not a plan.

mathisfun1232mo ago

bro do you really not understand that that's a game played for your sake - it checks boxes yes but you have no idea what effect the checking of the boxes actually has. like do you not realize/understand that anthropic/openai is baking this kind of stuff into models/UI/UX to give the sensation of rigor.

jwitthuhn2mo ago

The checkboxes inform the model as well as the user, and you can observe this yourself. For example in a C++ project with MyClass defined in MyClass.cpp/h:

I ask the model to rename MyClass to MyNewClass. It will generate a checklist like:

- Rename references in all source files

- Rename source/header files

- Update build files to point at new source files

Then it will do those things in that order.

Now you can re-run it but inject the start of the model's response with the order changed in that list. It will follow the new order. The list plainly provides real information that influences future predictions and isn't just a facade for the user.

dragandj1mo ago

And when it doesn't, it politely apologizes, at least :)

_puk2mo ago

Not to knee jerk on a bro comment, but, bro..

Are you seriously saying that breaking a large complex problem down into it's constituent steps, and then trying to solve each one of them as an individual problem is just a sensation of rigour?

stvltvs2mo ago

I believe they're saying that the checkboxes are window dressing, not an accurate reflection of what the LLM has done.

mathisfun1232mo ago

I'm saying that's not what the stupid bot is actually doing, it's what anthropic added to the TUI to make you feel good in your feelies about what the bot is actually doing (spamming).

Edit: I'll give you another example that I realized because someone pointed it out here: when the stupid bot tells you why it fucked up, it doesn't actually understand anything about itself - it's just generating the most likely response given the enormous amount of pontification on the internet about this very subject...

1 more reply

kazinator2mo ago

To some extent, I could agree with that idea. One purpose of that process is to match the impedance between the problem, and human cognition. But that presumes problem solving inherently requires human cognition, which is false; that's just the tool that we have for problem solving. When the problem-solving method matches the cognitive strengths and weaknesses of the problem solvers, they do have a certain sensation of having an upper hand over the problem. Part of that comes from the chunking/division allowing the problem solvers to more easily talk about the problem; have conversations and narratives around it. The ability to spin coherent narratives feels like rigor.

j / k navigate · click thread line to collapse

0 comments

8 comments · 1 top-level

teucris2mo ago· 7 in thread

But agents do keep task lists and check the tasks off as they go. Of course it’s not perfect either but it’s MUCH better than an LLM can offer on its own.

If you are seeing an agent missing tasks, work with it to write down the task list first and then hold it accountable to completing them all. A spec is not a plan.

mathisfun1232mo ago

jwitthuhn2mo ago

The checkboxes inform the model as well as the user, and you can observe this yourself. For example in a C++ project with MyClass defined in MyClass.cpp/h:

I ask the model to rename MyClass to MyNewClass. It will generate a checklist like:

- Rename references in all source files

- Rename source/header files

- Update build files to point at new source files

Then it will do those things in that order.

dragandj1mo ago

And when it doesn't, it politely apologizes, at least :)

_puk2mo ago

Not to knee jerk on a bro comment, but, bro..

Are you seriously saying that breaking a large complex problem down into it's constituent steps, and then trying to solve each one of them as an individual problem is just a sensation of rigour?

stvltvs2mo ago

I believe they're saying that the checkboxes are window dressing, not an accurate reflection of what the LLM has done.

mathisfun1232mo ago

I'm saying that's not what the stupid bot is actually doing, it's what anthropic added to the TUI to make you feel good in your feelies about what the bot is actually doing (spamming).

1 more reply

kazinator2mo ago

j / k navigate · click thread line to collapse