undefined | Better HN

0 pointsteucris16d ago0 comments

But agents do keep task lists and check the tasks off as they go. Of course it’s not perfect either but it’s MUCH better than an LLM can offer on its own.

If you are seeing an agent missing tasks, work with it to write down the task list first and then hold it accountable to completing them all. A spec is not a plan.

0 comments

mathisfun12316d ago

bro do you really not understand that that's a game played for your sake - it checks boxes yes but you have no idea what effect the checking of the boxes actually has. like do you not realize/understand that anthropic/openai is baking this kind of stuff into models/UI/UX to give the sensation of rigor.

jwitthuhn15d ago

The checkboxes inform the model as well as the user, and you can observe this yourself. For example in a C++ project with MyClass defined in MyClass.cpp/h:

I ask the model to rename MyClass to MyNewClass. It will generate a checklist like:

- Rename references in all source files

- Rename source/header files

- Update build files to point at new source files

Then it will do those things in that order.

Now you can re-run it but inject the start of the model's response with the order changed in that list. It will follow the new order. The list plainly provides real information that influences future predictions and isn't just a facade for the user.

dragandj15d ago

And when it doesn't, it politely apologizes, at least :)

_puk16d ago

Not to knee jerk on a bro comment, but, bro..

Are you seriously saying that breaking a large complex problem down into it's constituent steps, and then trying to solve each one of them as an individual problem is just a sensation of rigour?

stvltvs15d ago

I believe they're saying that the checkboxes are window dressing, not an accurate reflection of what the LLM has done.

kazinator16d ago

To some extent, I could agree with that idea. One purpose of that process is to match the impedance between the problem, and human cognition. But that presumes problem solving inherently requires human cognition, which is false; that's just the tool that we have for problem solving. When the problem-solving method matches the cognitive strengths and weaknesses of the problem solvers, they do have a certain sensation of having an upper hand over the problem. Part of that comes from the chunking/division allowing the problem solvers to more easily talk about the problem; have conversations and narratives around it. The ability to spin coherent narratives feels like rigor.

mathisfun12315d ago

I'm saying that's not what the stupid bot is actually doing, it's what anthropic added to the TUI to make you feel good in your feelies about what the bot is actually doing (spamming).

Edit: I'll give you another example that I realized because someone pointed it out here: when the stupid bot tells you why it fucked up, it doesn't actually understand anything about itself - it's just generating the most likely response given the enormous amount of pontification on the internet about this very subject...

_puk15d ago

I'm not disagreeing in principle, but the detritus left after an anthropic outage is usually quite usable in a completely fresh session. The amount of context pulled and stored in the sandbox is quite hefty.

Whist I can't usually start from the exact same point in the decisioning, I can usually bootstrap a new session. It's not all ephemeral.

To your edit: I find that the most galling thing about finding out about the thinking being discarded at cache clear. Reconstruction of the logical route it took to get to the end state is just not the same as the step by step process it took in the first place, which again I feel counters your "feelies".

1 more reply

j / k navigate · click thread line to collapse