
0 points · throwaway67678 · 20d ago · 0 comments

Agent mania setting in

It's also pretty funny sometimes how it gives weird future roadmap estimates ("part 2 - 3 weeks, part 3 - 2 months", etc.) and when you tell it to actually do those changes it's pretty much done in half an hour


26 comments · 5 top-level

smith7018 20d ago · 17 in thread

I've long believed those numbers were faked by Anthropic/OpenAI to serve as a form of advertisement. The estimates are impossible to verify and their ability to do "2 days of work" in 10 minutes will presumably make the user go "Wow, I just saved SO much time!" Plus, the unnecessary text eats up the users' tokens so it helps the companies on the backend, as well.

overgard 19d ago

I tend to be cynical about AI companies, but I'm guessing the bad estimates more just come from a complete lack of actual data it could use for that so it's more or less a hallucination.

leodavi 20d ago

I agree with you that labs are benefiting from those outputs but I'm skeptical that labs are purposefully training the models to produce those outputs.

Raw pre-training data includes plenty of conversations between professional builders and some of those include estimates.

I believe the outputs are a training coincidence with consequences that are opportunistic for the labs.

AgentMasterRace 20d ago

All the models have broken estimates. They're trained heavily on Jira and GitHub tasks and issues, that's why their estimates are human.

esperent 19d ago

Even for humans the estimates are way off, unless it's based on data that has some serious padding.

That said, it'll often say "2 days of work" and then complete the coding in 30 minutes, and while that's amusing, afterwards, I'll need to manually test, or send to other people for review, or realize the agent only actually did half the work and I need to do a second pass (or a third etc.) and then often getting the feature in does genuinely take two days.

Terretta 19d ago

> the estimates

It doesn't estimate.

It generates tokens that read like estimates associated with the context in its training material.

What would you expect the generator to output instead?

legulere 19d ago

It generates tokens by estimating what the next token is going to be.

Sure, it cannot think like a human, but given its input, it should give a good statistical answer (approximating not how long it actually takes, but how long a human would say it takes).

mediaman 19d ago

The funny thing about this comment is that neural networks are universal function approximators.

The most fundamental essence of what they do is exactly what you say they don't: estimate.


taneq 19d ago

Therein lies the rub, no? To accurately predict the next token produced by a process, it’s necessary to model that process. If the process is a human attempting to estimate the duration of a task, then in some sense the LLM is modeling the estimation process. We’re well past the point where it’s credible to claim that LLMs just regurgitate their training data.

incr_me 19d ago

Obviously there isn't a hidden corpus of logs of coding chatbot assistants that has been accumulating over the years, but these coding chatbot assistants output tokens that resemble how we all imagined a coding chatbot assistant would have operated had it existed in the first place to end up in a corpus. "Training material" includes supervised fine-tuning, preference training, RLHF, and so on, so that certain outputs (like these timeline estimates) may really have been decided (at some level of conscious awareness) by product teams.

carterschonwald 19d ago

you might like the stuff in my work of oh my pi, it's a test bed for my ideas around making these tools more reliable. hoping to maybe have a native ui iter of the real thing that this is a test bed for this summer.

https://github.com/cartazio/oh-punkin-pi/blob/main/scripts/b...

InterviewFrog 19d ago

This is so 2023. The thought process.

At that time the predominant view was that LLMs were nothing but stochastic parrots, that they would plateau, and that hallucinations couldn't be fixed.

At this point I doubt there are any AI sceptics left. That ship has long sailed. The only thing that matters is whether the estimates are accurate, and AI can improve on that too.

Even humans only estimate based on neurons firing in prior patterns.


nl 19d ago

Actually in this case they possibly are estimates.

It's been known for some years[1] that LLMs do regression in-context. Frontier models have been trained against many, many issue texts that include task breakdowns and estimates.

[1] https://arxiv.org/html/2409.04318v1


ghshephard 19d ago

I think people are continuing to view these systems as pure LLMs - when that ship sailed 6+ months ago. Between being able to review memory, using agent harnesses and sub agents and skills to go out and discover information - modern systems (Codex, Claude Code, Cursor) - use LLMs - but the LLM is only a small component of it. Compare what you get from sending a request to a chatbot like ChatGPT - to what you can get from a modern harness. The output is influenced by the LLM, but it's no longer a "model making a token prediction based on training material and RLHF" - that's a very 2025 way of looking at these systems.

Even Gary Marcus is starting to come around and realize that his priors are no longer as relevant as they once were.


dizhn 19d ago

All models do it. It's their training. They didn't have "a person does this in a week but an LLM could in a minute" in their training yet. They also don't have the concept of elapsed time unless you ask them how long something has taken.
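One way to give an agent the elapsed-time signal it lacks is to time each task in the harness and feed the number back. The sketch below is only an illustration under assumptions: `timed` and `overestimate_factor` are hypothetical helper names, not part of any agent framework, and an 8-hour working day is assumed when converting "2 days" to seconds.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, log):
    """Record the wall-clock duration of the enclosed block in log[label]."""
    start = time.monotonic()
    try:
        yield
    finally:
        log[label] = time.monotonic() - start


def overestimate_factor(claimed_seconds, actual_seconds):
    """How many times longer the claimed duration is than the measured one."""
    return claimed_seconds / actual_seconds


# Assumption: a "day" of work means 8 hours.
TWO_WORKING_DAYS = 2 * 8 * 3600
THIRTY_MINUTES = 30 * 60
factor = overestimate_factor(TWO_WORKING_DAYS, THIRTY_MINUTES)
```

With those assumptions, "2 days of work" finished in 30 minutes is an overestimate by a factor of 32, and the measured durations in `log` could be pasted back into the agent's context.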

Narciss 19d ago

Nah it’s all from the pretraining data

BobbyTables2 19d ago

That’s right up there with Scotty in the classic Star Trek always multiplying time estimates by 4 so he looks like a “miracle worker”

KronisLV 19d ago

I mean in general I'd rather take slightly inflated estimates than the odd sprint poker stuff where other devs and PMs negotiate hours down and before you know it you're also stuck fixing nitpicky reviewer comments on code that is already good enough and have to send a release at like 7 PM, ofc also without enough tests or even enough manual checks and testing, cause people repeatedly act against their self-interest and try to compress timelines, thinking that that's somehow good for them.

At least with AI that actually does things more quickly, there is a bit more breathing room (introducing AI is easier than changing a given environment).

Aside from that, I wonder how much variety there is in practice: between "Oh yeah, I added that new button while we were in the meeting" and "The new button feature will be ready in Q3 according to the roadmap, once we have sign-off from all the stakeholders."

andai 19d ago · 3 in thread

I heard an anecdote. Guy spent several days trying to convince his AI agent to build a feature. Kept saying it was crazy complicated, would take weeks.

Finally he convinced it to try. It one shotted it in 30 seconds.

Turns out the agents' idea of what is hard and easy also comes from Common Crawl.

wild_egg 19d ago

Why on earth would you spend any time at all convincing an agent of anything? You say "just do it" and off it goes.

dr_dshiv 19d ago

Ya, but “doit” is 2x more efficient

brianwawok 19d ago

Uh Claude tries real hard to dodge work. Talks about how it’s really hard 10 PRs. Finally convince it to do as 1. It stops 10% through and says ok done with PR 1, we can work on the last 9 tomorrow. Ugh.


throw1234567891 19d ago · 1 in thread

It repeats what it has seen in the training data. Expecting it to reason about the complexity of a task is a pipe dream. The best is to tell it not to come back with estimates, and when it does, remove them anyway.
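For the "remove them anyway" part, a minimal post-processing sketch might look like the following. It is an illustration rather than anything from the thread: the `NO_ESTIMATES_INSTRUCTION` text, the `strip_estimates` helper, and the regex are all hypothetical, and the pattern only catches numeric durations such as "2 days" or "2-3 weeks".

```python
import re

# Hypothetical instruction to prepend to a system prompt.
NO_ESTIMATES_INSTRUCTION = "Do not include time or effort estimates in your plans."

# Hypothetical pattern: a number (or a numeric range) followed by a time unit.
_DURATION = re.compile(
    r"\b\d+(?:\.\d+)?(?:\s*(?:-|to)\s*\d+(?:\.\d+)?)?\s*"
    r"(?:minutes?|hours?|days?|weeks?|months?)\b",
    re.IGNORECASE,
)


def strip_estimates(text: str) -> str:
    """Replace each numeric duration phrase with a visible placeholder."""
    return _DURATION.sub("[estimate removed]", text)


plan = "Step one: refactor the parser (2 days of work)\nStep two: add 3 tests"
cleaned = strip_estimates(plan)
```

Counts that are not followed by a time unit, such as "3 tests", are left alone. A line like "part 2 - 3 weeks" is ambiguous to this pattern (it reads "2 - 3 weeks" as a range), so real output would need a stricter rule.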

andai 19d ago

I added "you can do anything, believe in yourself" to system prompt, and task completion increased significantly.

znpy 19d ago

> It's also pretty funny sometimes how it gives weird future roadmap estimates ("part 2 - 3 weeks, part 3 - 2 months", etc.)

those estimates are based on previous human estimates (the datasets it's been trained on).

unironically, when your comments become part of a dataset, LLMs will likely get much better at estimating.

now that i think about it, all these writings about LLMs will give LLMs something much like meta-cognition.

jimbokun 19d ago

Well how else could I keep my reputation as a miracle worker Captain?
