It's also pretty funny sometimes how it gives weird future roadmap estimates ("part 2 - 3 weeks, part 3 - 2 months", etc.), and when you tell it to actually make those changes, it's pretty much done in half an hour.
Raw pre-training data includes plenty of conversations between professional builders and some of those include estimates.
I believe the outputs are a training coincidence with consequences that are opportunistic for the labs.
That said, it'll often say "2 days of work" and then complete the coding in 30 minutes, and while that's amusing, afterwards I'll need to manually test, or send it to other people for review, or realize the agent only actually did half the work and I need to do a second pass (or a third, etc.). So getting the feature in often does genuinely take two days.
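To put rough numbers on that, here is a minimal sketch of the arithmetic. Every duration below is a hypothetical figure I made up for illustration (none of them are measurements), and the names `phases`, `total_elapsed`, `coding_share` and `WORKDAY` are just illustrative. It only shows how a 30-minute coding step can still add up to a two-day feature once testing, review and extra passes are counted.

```python
from datetime import timedelta

# Hypothetical phase durations, chosen only for illustration.
phases = {
    "agent coding": timedelta(minutes=30),
    "manual testing": timedelta(hours=3),
    "waiting on review": timedelta(hours=8),
    "second pass": timedelta(hours=2),
    "third pass and re-test": timedelta(hours=2, minutes=30),
}

# Assumption: one working day is 8 hours.
WORKDAY = timedelta(hours=8)


def total_elapsed(phases):
    """Sum all phase durations into a single timedelta."""
    return sum(phases.values(), timedelta())


def coding_share(phases, coding_key="agent coding"):
    """Fraction of the total time spent on the coding step itself."""
    return phases[coding_key] / total_elapsed(phases)


if __name__ == "__main__":
    total = total_elapsed(phases)
    print(f"total: {total / WORKDAY:.1f} working days")
    print(f"coding share: {coding_share(phases):.1%}")
```

With these made-up numbers the coding step is a small slice of the total, which is the point: the "2 days" estimate can be wrong about the coding and still roughly right about the feature.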
It doesn't estimate.
It generates tokens that read like estimates associated with the context in its training material.
What would you expect the generator to output instead?
Sure, it cannot think like a human, but given its input, it should give a good statistical answer (approximating not how long it actually takes, but how long a human would say it takes).
The most fundamental essence of what they do is exactly what you say they don't: estimate.
https://github.com/cartazio/oh-punkin-pi/blob/main/scripts/b...
At that time the predominant view was that LLMs were nothing but stochastic parrots, that they would plateau, and that hallucinations couldn't be fixed.
At this point I doubt there are any AI sceptics left. That ship has long sailed. The only thing that matters is whether the estimates are accurate, and AI can improve on that too.
Even humans only estimate based on neurons firing in prior patterns.
It's been known for some years[1] that LLMs do regression in-context. Frontier models have been trained on many, many issue texts that include task breakdowns and estimates.
Even Gary Marcus is starting to come around and realize that his priors are no longer as relevant as they once were.
At least with AI that actually does things more quickly, there is a bit more breathing room (introducing AI is easier than changing a given environment).
Aside from that, I wonder how much variety there is in practice: between "Oh yeah, I added that new button while we were in the meeting" and "The new button feature will be ready in Q3 according to the roadmap, once we have sign-off from all the stakeholders."
Finally he convinced it to try. It one-shotted it in 30 seconds.
Turns out the agents' idea of what is hard and easy also comes from Common Crawl.
those estimates are based on previous human estimates (the datasets it's been trained on).
unironically, once your comments become part of a dataset, LLMs will likely get much better at estimating.
now that i think about it, all these writings about LLMs will give LLMs something much like meta-cognition.