Yeah, you're definitely right about the shifting goalposts ("it's a stochastic parrot" -> "it hallucinates all the time, it can't even get APIs right" -> "it can generate functions but can't reason about the codebase" -> "the bottleneck was never shipping code")
At the same time, humans can move up the abstraction ladder faster than LLMs can. At least, some humans can. Agents can produce lots of code, and they can also do the entirely wrong thing. The impact of a wrong decision has been massively write-amplified as LLMs have gotten more capable. With earlier models, if one got a sentence or a function wrong, you reprompted; the cost of a mistake was ten seconds. Now you can burn hours or even days of work on the entirely wrong thing unless a competent human operator steps in and course-corrects.
The trajectory of agents has been toward bigger context windows and more autonomy, but that also means a bigger blast radius. Given that, I don't think human experts will be out of their jobs any time soon.